Data Science
This book covers the topic of data science in a comprehensive manner and synthesizes
both fundamental and advanced topics of a research area that has now reached its matu-
rity. The book starts with the basic concepts of data science. It highlights the types of data
and their use and importance, followed by a discussion on a wide range of applications of
data science and widely used techniques in data science.
Key Features
• Provides an internationally respected collection of scientific research methods,
technologies and applications in the area of data science.
• Presents predictive outcomes by applying data science techniques to real-life
applications.
• Provides readers with the tools, techniques and cases required to excel with mod-
ern artificial intelligence methods.
• Gives the reader a variety of intelligent applications that can be designed using
data science and its allied fields.
The book is aimed primarily at advanced undergraduates and graduates studying machine
learning and data science. Researchers and professionals will also find this book useful.
Data Science
Techniques and Intelligent Applications
Edited by
Pallavi Vijay Chavan
Parikshit N Mahalle
Ramchandra Mangrulkar
Idongesit Williams
First edition published 2023
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
CRC Press is an imprint of Taylor & Francis Group, LLC
© 2023 selection and editorial matter, Pallavi Vijay Chavan, Parikshit N Mahalle, Ramchandra Sharad Mangrulkar
and Idongesit Williams; individual chapters, the contributors
Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been
acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or
utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including
photocopying, microfilming, and recording, or in any information storage or retrieval system, without written
permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the
Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are
not available on CCC please contact mpkbookspermissions@tandf.co.uk
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for
identification and explanation without intent to infringe.
Library of Congress Cataloging‑in‑Publication Data
Names: Chavan, Pallavi, editor. | Mahalle, Parikshit N, editor. | Mangrulkar, Ramchandra, editor. |
Williams, Idongesit, editor.
Title: Data science : techniques and intelligent applications / edited by
Pallavi Vijay Chavan, Parikshit N Mahalle, Ramchandra Mangrulkar, Idongesit Williams.
Description: First edition. | Boca Raton, FL : Chapman & Hall/CRC Press, 2023. |
Includes bibliographical references and index. |
Summary: “The proposed book covers the topic of data science in a very comprehensive manner and synthesizes both
fundamental and advanced topics of a research area that has now reached maturity. The book starts from the basic
concepts of data science; it highlights the types of data, its use and its importance, followed by discussion on a wide
range of applications of data science and widely used techniques in data science.
Key features: provides an internationally respected collection of scientific research methods, technologies and appli-
cations in the area of data science, presents predictive outcomes by applying data science techniques on real life appli-
cations, provides readers with the tools, techniques and cases required to excel with modern artificial intelligence
methods, and gives the reader variety of intelligent applications that can be designed using data science and its allied
fields. The book is aimed primarily at advanced undergraduates and graduates studying machine learning and data
science. Researchers and professionals will also find this book useful”— Provided by publisher.
Identifiers: LCCN 2022002189 (print) | LCCN 2022002190 (ebook) |
ISBN 9781032254494 (hbk) | ISBN 9781032254517 (pbk) | ISBN 9781003283249 (ebk)
Subjects: LCSH: Big data. | Electronic data processing. | Information technology. | Information science.
Classification: LCC QA76.9.B45 D394 2023 (print) | LCC QA76.9.B45 (ebook) | DDC 005.7—dc23/eng/20220416
LC record available at https://lccn.loc.gov/2022002189
LC ebook record available at https://lccn.loc.gov/2022002190
ISBN: 978-1-032-25449-4 (hbk)
ISBN: 978-1-032-25451-7 (pbk)
ISBN: 978-1-003-28324-9 (ebk)
DOI: 10.1201/9781003283249
Typeset in Palatino
by codeMantra
Contents
Preface............................................................................................................................................ vii
Editors..............................................................................................................................................ix
Contributors....................................................................................................................................xi
1. Instigation and Development of Data Science .................................................................1
Priyali Sakhare, Pallavi Vijay Chavan, Pournima Kulkarni, and Ashwini Sarode
2. Role of Statistical Methods in Data Science ...................................................................21
Deepa Mane and Sachin Shelke
3. Real-World Applications of Data Science........................................................................33
Baisa L. Gunjal
4. HDWR_SmartNet: A Smart Handwritten Devanagari Word Recognition
System Using Deep ResNet-Based on Scan Profile Method .......................................57
Shalaka Prasad Deore
5. Safe Social Distance Monitoring and Face Mask Detection for Controlling
COVID-19 Spread.................................................................................................................79
Nikhil Ingale, Onkar Rane, Piyush Kadam, and Vivek Kumar Singh
6. Real-Time Virtual Fitness Tracker and Exercise Posture Correction ........................91
Tejas Kachare, Manisha Sinha, Siddharth Kakade, Aryan Kakade, and Siddharth Nigade
7. Role of Data Science in Revolutionizing Healthcare.................................................. 103
Yashsingh Manral, Siddhesh Unhavane, and Jyoti Kundale
8. Application of Artificial Intelligence Techniques in the Early-Stage
Detection of Chronic Kidney Disease............................................................................ 119
Anindita A. Khade and Amarsinh V. Vidhate
9. Multi-Optimal Deep Learning Technique for Detection and Classification of
Breast Cancer....................................................................................................................... 131
Pravin Malve and Vijay S. Gulhane
10. Realizing Mother’s Features Influential on Childbirth Experience, towards
Creation of a Dataset.......................................................................................................... 143
Himani Deshpande and Leena Ragha
11. BERT- and FastText-Based Research Paper Recommender System.......................... 169
Nemil Shah, Yash Goda, Naitik Rathod, Vatsal Khandor, Pankaj Kulkarni, and
Ramchandra Mangrulkar
12. Analysis and Prediction of Crime Rate against Women Using Classification
and Regression Trees..........................................................................................................185
P. Tamilarasi and R. Uma Rani
13. Data Analysis for Technical Business Incubation Performance Improvement...... 195
Swati Shekapure and Nitin Shekapure
14. Satellite Imagery-Based Wildfire Detection Using Deep Learning......................... 213
Anant Kaulage, Sagar Rane, and Sunil Dhore
15. Low-Resource Language Document Summarization: A Challenge.........................221
Pranjali Deshpande and Sunita Jahirabadkar
16. Eclectic Analysis of Classifiers for Fake News Detection...........................................227
Vatsal Khandor, Yashvi Shah, Krutik Shah, Nitin Goje, and Ramchandra Mangrulkar
17. Data Science and Machine Learning Applications for Mental Health....................249
Dhruvi Khankhoje, Pruthav Jhaveri, and Narendra M. Shekokar
18. Analysis of Ancient and Modern Meditation Techniques on Human Mind
and Body and Their Effectiveness in COVID-19 Pandemic....................................... 267
Abhijeet Kushwah, Tirth Pandya, Prarthana Dhok, Prateek Koul, Pallavi Vijay Chavan,
Shreeraj Vijayan, and Vishal Shrivastava
Index..............................................................................................................................................299
Preface
We are in the data-driven age, and the data science field has gained the attention of people not only from the technical field but also from all areas of society. There has been tremendous growth in data every day, and we are equipping ourselves with the best set of tools and technologies to handle those data. Data science has become a core capability everywhere, more specifically in the industrial sector. Many e-commerce businesses are data driven. Every day, these e-commerce businesses collect a huge amount of data, analyze the data and decide on some policy to improve the business. The fourth industrial revolution we are observing all over the globe is based on data science and its allied fields. Machine learning, deep learning and artificial intelligence are the key elements of data science. Data science, with its scientific methods, gives conclusions and analysis, while machine learning and deep learning algorithms help build predictive models. Machine learning offers several categories of algorithms, such as supervised learning, reinforcement learning and unsupervised learning; classification is the most popular category of machine learning algorithms. Researchers and practitioners use machine learning and deep learning algorithms as per their requirements and the suitability of the algorithms to their problem statement. Deep learning techniques also play a crucial role in the design of intelligent applications. Deep learning algorithms are a family of artificial neural networks; these networks learn by themselves and get trained on a given set of data. Validation testing in deep learning exhibits the accuracy of learning. Artificial intelligence also plays a significant role in the design of intelligent applications with data science. Artificial intelligence helps in designing rule-based systems; such rule-based systems include facts and rules of inference. On the other side, there are technologies such as soft computing, fuzzy logic and genetic algorithms. These technologies make a significant contribution to the development of intelligent applications. Today, intelligent applications are used in almost all sectors, such as healthcare, education, agriculture, finance, games and sports, and entertainment. Intelligent applications in healthcare include diabetic retinopathy and diabetes detection, expert systems for medical diagnosis and many more. In the agriculture field, the popular intelligent applications include the prediction of rain, weather forecasting, crop prediction, and soil quality determination and prediction. In finance, applications like stock price prediction and gold rate prediction are most popular. This book aims to provide a platform for researchers to promote intelligent applications and their design for the betterment of technology and, in turn, society.
This volume comprises 18 chapters. The technical contents begin with data science fundamentals; the book further includes chapters on the tools, methods and techniques used in data science. The chapters focus on the fundamentals of data science and on applications from domains such as agriculture, finance, healthcare, real estate and crime. Chapter 1 states the basic building blocks of data science with different tools and techniques used in data analysis. Chapter 2 discusses the role of statistical methods in data science; these methods are important in analytical processing. Chapter 3 describes broad application areas of data science in the real world, among them agriculture, healthcare, education and entertainment. Chapter 4 discusses a smart Handwritten Devanagari Word Recognition system using a deep ResNet based on the scan profile method. This chapter exhibits the analysis of different classifiers for speech-based emotion recognition systems using linear and nonlinear features. Chapter 5 presents an intelligent application of data science: a safe social distance monitoring and face mask detection mechanism to control the COVID-19 spread. Chapter 6 presents another intelligent application of data science that describes a real-time virtual fitness tracker and exercise posture correction; this mechanism implements the concept of machine learning. Chapter 7 discusses the role of data science in revolutionizing the healthcare sector, yet another intelligent application of data science. Chapter 8 presents another intelligent application of data science: the early-stage detection of chronic kidney disease, for which artificial intelligence techniques are experimented with. A multi-optimal deep learning technique for detection and classification of breast cancer from histopathological images is presented in Chapter 9. Chapter 10 talks about the influence of lifestyle on pregnancy complications; this study is conducted in a real-world environment, and the results are presented. Chapter 11 proposes a research paper recommender system. Chapter 12 shows an analysis and prediction of the crime rate against women using classification and regression. Chapter 13 talks about data analysis for technical business incubation performance improvement. Chapter 14 shows satellite imagery-based wildfire detection; this chapter demonstrates the usage of deep learning techniques with image data for detection. Chapter 15 demonstrates low-resource language document summarization. Chapter 16 describes an eclectic analysis of classifiers for fake news detection. Chapter 17 talks about data science and machine learning applications for mental health. Finally, Chapter 18 presents the authors' contribution in artificial intelligence: an analysis of ancient and modern meditation techniques on the human mind and body and their effectiveness in the COVID-19 pandemic.
We would like to express our sincere thanks and gratitude to the authors for their outstanding contributions. Without their contributions, it would have been very difficult to bring this volume together. As the editors, we hope that this book will help and stimulate researchers to pursue research in this domain. We also expect that researchers will benefit from the large set of applications presented in this book. Special thanks go to our publisher, CRC Press/Taylor and Francis Group.
We hope that this book will present promising ideas and outstanding research contributions and support the further development of data science and intelligent applications.
Editors
Dr. Pallavi Vijay Chavan 
is Associate Professor at Ramrao
Adik Institute of Technology, D Y Patil Deemed to be
University, Navi Mumbai, Maharashtra, India. She has been
in academia for 16 years, working in the area of data science
and network security. In her academic journey, she published
research work in the data science and security domain with
reputable publishers including Springer, Elsevier, CRC Press
and Inderscience. She has published 1 book, 7+ book chap-
ters, 10+ international journal papers and 30+ international
conference papers. Presently she is guiding five Ph.D.
research scholars working in a similar domain. She com-
pleted her Ph.D. from Rashtrasant Tukadoji Maharaj Nagpur University, Nagpur,
Maharashtra, India, in 2017. She secured the first merit position at Nagpur University for
the degree of B.E. in Computer Engineering in 2003. She is the recipient of research grants
from UGC, CSIR and the University of Mumbai. She is a reviewer for Elsevier and
Inderscience journals. Her firm belief is “Teaching is a mission”.
Dr. Parikshit N Mahalle 
is a senior member of IEEE and
a Professor and Head of the Department of Artificial
Intelligence and Data Science at Vishwakarma Institute of
Information Technology, Pune, India. He completed his
Ph.D. from Aalborg University, Denmark, and continued as
a postdoctoral researcher. He has 21+ years of teaching and
research experience. He is a member of Board of Studies in
Computer Engineering, Savitri Bai Phule University and var-
ious universities. He has 9 patents and 200+ research publications (citations: 1830, H-index: 18) and has authored/edited 30+ books with Springer, CRC Press, Cambridge University Press, etc. He is editor-in-chief of the IGI Global International Journal of Rough Sets and Data Analysis, Associate Editor of the IGI Global International Journal of Synthetic Emotions and the Inderscience International Journal of Grid and Utility Computing, and a member of the Editorial Review Board of the IGI Global International Journal of Ambient Computing and Intelligence. His research
interests are Algorithms, Internet of Things, Identity Management and Security. He has
delivered more than 200 lectures at national and international levels.
Dr. Ramchandra Mangrulkar, a post-graduate of the National Institute of Technology, Rourkela, received his Ph.D. in computer science and engineering from SGB Amravati University, Amravati, in 2016. At present, he is working as an Associate Professor in the Department of Computer Engineering at SVKM’s Dwarkadas J. Sanghvi College of Engineering, Mumbai (an autonomous college affiliated to the University of Mumbai), Maharashtra, India. Dr.
Ramchandra Mangrulkar has published 50 papers and 24
book chapters with Taylor and Francis, Springer and IGI
Global in the field of interest. He has presented 54 papers in
national and international conferences. He has also edited
five books with CRC Press, Taylor and Francis Series. He has
also worked as a reviewer for many international journals and has reviewed many book proposals submitted to publishers of international reputation. He has chaired many sessions at national and international conferences. He has shown keen interest in conducting and organizing workshops on Artificial Intelligence Bots in Education, Network Simulator 2, innovative tools for research, and LaTeX and Overleaf. He has received a certificate of appreciation from the DIG Special Crime Branch, Pune, and the Superintendent of Police, and broadcast media gave wide publicity to the project work he guided on the topic “Face Recognition System”. He has also received a grant-in-aid of rupees three lacs and fifty thousand under the Research Promotion Scheme of AICTE, New Delhi, for the project “Secured Energy Efficient Routing Protocol for Delay Tolerant
Hybrid Network”. He is also working as an internal thesis advisor at NMIMS’s MPSTE
Mumbai and DY Patil’s RAIT, Navi Mumbai. He also worked as an external referee for
Ph.D. thesis evaluation at SGB Amravati University and RTM Nagpur University. He is an
active member of the Board of Studies in various universities and autonomous institutes
in India.
Dr. Idongesit Williams 
is an Assistant Professor at Aalborg
University Copenhagen. He holds a bachelor’s degree in
Physics, a master’s degree in Information and Communications Technologies, and a Ph.D. His research areas are socio-economic and socio-technical aspects of Information and Communications Technologies. They include the following: the facilitation of telecom and ICT infrastructure using public–private partnerships; the development and sustenance of community-based networks; e-government implementation;
science and technology studies; gender adoption of ICTs; organizational adoption of ICTs;
and user experience with ICTs and organizational learning. He has authored more than 60 research publications, including journal papers, books, book chapters, conference papers and magazine articles. He is the co-editor of the book The African Mobile Story. He has delivered presentations at conferences and has also helped organize conferences such as the CMI annual conference and the CMI/GTUC conferences.
Contributors
Pallavi Vijay Chavan
Department of Information Technology
Ramrao Adik Institute of Technology
D Y Patil Deemed to be University
Navi Mumbai, India
Shalaka Prasad Deore
Department of Computer Engineering
M.E.S. College of Engineering, Pune, S.P.
Pune University
Pune, India
Himani Deshpande
Department of Computer Engineering
Ramrao Adik Institute of Technology
D Y Patil Deemed to be University
Navi Mumbai, India
Pranjali Deshpande
Department of Computer Engineering
MKSSS’s Cummins College of Engineering
for Women
Pune, India
Prarthana Dhok
Department of Information Technology
Ramrao Adik Institute of Technology
Navi Mumbai, India
Sunil Dhore
Department of Computer Engineering
Army Institute of Technology
Pune, India
Yash Goda
Computer Engineering Department
Dwarkadas J. Sanghvi College of
Engineering
Mumbai, India
Nitin Goje
Webster University
Tashkent, Uzbekistan
Vijay S. Gulhane
Department of Information Technology
Sipna College of Engineering and Technology
Amravati, India
Baisa L. Gunjal
Department of Information Technology
Amrutvahini College of Engineering
Sangamner, India
Nikhil Ingale
Department of Information Technology
Ramrao Adik Institute of Technology
D Y Patil Deemed to be University
Navi Mumbai, India
Sunita Jahirabadkar
Department of Computer Engineering
MKSSS’s Cummins College of Engineering
for Women
Pune, India
Pruthav Jhaveri
Computer Engineering Department
Dwarkadas J. Sanghvi College of
Engineering
Mumbai, India
Tejas Kachare
Department of Information Technology
Vishwakarma Institute of Technology
Pune, India
Piyush Kadam
Department of Information Technology
Ramrao Adik Institute of Technology
D Y Patil Deemed to be University
Navi Mumbai, India
Aryan Kakade
Department of Computer Science
Sinhgad College of Engineering
Pune, India
Siddharth Kakade
Department of Computer Science
Vishwakarma Institute of Technology
Pune, India
Anant Kaulage
Army Institute of Technology
Pune, India
Anindita A. Khade
Department of Computer Engineering
Ramrao Adik Institute of Technology
D Y Patil Deemed to be University
Navi Mumbai, India
and
SIES Graduate School of Technology
Navi Mumbai, India
Vatsal Khandor
Computer Engineering Department
Dwarkadas J. Sanghvi College of
Engineering
Mumbai, India
Dhruvi Khankhoje
Computer Engineering Department
Dwarkadas J. Sanghvi College of
Engineering
Mumbai, India
Prateek Koul
Department of Information Technology
Ramrao Adik Institute of Technology
Navi Mumbai, India
Pankaj Kulkarni
Application Architect
Deloitte Service LLP
Franklin, Tennessee
Pournima Kulkarni
Lead Quality Analyst
Healthstream, Inc.
Nashville, Tennessee
Jyoti Kundale
Department of Information Technology
Ramrao Adik Institute of Technology
D Y Patil Deemed to be University
Navi Mumbai, India
Abhijeet Kushwah
Department of Information Technology
Ramrao Adik Institute of Technology
Navi Mumbai, India
Pravin Malve
Department of Computer Engineering
Government Polytechnic
Wardha, India
Deepa Mane
IT Department
Smt. Kashibai Navale College of Engineering
Pune, India
Ramchandra Mangrulkar
Computer Engineering Department
Dwarkadas J. Sanghvi College of
Engineering
Mumbai, India
Yashsingh Manral
Department of Information Technology
Ramrao Adik Institute of Technology
Navi Mumbai, India
Siddharth Nigade
Department of Computer Science
Vishwakarma Institute of Technology
Pune, India
Tirth Pandya
Department of Information Technology
Ramrao Adik Institute of Technology
Navi Mumbai, India
Leena Ragha
Department of Computer Engineering
Ramrao Adik Institute of Technology
D Y Patil Deemed to be University
Navi Mumbai, India
Onkar Rane
Department of Information Technology
Ramrao Adik Institute of Technology
D Y Patil Deemed to be University
Navi Mumbai, India
Sagar Rane
Army Institute of Technology
Pune, India
Naitik Rathod
Computer Engineering Department
Dwarkadas J. Sanghvi College of
Engineering
Mumbai, India
Priyali Sakhare
Department of Information Technology
Ramrao Adik Institute of Technology
D Y Patil Deemed to be University
Navi Mumbai, India
Ashwini Sarode
Department of Applied Computer Science
SRH Heidelberg University
Heidelberg, Germany
Krutik Shah
Computer Engineering Department
Dwarkadas J. Sanghvi College of
Engineering
Mumbai, India
Nemil Shah
Computer Engineering Department
Dwarkadas J. Sanghvi College of
Engineering
Mumbai, India
Yashvi Shah
Computer Engineering Department
Dwarkadas J. Sanghvi College of
Engineering
Mumbai, India
Nitin Shekapure
Production Engineering Department
All India Shri Shivaji Memorial Society
College of Engineering
Pune, India
Swati Shekapure
Computer Engineering Department
Marathwada Mitra Mandal’s College of
Engineering
Pune, India
Narendra M. Shekokar
Computer Engineering Department
Dwarkadas J. Sanghvi College of
Engineering
Mumbai, India
Sachin Shelke
IT Department
Pune Institute of Computer Technology
Pune, India
Vishal Shrivastava
CRG Solutions Singapore Pte Ltd.
Bangalore, India
Vivek Kumar Singh
Department of CSE
Bharat Institute of Engineering and
Technology
Hyderabad, India
Manisha Sinha
Department of Electronics and
Communication Engineering
University of Engineering and
Management Jaipur
Jaipur, India
P. Tamilarasi
Department of Computer Science
Sri Sarada College for Women
(Autonomous)
Salem, India
R. Uma Rani
Sri Sarada College for Women
(Autonomous)
Salem, India
Siddhesh Unhavane
University of Illinois at
Urbana-Champaign
School of Information Sciences
Champaign, Illinois
and
Department of Information Technology
Ramrao Adik Institute of Technology
Navi Mumbai, India
Amarsinh V. Vidhate
Department of Computer Engineering
Ramrao Adik Institute of Technology
D Y Patil Deemed to be University
Navi Mumbai, India
Shreeraj Vijayan
PublicisSapient
Toronto, Canada
DOI: 10.1201/9781003283249-1
1
Instigation and Development of Data Science
Priyali Sakhare and Pallavi Vijay Chavan
Ramrao Adik Institute of Technology, D.Y. Patil Deemed to be University Nerul, Navi Mumbai
Pournima Kulkarni
Healthstream, Inc.
Ashwini Sarode
SRH Heidelberg University
1.1 Data Science
Data is the basis or groundwork of data science; it is the material on which all analysis and processing are based. In the context of data science, data can be bifurcated into two types: traditional data and big data.
CONTENTS
1.1 Data Science.............................................................................................................................1
1.1.1 Existence of Data Science...........................................................................................3
1.1.2 Data Science Process...................................................................................................5
1.1.2.1 Setting the Research Goal...........................................................................6
1.1.2.2 Retrieving Data............................................................................................7
1.1.2.3 Data Preparation..........................................................................................8
1.1.2.4 Data Exploration......................................................................................... 10
1.1.2.5 Data Modeling............................................................................................ 11
1.1.2.6 Presentation and Automation..................................................................12
1.1.3 Life Cycle – Data Science.........................................................................................12
1.2 Relation between Data Science and Machine Learning..................................................13
1.2.1 Where Do We See Machine Learning in Data Science?......................................13
1.2.2 Which Machine Learning Algorithms are Used in Data Science?...................................... 14
1.2.2.1 Linear Regression Algorithm................................................................... 14
1.2.2.2 Decision Tree..............................................................................................15
1.2.2.3 K-Means Clustering...................................................................................15
1.2.3 Application of Machine Learning in Data Science..............................................15
1.3 Tools for Data Science...........................................................................................17
1.3.1 R Programming........................................................................................................17
1.3.2 Python........................................................................................................................17
1.4 Benefits and Applications....................................................................................................18
1.5 Conclusion.............................................................................................................................18
References........................................................................................................................................18
Traditional Data: In the context of data science, traditional data is in a structured format and can be stored in databases. It is maintained by all types of businesses, from very small firms to big organizations. Traditional data is stored in relational database management systems (Figure 1.1).
For traditional data to be processed, all data goes through a preprocessing stage. This operation is necessary to convert the raw data into a format that is more understandable. A few processes are undertaken for traditional data. They are as follows:
i. Collection of raw data and storing it on a server
ii. Class labeling
iii. Data cleansing
iv. Data balancing
v. Data shuffling
Big Data: In the context of data science, big data is larger than traditional data and deals with complex datasets that are difficult to manage with traditional approaches. Big data is usually distributed across a large network and varies by variety (number, text, audio, video), velocity (how fast it is retrieved and computed), and volume (tera-, peta-, exabytes) (Figure 1.2).
In big data, as the complexity of the data is very high, the preprocessing stage becomes crucial. Some of the stages for big data are quite similar to those for traditional data [1,2]. The process for big data is as follows:
i. Collection of the data
ii. Class labeling
iii. Data cleansing
iv. Data masking
FIGURE 1.1
Traditional data processing.
In data science, the term “science” refers to knowledge gained through systematic study. Basically, it is a systematic approach that builds and organizes knowledge in a form that is easily testable through explanations and predictions [3].
1.1.1 
Existence of Data Science
Data science is a combination of two or more fields that uses different kinds of math and statistics, scientific methods, specialized programming, artificial intelligence, data analysis, algorithms, and systems for the extraction of knowledge from data. The data can be structured or unstructured. The data science field prepares data for analysis, frames data science problems, builds data-driven solutions, analyzes data, and supports high-level decisions in a broad range of application domains. Data science is also related to big data, machine learning, data mining, etc. It is the study of massive amounts of data and includes the extraction of meaningful insight from structured and unstructured data, accomplished using different algorithms and scientific methods [4].
Figure 1.3 shows the diagram of data science. Data science is the discovery of knowledge through the analysis of data; it is an extension of statistics that is capable of dealing with huge amounts of data. In data science, past data is analyzed to make predictions about the future. Data science usually works with dynamic, unstructured data. The skills required for data science are statistics, visualization, and machine learning. Data science has several current viewpoints. They are as follows:
i. Data science is about studying scientific and business data.
ii. Data science is an integration of computing technology, statistics, and artificial
intelligence.
iii. The purpose of data science is to solve scientific as well as business problems by
the extraction of knowledge from data [5,6].
FIGURE 1.2
Big data processing.
Data science uses powerful hardware, programming systems, and suitable algorithms to solve the issues that arise in data. Data can be analyzed in a few simple steps; refer to Figure 1.4.
In short, data science is all about:
i. Analyzing and questioning the raw data.
ii. Modeling the data using the desired algorithm.
iii. Visualizing the data to achieve correct outcomes.
iv. Decision-making and finding the final results [7].
FIGURE 1.3
Existence of data science.
FIGURE 1.4
Simple steps of data science.
1.1.2 
Data Science Process
Data science is a multidisciplinary field, and it allows us to extract information from both structured and unstructured data. Using a standard approach helps us raise the chances of success in data science projects at the lowest cost. It also makes it possible to run a project with teams, with each team member focusing on a particular area of interest [8].
The data science process passes through various phases and consists of six important steps (Figure 1.5).
The following list is a short introduction to the process; each step will be discussed in depth further on:
i. Setting the Research Goal: The first phase of the process is setting a research
goal. The main aim of this is to make sure that all the people of the company
understand the purpose of the project.
ii. Retrieving Data: The second phase of data science is used for retrieving data
where data is available for analysis purposes, and it also includes searching for
suitable data and getting access to data from the owner of the data. The data will
be in the raw form which needs to be polished and transformed before it becomes
usable.
iii. Data Preparation: After processing the data in the second phase, i.e., retrieving
data, we get raw data, so it’s time to prepare it. In the third phase of data science,
data preparation is undertaken for processing this raw data. This includes the
transformation of raw data into data that would be directly usable. For achieving
the usable form of data, we have to detect and correct various errors of data and
combine data from different data sources. Then it transforms those data. Once the
data preparation is successfully completed, one can progress to data visualization
and modeling.
iv. Data Exploration: The fourth phase of the data science process is data exploration
where data is explored in a deep understandable form. This phase will enable us
to start the modeling phase.
v. Data Modeling: The fifth phase of the data science process is data modeling which
is also known as model building. Here, we have to select a technique from differ-
ent fields like machine learning, statistics, and operation search. Data modeling is
the iterative process which involves a selection of different variables for a model,
execution of the model, and diagnosis of the model.
FIGURE 1.5
Process of data science.
vi. Presentation and Automation: The last phase of the data science process is presentation and automation. This phase is carried out only if needed. The importance of this step is more clearly visible in projects on a tactical and strategic scale. Some projects need to be performed again and again, so automating the project will save time [9].
Following these six steps pays off with a higher project success rate and better research results. This process makes sure that we have a well-designed research plan, clear deliverables, and a good understanding of the business question before we even start looking at data. Now, we will look into the process in detail.
1.1.2.1 
Setting the Research Goal
Every project starts by understanding why, what, and how the project is established. Refer to Figure 1.6 for an overview of setting the research goal.
To define the research goals and create a project charter, we need to understand: What does the company expect from us? Why does management place such a value on the research? Is it part of a bigger picture, a project originating from an opportunity someone detected? Answering these what, why, and how questions is the goal of the first phase of the process.
This phase aims to produce a clear research goal, a good understanding of the context, and proper planning of action with a timetable. These help in creating the project charter. A project charter is a short formal document with a clear objective statement for the project.
1.1.2.1.1 
Spend Time Understanding the Goals of Research
Understanding the research goals is essential: a clear goal statement expresses the purpose of the project in a focused manner and is critical to the success of the project. Keep asking questions until the doubts are resolved and the expectations of the project are clear; resolving research problems can take months.
1.1.2.1.2 
Creation of Project Charter
In the project charter, clients (users) want to know what they are paying for, so we need to understand the problem efficiently and try to get an agreement on the deliverables. The project charter is mandatory for any project.
FIGURE 1.6
Setting the research goal.
A project charter needs teamwork and the inputs should consist of the following points:
i. Research goal should be clear.
ii. Project mission should be clear.
iii. What resources are going to be used?
iv. How is the project going to be analyzed?
v. Project should be deliverable and successful.
vi. Project should be completed in the timeline.
Clients can use all of this information to estimate the cost of the project and the specialized people required to make it successful.
1.1.2.2 Retrieving Data
After setting the research goals, we need to collect the data. In this phase, we have to ensure that the data is of sufficient quality and that there is proper access to the data for further use. The data can be in various forms, from an Excel spreadsheet to many different types of databases.
Figure 1.7 shows the retrieval of data.
In this phase, the required data is retrieved for processing. Occasionally, we need to design the collection of data ourselves, but mostly we won’t get involved in this phase. Many companies already have collections of data stored for us, and in rare cases, if data is not available, it can be bought from third parties.
FIGURE 1.7
Retrieving data.
1.1.2.2.1 
Data Stored within the Company
First, we need to check whether we have access to the data within the company. Then, we have to check the quality of the data that is available in the company. Many companies take care of their key data, so the cleaning of the data may already be done. The data can be stored mainly in data warehouses, data marts, databases, etc. A data warehouse is a system that combines data from different sources into a central repository to store the data and support data mining, machine learning, and business intelligence. A data mart is a subset of a data warehouse that focuses on a specific area and allows authorized users to quickly access critical data without wasting time searching through an entire data warehouse. A database is used to store the data.
Sometimes finding the data within the company becomes more challenging. As the company grows, the data gets scattered across many different places. People change positions or leave the company, so knowledge about the data may disappear, and we need to develop some skills to recover the lost data. Another difficult task is getting proper access to the data: this is time-consuming, and company politics may also be involved.
1.1.2.2.2 
Don’t Be Afraid of Buying Outside Data
If the data is not available in the company, we can look outside the organization to buy it. The quality of outside data varies and depends on the organization that creates and manages it. There are many open-data providers, as shown in Table 1.1.
1.1.2.2.3 
Check the Quality of Data to Avoid Problems
The retrieval of data is the first time the data is seen in the data science process. Most errors are encountered while gathering information and are easily spotted, but it may take many hours to solve them. So, checking the quality of the data is very important for avoiding problems later.
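A minimal sketch of such a quality check, assuming the retrieved data has already been loaded into a pandas DataFrame (the file name and column inspection steps are illustrative, not prescribed by the chapter):

import pandas as pd

# Load the retrieved data (file name is illustrative).
df = pd.read_csv("retrieved_data.csv")

# Quick quality checks before moving on to data preparation.
print(df.shape)               # number of observations and variables
print(df.dtypes)              # data types of each column
print(df.isnull().sum())      # missing values per column
print(df.describe())          # ranges and summary statistics to spot impossible values
print(df.duplicated().sum())  # duplicated observations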
1.1.2.3 Data Preparation
In the data preparation phase, we look at how the data quality is enhanced for further processing.
The data preparation phase is divided into three sub-phases: data cleansing, data integration, and data transformation (Figure 1.8).
Similar actions are taken during the data cleansing, transformation, and integration sub-phases (Figure 1.9). This may look a bit abstract, but it becomes easier once we see each sub-phase in detail. We will now learn about these sub-phases in detail.
TABLE 1.1
Open-Data Providers
Open Data Site          Information
Freebase.org            An open database that retrieves its information from sites like Wikipedia and MusicBrainz.
Aiddata.org             Open data for international development.
open-data.europa.eu     The home of the European Commission’s open data.
1.1.2.3.1 Data Cleansing
Data cleansing is a sub-process of the data preparation phase in the data science process which focuses on removing errors so that the data becomes a consistent and true representation of the process from which it originated. Errors come in two types: interpretation errors and inconsistency errors. An interpretation error occurs when we take the value of the data for granted, whereas an inconsistency error occurs when there are inconsistencies between data sources or against the company’s standard values.
Different types of errors can be detected by easy checks. Table 1.2 presents an overview of commonly detected errors. We need to solve a problem as early as possible in the acquisition chain, or else we need to fix it in the program; otherwise, it will cause issues during further execution.
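As a hedged illustration of the error types in Table 1.2, the following pandas sketch strips redundant white space, overrules an impossible value, and removes missing values; the column names, the file name, and the chosen strategies are assumptions made for this example only:

import pandas as pd
import numpy as np

df = pd.read_csv("raw_data.csv")  # illustrative file name

# Redundant white space: use string functions to strip it.
df["city"] = df["city"].str.strip()

# Impossible values: manually overrule them, e.g. a negative age cannot occur.
df.loc[df["age"] < 0, "age"] = np.nan

# Missing values: remove the affected observations (one possible strategy).
df = df.dropna(subset=["age", "city"])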
FIGURE 1.8
Data preparation.
FIGURE 1.9
Common actions taken during data cleansing, data transformation, and data integration.
1.1.2.3.2 Data Integration
Data integration is the process of merging data from different sources into a single source for efficient data management. The data comes from several places and is integrated into one source. The data may vary in structure, size, and type, ranging from various Excel files and databases to text documents. We can create views and can also set operators on them.
There are different ways of combining data. The two most commonly used combine operations are joining and appending (stacking). The first, joining, allows us to combine the information from one table with the information that we find in another table. The second, appending, effectively adds the observations of one table to another table.
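A minimal pandas sketch of both combine operations, joining and appending, using two small made-up tables (the table contents are purely illustrative):

import pandas as pd

customers = pd.DataFrame({"id": [1, 2], "name": ["Asha", "Ravi"]})
orders    = pd.DataFrame({"id": [1, 2], "amount": [250, 400]})
new_rows  = pd.DataFrame({"id": [3], "name": ["Meera"]})

# Joining: combine the information from one table with another on a key.
joined = customers.merge(orders, on="id", how="inner")

# Appending: add the observations of one table to another.
appended = pd.concat([customers, new_rows], ignore_index=True)

print(joined)
print(appended)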
1.1.2.3.3 Data Transformation
Some models require the data to be in a different shape. Now that we have well-cleansed and integrated data, this is the next task to perform. Data transformation helps us transform the data so that it takes a form suitable for modeling.
To transform the data, we may need to reduce the number of variables, and we can also turn variables into dummy variables. Many times we have variables that need to be removed because they don’t add any new information. If there are too many variables in the model, it becomes difficult to handle, and some techniques don’t perform well when they are overloaded with many input variables, so it is very important to reduce the number of variables. Meanwhile, we can also turn variables into dummy variables, assigning dummy values of 1 or 0, i.e., true = 1 and false = 0. Turning variables into dummies splits a variable that has multiple classes into multiple variables.
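A short sketch of turning a variable with multiple classes into dummy variables using pandas; the column name and its categories are illustrative assumptions:

import pandas as pd

df = pd.DataFrame({"weekday": ["Mon", "Tue", "Mon", "Sat"]})

# Turning the variable into dummies: one 0/1 column per class.
dummies = pd.get_dummies(df["weekday"], prefix="weekday")
df = pd.concat([df, dummies], axis=1)
print(df)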
1.1.2.4 Data Exploration
Data exploration is the fourth phase of the data science process. It is all about building a deep understanding of the data. It helps us understand how the variables interact with each other. To achieve this, we need to use visual techniques and statistics. Exploratory data analysis is another name for data exploration.
During data exploration, we take a deep look into the data (Figure 1.10).
We usually understand information best when it is in picture or image form. Therefore, we use graphical techniques for understanding the data and the interactions between variables. The visualization techniques we can use are simple graphs or combined graphs represented by connections between them.
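A minimal exploration sketch using pandas and Matplotlib; the file name and column names are illustrative assumptions:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("prepared_data.csv")  # illustrative file name

# Simple graphical techniques for understanding the data.
df["age"].hist()                        # distribution of a single variable
df.plot.scatter(x="age", y="income")    # interaction between two variables
df.boxplot(column="income", by="city")  # comparison across groups
plt.show()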
TABLE 1.2
Overview of Common Errors
Error Description                                Possible Solution
Errors Pointing to False Values within a Single Data Set
Impossible values                                Manual overrules
Missing values                                   Remove values
Redundant white space                            Usage of string functions
Errors Pointing to Inconsistencies between Data Sets
Different units of measurement                   Recalculation
Different levels of aggregation                  The same level of measurement by aggregation or extrapolation
1.1.2.5 Data Modeling
Data modeling is the fifth phase in the data science process; it uses domain knowledge about the data gained while performing the previous phases. In this phase, we have to choose a technique from the fields of operations research, machine learning, and statistics. Building a model is an iterative process which includes selecting the variables for the model, executing the model, and finally diagnosing the model.
As per Figure 1.11, data modeling is further divided into the following steps:
i. Model and variable selection
ii. Model execution
iii. Model diagnostic and model comparison
1.1.2.5.1 
Model and Variable Selection
Here, we need to choose the variables that we want to add to the model. There are many modeling techniques available, so choosing the right model is up to us. We have to consider the model performance and whether the project meets all the requirements for the model.
1.1.2.5.2 Model Execution
Once we have selected the model and suitable variables for it, we have to execute it. Model execution means implementing the model in code. Different software platforms can be used for execution of the code; we can use Python, R, and various other suitable programming languages for the implementation of models.
FIGURE 1.10
Data exploration.
FIGURE 1.11
Data modeling.
1.1.2.5.3 
Model Diagnostic and Model Comparison
In this step, we build multiple models from which we can then select the best one based on multiple criteria. Many models make strong assumptions, such as independence of the inputs, and we have to verify that these assumptions are actually met. This is known as model diagnostics.
1.1.2.6 
Presentation and Automation
After successfully analyzing the data and building a well-performing model, the project is ready to be presented to external people.
Presentation and automation are the last stages of the data science process, where soft skills are most useful and are extremely important for the presentation of the project (Figure 1.12). Automating data analysis is used for projects that need to be executed again and again [9].
1.1.3 Life Cycle – Data Science
The life cycle of data science, also known as the data science pipeline, consists of five important stages.
Figure 1.13 shows the different stages of the life cycle of data science, as follows:
i. Capture: The raw structured and unstructured data are gathered from all the relevant sources, capturing data from devices in real time.
ii. Prepare and Maintain: This stage puts the raw data into a format that is consistent for machine learning and analytics.
iii. Process: The data scientist examines the patterns and ranges of values within the data to determine whether the data is suitable for use with machine learning or predictive analytics.
iv. Analyze: Discovery is done in this stage, where the data scientist actually runs machine learning, regression, and deep learning algorithms on the well-prepared data.
FIGURE 1.12
Presentation and automation.
v. Communicate: Finally, the projects are presented as charts, reports, and other data visualizations, which are easy for decision-makers to understand [10].
1.2 
Relation between Data Science and Machine Learning
Machine learning gives computers the ability to learn without being explicitly programmed. It also refers to a process in which a computer can work accurately as it collects and learns from the data it is given.
Data science and machine learning are related to each other but have different goals and functionalities. Machine learning is a critical part of data science, where it effectively uses different statistical algorithms for analyzing data from multiple sources [9,11].
1.2.1 
Where Do We See Machine Learning in Data Science?
Although machine learning is mainly linked to the data-modeling step of the data science process, it can also be used in most of the other steps of the process (Figure 1.14).
The data-modeling phase cannot start until we understand the qualitative raw data, but even before that, the data preparation phase can benefit significantly from machine learning. For example, when cleansing text strings, machine learning can help group similar strings so that it becomes easy to correct spelling errors.
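A hedged sketch of that idea using simple string similarity from Python’s standard library (difflib) rather than a full machine learning model; the misspelled city names and the canonical list are made up for illustration:

from difflib import get_close_matches

observed = ["Mumbai", "Mumbay", "Navi Mumbai", "Pune", "Poona"]
canonical = ["Mumbai", "Navi Mumbai", "Pune"]

# Group each observed string with its closest canonical spelling, if any.
for value in observed:
    match = get_close_matches(value, canonical, n=1, cutoff=0.6)
    print(value, "->", match[0] if match else "no close match")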
FIGURE 1.13
Life cycle of data science.
Machine learning is also helpful in exploring the data. Algorithms can recognize the underlying patterns in the data, which are represented through various visualization tools, including pie charts, box plots, histograms, density plots, bar charts and line graphs [9].
1.2.2 
Which Machine Learning Algorithms are Used in Data Science?
Data scientists should be aware of machine learning and its algorithms, as many machine learning algorithms are broadly used in data science. The following machine learning algorithms are most commonly used in data science:
i. Linear regression algorithm
ii. Decision tree
iii. K-means clustering
1.2.2.1 
Linear Regression Algorithm
The linear regression algorithm is a popular machine learning algorithm based on supervised learning. It is mostly used for prediction and forecasting purposes. The linear regression algorithm works on simple regression. Regression is a method of modeling a target value based on independent variables. It is represented by a linear equation that captures the connection between the input set and the predicted output. Since it determines a linear relationship between the input and output variables, it is known as linear regression.
As per Figure 1.15, the equation for linear regression is

Y = mx + c,
where Y is the dependent variable, x is the independent variable, m is the slope, and c is
the intercept.
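A minimal sketch of fitting this equation with scikit-learn (a library choice assumed here for illustration, not prescribed by the chapter); the data points are made up:

import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative input x and output Y following an approximately linear trend.
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
Y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])

model = LinearRegression().fit(x, Y)
print("slope m:", model.coef_[0])        # estimated m
print("intercept c:", model.intercept_)  # estimated c
print("prediction for x = 6:", model.predict([[6.0]])[0])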
FIGURE 1.14
Machine learning used in process of data science.
1.2.2.2 Decision Tree
The decision tree is based on supervised learning. This machine learning algorithm is well known because it can be used for both classification and regression problems.
In the decision tree algorithm, we resolve a problem by using a tree representation in which each internal node represents a feature, each branch shows a decision, and each leaf represents an outcome (Figure 1.16). In a decision tree, we start from the root node and compare its value with the record's attribute. We continue comparing down the nodes until we reach the outcome, that is, a leaf node.
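The following is a minimal, hedged sketch (not from the book) of a decision tree classifier using scikit-learn; the toy feature matrix and labels are invented purely for illustration.
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data: [age, income] -> whether a customer buys (1) or not (0)
X = [[25, 30], [35, 60], [45, 80], [20, 20], [50, 90], [30, 40]]
y = [0, 1, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["age", "income"]))  # text view of the splits
print(tree.predict([[40, 70]]))  # predicted class for a new record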
1.2.2.3 K-Means Clustering
K-means clustering is a machine learning algorithm that belongs to unsupervised learning. K-means clustering solves problems related to clustering. Clustering refers to breaking a large dataset into small groups of similar data points; these small groups are known as clusters. The main aim of the algorithm is to minimize the sum of distances between data points and their cluster centers [7,12]. Figure 1.17 shows k-means clustering.
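As a minimal sketch of this idea (an assumption-laden illustration, not the authors' code), scikit-learn's KMeans can be applied to a small two-dimensional dataset as follows.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D points forming two loose groups
points = np.array([[1, 2], [1, 4], [1, 0],
                   [10, 2], [10, 4], [10, 0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)           # cluster index assigned to each point
print(kmeans.cluster_centers_)  # coordinates of the two cluster centers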
1.2.3 Application of Machine Learning in Data Science
Machine learning algorithms have a wide range of applications in data science; some of the application domains are listed below [9]:
FIGURE 1.15
Linear regression algorithm.
i. Automation and control
ii. Education
iii. Finance industry
iv. Computer vision and Intelligence
v. Customer segmentation
FIGURE 1.16
Decision tree.
FIGURE 1.17
K-means clustering.
vi. Weather forecasting
vii. Disease prediction mechanism
viii. Price predictions in real estate
1.3 Tools for Data Science
Data scientists use various types of tools for performing data science. Data scientists build and run code to create models, and many open-source tools are available for these operations. Data science tools fall into several categories, namely data analysis tools, data visualization tools, data warehousing tools, and machine learning tools:
i. Data analysis tools – R, Python, MATLAB, Excel, Jupyter.
ii. Data visualization tools – Tableau, Cognos.
iii. Data warehousing tools – SQL, ETL, Hadoop, AWS Redshift.
iv. Machine learning tools – Spark, Mahout [7,11].
1.3.1 R Programming
R is a data analysis tool used for performing data science operations. R is an open-source programming language and offers a suitable environment for developing graphics and performing statistical computing. R is one of the most popular programming languages in the data scientist community. R provides a variety of tools and libraries for creating visualizations, cleansing data, and evaluating machine learning and deep learning algorithms. R also offers multiple features for the statistical analysis of large datasets, which makes it easy to perform tasks such as:
i. Linear and non-linear modeling.
ii. Time-series analysis.
iii. Clustering.
R has an integrated development environment available in RStudio. Packages available in the R language include caret, randomForest, e1071, rpart, and many more. Data science projects that use R programming include Twitter, T-Mobile, Google Analytics, etc.
1.3.2 Python
Python is an open-source, object-oriented, high-level, interpreted programming language that emphasizes code readability. It is a data analysis tool used in data science and helps in developing strong data applications. The main reason Python is widely used in the scientific and research communities is that it is easy to use, with a simple syntax that is easy to adopt even for people who do not come from an engineering background.
A large number of Python libraries support data science tasks, including Pandas for data manipulation, NumPy for handling large multidimensional arrays, Matplotlib for building data visualizations, and SciPy for mathematical and scientific computing [13].
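As a small hedged illustration of how these libraries work together (a sketch with made-up values, not an example from the chapter), the snippet below builds a Pandas DataFrame over NumPy arrays, summarizes it, and plots it with Matplotlib.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical monthly sales figures
months = np.arange(1, 13)
sales = np.array([12, 15, 14, 18, 21, 25, 24, 27, 30, 29, 33, 35])

df = pd.DataFrame({"month": months, "sales": sales})
print(df.describe())            # quick descriptive statistics with Pandas

plt.plot(df["month"], df["sales"], marker="o")
plt.xlabel("Month")
plt.ylabel("Sales")
plt.title("Monthly sales (illustrative data)")
plt.show()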
1.4 Benefits and Applications
i. Data science helps drive innovation by enabling teams to share results, code, and reports, and it reduces redundancy.
ii. It helps us understand a project in an efficient and easy manner.
iii. It makes data scientists more productive by helping them accelerate and deliver models faster with fewer errors.
iv. It facilitates data scientists in working with large volumes and varieties of data [11,14,15].
The following are the trending applications:
i. Data science is used for image and speech recognition (e.g., Siri, Alexa, Cortana).
ii. Data science is used in healthcare for the detection of tumors, medical image analysis, and recognizing other diseases.
iii. Data science is also used for the detection of risk in the finance industry.
iv. Data science is used for designing self-driving cars, where automation is done within the car so that the number of accidents is reduced [7,16].
1.5 Conclusion
The aim of this chapter is to build a thorough understanding of data science. The authors have explained the process of data science with respect to setting goals, retrieving data, data preparation, data exploration, data modeling, and presentation and automation. Further, the data science life cycle, also known as the data science pipeline, is described; it has six stages: capture, prepare, maintain, process, analyze, and communicate. The authors have also introduced the relationship between data science and machine learning, which is used in the data science process to solve problems in different ways by using certain algorithms. The authors have further given detailed information about the tools used for performing data science operations. Finally, the chapter explains the main benefits and applications of data science.
References
1. https://www.kdnuggets.com/2018/06/what-where-how-data-science.html.
2. Haruki Nakamura. Big data science at AMED-BINDS. Japan Biophysical Review, 12(2): 221–224, 2020. doi: 10.1007/s12551-020-00628-1.
3. https://cacm.acm.org/magazines/2013/12/169933-data-science-and-prediction/fulltext.
4. Dataflair Team. (2019). What is data science? Complete data science tutorial for beginners. Retrieved 8.10.2019.
5. Alisa Bilal. Data Science: Fundamental Principles, Croatia, 2019.
6. Yangyong Zhu and Yun Xiong. Defining Data Science: Beyond the Study of the Rules of the Natural World as Reflected by Data. School of Computer Science and Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, China.
7. https://www.javatpoint.com/data-science. What is Data Science: Tutorial, Components, Life cycle.
8. Wikipedia. https://en.wikipedia.org/wiki/Data_science.
9. Davy Cielen, Arno D. B. Meysman, Mohamed Ali. Introducing Data Science: Big Data, Machine Learning, and More, Using Python Tools. Manning Publications.
10. https://www.ibm.com/cloud/learn/data-science-introduction. IBM Cloud Learn Hub.
11. https://www.oracle.com/in/data-science/what-is-data-science/. Oracle India.
12. Mine Çetinkaya-Rundel, Victoria Ellison. A Fresh Look at Introductory Data Science. Journal of Statistics and Data Science Education, 29(sup1), 2021.
13. Analysis of US Population using Data Analytics and Data Science Tools. Computer Science and Information Technology, 9(7), 2020.
14. Steven Perkins, Hazel Davis, Valerie du Preez. "Practical Data Science for Actuarial Task". A practical example of data science considerations by the Modelling, Analytics and Insights in Data working party – New approaches to current actuarial work. 2020.
15. Harshil Jhaveri, Himanshu Ashar and Ramchandra Mangrulkar. Leveraging Data Science in Cyber Physical Systems to Overcome Covid-19. Journal of University of Shanghai for Science and Technology, 22(10): 1993–2015, 2020.
16. Sanyukta Shrestha. A Deep Dissertation of Data Science: Related Issues and its Applications. Department of Information Technology, Amity University Uttar Pradesh, Noida, IEEE, 2019.
DOI: 10.1201/9781003283249-2
2
Role of Statistical Methods in Data Science
Deepa Mane and Sachin Shelke
SPPU
2.1 Introduction
Data science as a research-oriented discipline is majorly influenced by IR, computer science, mathematics, statistics, operations research, and the applied sciences. Though data science was derived by statisticians, the significance of computer science and its role in business applications is often emphasized much more, particularly in the era of big data. John Tukey had already shifted the focus of statistics away from pure mathematics and statistical testing toward deriving hypotheses from data, that is, trying to understand the data before forming hypotheses. Another related technique is Knowledge Discovery in Databases (KDD), with data mining as one of its steps; KDD brings
CONTENTS
2.1 Introduction...........................................................................................................................21
2.2 Data Science and Statistics Terminologies........................................................................22
2.3 Types of Statistics..................................................................................................................22
2.3.1 Descriptive.................................................................................................................23
2.3.2 Inferential...................................................................................................................23
2.4 How to Describe a Single Set of Data................................................................................23
2.5 Statistical Analysis................................................................................................................24
2.5.1 Quantitative Analysis...............................................................................................24
2.5.2 Qualitative Analysis.................................................................................................24
2.5.3 Measures of the Central Tendency.........................................................................24
2.5.4 Measures of Dispersion...........................................................................................25
2.6 Tools to Measure Relationships..........................................................................................26
2.6.1 Covariance.................................................................................................................26
2.6.2 Correlation.................................................................................................................26
2.7 Probability Distribution Function......................................................................................27
2.7.1 Cumulative Density Function.................................................................................27
2.7.2 Continuous Data Distributions...............................................................................28
2.7.3 Conditional Probability...........................................................................................28
2.7.4 Bayes’ Theorem.........................................................................................................29
2.8 Hypothesis Testing...............................................................................................................29
2.9 Conclusion.............................................................................................................................30
References........................................................................................................................................30
together many different approaches to knowledge discovery. So, KDD promotes the interaction between different fields for identifying underlying knowledge in data [1–3].
Mathematics and statistics are essential for data science because these disciplines build the base for all machine learning algorithms. In fact, mathematics exists all around us, from shapes, patterns, and colors to the count of colors in a diagram; it is embedded in every part of our lives. Mathematics and statistics are thus the building blocks of machine learning algorithms. Now the question is: what is statistics and how do we use it? [4]
Statistics is a basic building block of mathematical science that pertains to data collection, data extraction, data analysis, data interpretation, and data presentation. Statistics is used to solve complex problems in the real world so that data scientists [5–7] and business people can obtain insights that reveal meaningful trends and patterns in data. Statistics can extract important underlying information from data by applying mathematical computations to it.
Python is very popular for this kind of analysis and possesses a large number of standard libraries that help in analyzing and representing data. Data scientists need to work with large amounts of data, and with even a little knowledge of Python, anybody can build better analytical tools that help in developing machine learning, big data, clustering, and artificial intelligence applications.
Several statistical methods and techniques are executed to analyze raw data, model statistical analysis [8,9], and reach results. The field of statistics has touched every part of life, such as weather forecasting, life sciences, e-commerce, retail, insurance, and education [10,11].
2.2 Data Science and Statistics Terminologies
We must be aware of some important statistical terminologies while dealing with statistics and data science. The following are the key terminologies in statistics:
1. Population: the entire group from which data is collected.
2. Sample: a subset of the population.
3. Variable: a countable quantity or number, i.e., a data item.
4. Statistical parameter: a number that characterizes a probability distribution, such as the mean, median, mode, correlation, or covariance of the population.
2.3 Types of Statistics
Statistical data analysis is broadly categorized into two main groups: descriptive statistics and inferential statistics. The first category deals with the description of data, including central tendency, variation, and dispersion. Inferential statistics, in contrast, deals with the inferences and conclusions extracted from the data.
2.3.1 Descriptive
Descriptive statistics uses the data to describe the population through numerous computations, reports, or graphs [12]. It organizes the data and highlights its qualities, providing different attributes to be considered.
2.3.2 Inferential
Inferential statistics generates predictions and inferences about a population based on sample data taken from that population [13]. It infers from a large dataset, applies probability theory to extract results, and generates a model based on them.
2.4 How to Describe a Single Set of Data
Producing a single set of data is easy; the problem is how to describe it. The most obvious way to describe any dataset is simply the data itself:
test = [65, 12, 78, 43, 32]  # ... and so on
Small datasets are easy to describe, but for large datasets this is clumsy and hard to read, so statistics is used to refine and communicate the characteristics of the data. The following sample code generates random values and counts them:
from collections import Counter
from random import randint
import matplotlib.pyplot as plt

test = [100, 49, 41, 40, 25]
for _ in range(1000):
    value = randint(0, 100)
    test.append(value)
counts = Counter(test)
print(counts)
x1 = range(101)  # 100 is the largest value a data point can take
y1 = [counts[x] for x in x1]
plt.bar(x1, y1)
plt.axis([0, 80, 0, 24])
plt.title("Histogram of Number Counts")
plt.xlabel("test")
plt.ylabel("data")
plt.show()
This will generate a histogram as shown in Figure 2.1.
Probably the simplest statistic is the number of data points:
num_points = len(test)
The largest and smallest values can be computed by:
largest_value = max(test)   # 100
smallest_value = min(test)  # 1
2.5 Statistical Analysis
In order to perform statistical analysis, we use standard methods that analyze both the quality and the quantity of data. Statistics offers a set of tools to analyze the quantity of data and, similarly, tools to measure its quality.
2.5.1 Quantitative Analysis
Quantitative analysis is the statistical analysis in which quantities are taken into consideration. Quantitative measures such as numbers and graphs are used to find underlying patterns and trends [14].
2.5.2 Qualitative Analysis
Qualitative analysis is a statistical analysis that uses qualitative measures such as color, intensity, and other descriptive attributes. Consider the following example: if you have apples in a basket, describing them as small, medium, or big tells us about the quality of the apples and falls under qualitative analysis, whereas counting how many small, medium, and big apples there are is quantitative analysis.
FIGURE 2.1
Histogram of number counts.
2.5.3 Measures of the Central Tendency
Mean: The mean is a statistical way of calculating the average of all data point values in the dataset under consideration. It is the sum of the values of all data points divided by the count of data points. If there are two data points, the mean is half the sum of their values. As more data points are added, the mean changes, depending on the value of each data point. The following is the function to calculate the mean in Python:
def mean(x1):
    return sum(x1) / len(x1)
mean(test)
ans. 50.842786069651744
Median: The median divides the data points into two halves. It is the value of the central data point if the number of data points is odd, and the average of the two middle data points if the number of data points is even. The median function is a bit more complicated, mostly because of the "even" case:
def median(v):
    number = len(v)
    sort = sorted(v)
    mid = number // 2
    if number % 2 == 1:
        return sort[mid]
    else:
        low = mid - 1
        high = mid
        return (sort[low] + sort[high]) / 2
median(test)
Mode: The mode is the value of the most frequently repeated data point in the dataset.
def mode(x1):
    counts = Counter(x1)
    maximum_count = max(counts.values())
    return [x for x, count in counts.items()
            if count == maximum_count]
mode(test)
ans. Result=array[21], count=array[18]
With the help of descriptive analysis, we can analyze each of the variables in the dataset.
2.5.4 Measures of Dispersion
Just as we have measures of central tendency, we have measures to calculate the spread of the data points, as follows:
Range: The range measures how spread out the data points in a dataset are: a value near zero denotes no spread at all, whereas a large value signifies a very large spread of data points [15,16]. The range is an easy measure; it is simply the difference between the largest and smallest data point values.
def data_range(x1):
    return max(x1) - min(x1)
data_range(test)
26 Data Science
Interquartile Range: The interquartile range describes the spread of the middle portion of the data points; it is based on dividing a dataset into quartiles and taking the difference between the third and first quartiles [17], as in the short sketch below.
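Since the chapter gives no code for this measure, the following is a minimal sketch (an assumption, not the authors' implementation) in the same style as the other functions; it reuses the test list defined earlier.
def quantile(x1, p):
    # value below which a fraction p of the sorted data falls
    p_index = int(p * len(x1))
    return sorted(x1)[p_index]

def interquartile_range(x1):
    # spread of the middle 50% of the data
    return quantile(x1, 0.75) - quantile(x1, 0.25)

interquartile_range(test)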
Variance: The variance measures how much each data point value differs from the mean. It requires the following computations: the deviation is the difference between the value of each data element and the mean of the data points in the dataset, and the variance is the average of the squared deviations (dividing by n - 1 for a sample).
def de_mean(x1):
    # deviation of each element from the mean
    return [x - mean(x1) for x in x1]

def sum_of_squares(v):
    return sum(v_i * v_i for v_i in v)

def variance(x1):
    n = len(x1)
    deviations = de_mean(x1)
    return sum_of_squares(deviations) / (n - 1)
variance(test)
ans. 853.0877215910497
Standard Deviation: The standard deviation computes the dispersion of a set of data values from their statistical mean. Whatever data we have (e.g., "test"), the following is the function to calculate the standard deviation in Python:
import math

def std_dev(x1):
    return math.sqrt(variance(x1))
std_dev(test)
ans. 29.207665459448307
2.6 Tools to Measure Relationships
Covariance and correlation are the most common tools used to measure the relationship between data points. The relationship is measured as follows:
2.6.1 Covariance
Covariance measures how two random variables vary together. If the covariance is positive, the variables move in the same direction; if it is negative, they move in opposite directions; and if it is zero, they have no linear relation with each other.
def dot(v, w):
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def covariance(x1, y1):
    n = len(x1)
    return dot(de_mean(x1), de_mean(y1)) / (n - 1)  # de_mean as defined above
covariance(test1, test2)
2.6.2 Correlation
Correlation measures the association between two random variables and ranges from minus one to plus one; it is a normalized version of covariance. Generally, a correlation around ±0.7 shows a strong association, while a correlation around ±0.3 or less shows little or no association between the variables. Although both covariance and correlation are used for this kind of analysis, correlation's normalized scale gives a clearer picture and hence is critical in the analysis; see the sketch below.
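The chapter gives no code for correlation, so the following is a minimal sketch (an assumption in the same style as the earlier functions) that normalizes the covariance by the two standard deviations.
def correlation(x1, y1):
    # covariance scaled by both standard deviations, giving a value in [-1, 1]
    stdev_x = std_dev(x1)
    stdev_y = std_dev(y1)
    if stdev_x > 0 and stdev_y > 0:
        return covariance(x1, y1) / (stdev_x * stdev_y)
    return 0  # if either variable has no spread, correlation is taken as 0 here

correlation(test1, test2)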
2.7 Probability Distribution Function
The probability density function is a probability function for continuous data: at any point in the sample space, it gives the relative likelihood that the value of the random variable is close to that point.
def uniformpdf(x1):
    return 1 if 0 <= x1 < 1 else 0

def normalpdf(x, mu=0, sigma=1):
    # Gaussian density, reconstructed from the parameters used in the plots below
    sqrt_two_pi = math.sqrt(2 * math.pi)
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sqrt_two_pi * sigma)

x1 = [x / 11.0 for x in range(-100, 100)]
plt.plot(x1, [normalpdf(x, sigma=1) for x in x1], '-', label='mu=0,sigma=1')
plt.plot(x1, [normalpdf(x, sigma=2) for x in x1], '--', label='mu=0,sigma=2')
plt.plot(x1, [normalpdf(x, sigma=0.5) for x in x1], ':', label='mu=0,sigma=0.5')
plt.plot(x1, [normalpdf(x, mu=-1) for x in x1], '-.', label='mu=-1,sigma=1')
plt.legend()
plt.title("Various Normal pdfs")
plt.show()
We can see the graph of the probability distribution function in Figure 2.2.
2.7.1 Cumulative Density Function
The cumulative density function is a probability function that tells us the probability that a random variable is less than a particular value; it is the integral of the probability density function.
def uniformcdf(x1):
    if x1 < 0:
        return 0
    elif x1 < 1:
        return x1
    else:
        return 1
FIGURE 2.2
Normal probability density function.
2.7.2 Continuous Data Distributions
Continuous data can follow several distributions. The uniform distribution is a probability distribution in which all outcomes are equally likely [1]. The normal distribution, in contrast, has a bell shape and is associated with the central limit theorem; the standard normal distribution has a mean of zero and a standard deviation of 1. Figure 2.3 represents the graph of the normal distribution.
The normal distribution for several parameter values is plotted in Figure 2.4. Probability is the likelihood of an event occurring, and data science is not complete without some probability and mathematics. The universal set consists of all possible outcomes, and a subset of these outcomes is an event. We use probability theory to evaluate models [7].
2.7.3 Conditional Probability
The conditional probability Prob(A|B) is the probability that an event A will occur given that an earlier event B has occurred.
FIGURE 2.3
Normal distribution.
FIGURE 2.4
Normal distribution of number values.
import random

def random_num_generator():
    return random.choice(["b", "g"])   # "b" = boy, "g" = girl

og = bg = eg = 0   # counters, initialized before the loop
random.seed(100)
for j in range(1000):
    y = random_num_generator()   # younger child
    o = random_num_generator()   # older child
    if o == "g":
        og += 1
    if (o == "g") and (y == "g"):
        bg += 1
    if o == "g" or y == "g":
        eg += 1

print("P(both | older):", bg / og)
print("P(both | either):", bg / eg)
ans. P(both | older): 0.5007089325501317
P(both | either): 0.3311897106109325
Independent events are events whose outcomes do not depend on the outcome of the other event, so Prob(X|Y) = Prob(X). Mutually exclusive events are events that cannot occur simultaneously, so Prob(X|Y) = 0.
2.7.4 Bayes' Theorem
Bayes' theorem is an important mathematical formula for determining conditional probability [18]: the value of P(X|Y) is equal to P(Y|X) times P(X), divided by P(Y), that is, P(X|Y) = P(Y|X) P(X) / P(Y).
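As a small hedged illustration (the numbers are invented, not from the chapter), the formula can be applied directly in Python, for example to a screening-test style calculation.
def bayes(p_y_given_x, p_x, p_y):
    # P(X | Y) = P(Y | X) * P(X) / P(Y)
    return p_y_given_x * p_x / p_y

# Hypothetical numbers: P(positive | disease) = 0.99, P(disease) = 0.01,
# P(positive) = 0.99*0.01 + 0.05*0.99 = 0.0594
p_disease_given_positive = bayes(0.99, 0.01, 0.99 * 0.01 + 0.05 * 0.99)
print(p_disease_given_positive)  # roughly 0.167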
2.8 Hypothesis Testing
Hypothesis testing is the branch of statistics in which a statistician states an assumption in order to analyze some parameter of the dataset. There are two standard hypotheses, the null hypothesis and the alternative hypothesis:
1. Null Hypothesis: the hypothesis that the observed data arise purely by chance.
2. Alternative Hypothesis: the hypothesis that the data under consideration are influenced by some extraneous variable.
Statistical Significance
P-value: the probability of obtaining results at least as extreme as the experimental outputs of a test, assuming the null hypothesis is correct. A smaller p-value indicates a higher possibility that the alternative hypothesis will be accepted.
Alpha: the significance level, which denotes the probability of rejecting the null hypothesis when it should have been accepted; this is called a type 1 error.
Beta: the probability of a type 2 error, i.e., accepting the null hypothesis when it should have been rejected.
The following steps are carried out for hypothesis testing (a worked sketch follows the list):
1. Specify the null and alternative hypotheses.
2. Compute the sample space and the size of the test.
3. Calculate the test statistic and the corresponding probability.
4. Compare the observations to either accept or reject the null hypothesis.
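To make these steps concrete, here is a minimal sketch (an illustration with made-up sample values, not the authors' example) that uses scipy.stats to run a one-sample t-test and compare the p-value against alpha = 0.05.
from scipy import stats

# Hypothetical sample: measured weights, null hypothesis H0: population mean = 50
sample = [51.2, 49.8, 52.5, 50.9, 53.1, 48.7, 52.0, 51.5]

t_stat, p_value = stats.ttest_1samp(sample, popmean=50)
alpha = 0.05  # chosen significance level

print("t statistic:", t_stat)
print("p-value:", p_value)
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")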
2.9 Conclusion
The role of statistics in data science has often been given less importance than other fields such as computer science, yet the domains of data acquisition, analysis, and advanced modeling are all needed for prediction. Statisticians are well equipped to carry out their tasks in this new and highly appreciated domain of data science. Using mathematical models together with statistics, particularly in big data, data mining, and machine learning, gives rise to scientific outputs based on suitable methods. A well-balanced use of mathematics, informatics, and statistics leads to good solutions to the problems in data science.
In this chapter, we reviewed the basics of statistical analysis and the features of Python that enrich the field of data science and help extract the desired knowledge from the available data. This is possible because of Python's vast set of libraries, which make it easy and fast to collect and analyze data.
References
1. Claus Weihs, Katja Ickstadt. Data science: the impact of statistics. International Journal of Data Science and Analytics (2018).
2. Weihs, C. Big data classification — aspects of many features. In: Michaelis, S., Piatkowski, N., Stolpe, M. (eds.) Solving Large Scale Learning Tasks: Challenges and Algorithms, Springer Lecture Notes in Artificial Intelligence, vol. 9580, pp. 139–147 (2016).
3. Martin, R., Nagathil, A. Digital filters and spectral analysis, chap 4. In: Weihs, C., Jannach, D., Vatolkin, I., Rudolph, G. (eds.) Music Data Analysis—Foundations and Applications, pp. 111–143. CRC Press, Boca Raton (2017).
4. Bhattacharyya, G. K., Johnson, R. A. Statistical Concepts and Methods, John Wiley and Sons, New York (1997).
5. Joel Grus. Data Science from Scratch: First Principles with Python. O'Reilly Media, Inc., Sebastopol, CA (2015).
6. Randy Paffenroth, Xiangnan Kong. Python in Data Science Research and Education. Proceedings of the 14th Python in Science Conference (2015).
7. Ramchandra Sharad Mangrulkar, Antonis Michalas, Pallavi Chavan. Design of Intelligent Applications Using Machine Learning and Deep Learning Techniques (2021).
8. Gregory Piatetsky. Python Eats Away at R: Top Software for Analytics, Data Science, Machine Learning in 2018: Trends and Analysis, KDnuggets (2018).
9. Gabriel Moreira. Python for Data Science, The Developers Conference (2015).
10. Bischl, B., Schiffner, J., Weihs, C. Benchmarking local classification methods. Computational Statistics 28(6), 2599–2619 (2013); Press, G.: A Very Short History of Data Science (2017).
11. Dyk, D.V., Fuentes, M., Jordan, M.I., Newton, M., Ray, B.K., Lang, D.T., Wickham, H. ASA Statement on the Role of Statistics in Data Science. http://magazine.amstat.org/blog/2015/10/01/asastatement-on-the-role-of-statistics-in-data-science/ (2015).
12. Nilanjan Dey, Sanjeev Wagh, Parikshit N. Mahalle. Applied Machine Learning for Smart Data Analysis (2019).
13. https://www.edureka.co/blog/statistics-and-probability.
14. Shapiro, S. S. and Wilk, M. B. An analysis of variance test for normality (complete samples). Biometrika 52, 591–611 (1965).
15. Brown, M.S. Data Mining for Dummies. Wiley, London (2014).
16. Kallol Bosu Roy Choudhuri, Ramchandra S. Mangrulkar. "Data Acquisition and Preparation for Artificial Intelligence and Machine Learning Applications", in Design of Intelligent Applications Using Machine Learning and Deep Learning Techniques, CRC Press. https://www.taylorfrancis.com/chapters/edit/10.1201/9781003133681-1/data-acquisition-preparation-artificial-intelligence-machine-learning-applications-kallol-bosu-roy-choudhuri-ramchandra-mangrulkar.
17. Maruti Techlabs. Is Python the most popular language for data science? (2018).
18. Waller, R. A., Duncan, D. B. A Bayes rule for the symmetric multiple comparison problem. Journal of the American Statistical Association 64 (1969).
DOI: 10.1201/9781003283249-3
3
­
Real-World Applications of Data Science
Baisa L. Gunjal
Amrutvahini College of Engineering
CONTENTS
3.1 Banking and Finance..........................................................................................................35
3.1.1 Customer Data Management...............................................................................35
3.1.2	­Real-Time Analytics..............................................................................................35
3.1.3 Algorithmic Trading.............................................................................................35
3.1.4 Providing Personalized Services.........................................................................35
3.1.5 Fraud Detection......................................................................................................36
3.2	­
E-commerce and Retail Industry......................................................................................36
3.2.1 Potential Customer Analysis................................................................................36
3.2.2 Customer Sentiment Analysis.............................................................................36
3.2.3 Optimizing Prices..................................................................................................37
3.2.4 Inventory Management........................................................................................37
3.2.5 Lifetime Value Prediction.....................................................................................37
3.3 Digital Marketing................................................................................................................38
3.3.1 Smarter Planning for Online Marketing............................................................38
3.3.2 Business Intelligence with Smarter ­
Decision-Making.....................................38
3.3.3 Managing Business Efficiently............................................................................39
3.3.4 Automating Recruitment Process.......................................................................39
3.4 Healthcare and Medical Diagnosis..................................................................................39
3.4.1 Managing and Monitoring Patient Health and Data.......................................39
3.4.2 Medical Image Analysis.......................................................................................40
3.4.3 Drug Research and Creation................................................................................40
3.4.4 Patient Diagnosis and Preventing Diseases......................................................40
3.4.5 Providing Medical Virtual Assistance...............................................................41
3.5 Manufacturing Industry....................................................................................................41
3.5.1 Automating Product Design and Development................................................41
3.5.2 Inventory Management and Demand Forecasting...........................................41
3.5.3 Monitoring of Manufacturing Units...................................................................42
3.5.4	­
Real-Time Data of Performance and Quality.....................................................42
3.6 Education System................................................................................................................43
3.6.1 Monitoring Students’ and Teachers’ Requirements.........................................43
3.6.2 Measuring Students’ and Teachers’ Performance.............................................43
3.6.3 Innovating the Curriculum..................................................................................43
3.6.4 Automating ­
Outcome-Based Teaching and Learning Process.......................43
34 Data Science
3.7 Entertainment Industry......................................................................................................43
3.7.1 Predictive Analytics in the Film Industry.........................................................43
3.7.2 Tracking Progress of Movies................................................................................44
3.7.3 Generate Movie Revenue......................................................................................44
3.7.4 Improve ­
Post-production of Movies...................................................................44
3.8 Logistic Delivery and Transportation Units...................................................................45
3.8.1 Reducing Shipping Costs through Delivery Path Optimization....................45
3.8.2 Monitoring Traffic and Weather Data from Sensors........................................45
3.9 Shipping Sensitive Goods with Higher Quality.............................................................45
3.9.1 Automation of Warehouses and the Supply Chain..........................................45
3.10 Digital Advertising Systems..............................................................................................45
3.10.1 Price Comparison Websites..................................................................................45
3.10.2 Website Recommendation....................................................................................46
3.11 Internet Search Engines.....................................................................................................46
3.11.1 Proper Filtering......................................................................................................46
3.11.2 Autocomplete.........................................................................................................47
3.11.3 Recommendation Engines....................................................................................48
3.12 Airline Routing Planning...................................................................................................48
3.12.1 Predicting Flight Delays.......................................................................................48
3.12.2 Decide Route of Flight In Case of Emergency...................................................48
3.12.3 Running Customer Loyalty Programs Effectively...........................................49
3.13 Image and Speech Recognition Systems..........................................................................49
3.13.1 Image Recognition Systems.................................................................................49
3.13.2 Speech Recognition Systems................................................................................50
3.14 Gaming/­Sports....................................................................................................................50
3.14.1 Use Previous Gaming Experience to the Next Level.......................................50
3.14.2 Improve Player Moves Up to Higher Level.......................................................51
3.15 Social Life and Social Networking...................................................................................51
3.15.1 Building and Maintaining Social Relationship.................................................51
3.15.2 Maintaining Friend Circles through Social Media...........................................51
3.15.3 Building Human Network for Social Causes....................................................51
3.16 Augmented Reality.............................................................................................................52
3.16.1 Operation Room Augmented with Remote Presence......................................52
3.16.2 Social Media with Augmented Reality..............................................................52
3.17	­
Self-Driving Cars and Robots............................................................................................53
3.17.1 Intelligent Systems for ­
Self-Driving Cars...........................................................53
3.17.2 Robotics and Automation.....................................................................................53
3.18 Email Filtering and Character Recognitions...................................................................53
3.18.1 Email Spam Filtering............................................................................................53
3.18.2 Optical Character Recognitions..........................................................................54
3.19 Genetics and Genomics Research.....................................................................................54
3.19.1 Analyzing Impact of the DNA on the Health...................................................54
3.19.2 Analyzing Reaction of Genes to Various Medications.....................................54
3.19.3 Analyzing Set of Chromosomes in Humans, Animals...................................55
References........................................................................................................................................55
3.1 Banking and Finance
3.1.1 Customer Data Management
The customer's data is an invaluable asset for banking and finance-related sectors in the digital era. Banks utilize customer data to enhance their productivity and services. Banking sectors use unstructured data from social media to publicize their brand services and to gather customers' views about those services. Data science helps banks in transaction processing and in segmenting huge volumes of banking data as well as customer feedback obtained through social media platforms.
3.1.2 Real-Time Analytics
Predictive analysis is an important part of the banking and financial sectors. In the finance industry, the decision-making process in stock and share markets is based on real-time analysis. Companies make decisions about their money investments based on the current market scenario: they can analyze the state of the market and possible risks, predict profit, and accordingly plan their buying and selling strategies [1].
3.1.3 Algorithmic Trading
Algorithmic trading is a process that facilitates transaction decision-making in financial markets using advanced mathematical tools. It utilizes automated and preprogrammed trading instructions for finance applications. Algorithmic trading can capture profit-making opportunities in the market long before human traders can even spot them. Algorithmic trading provides the following benefits:
i. Trades are executed at the best possible and most profitable market prices.
ii. It provides automated checks on multiple market conditions.
iii. It reduces transaction costs.
iv. It reduces the risk of manual errors while placing trades.
v. It reduces the emotional and psychological factors that lead to mistakes.
3.1.4 Providing Personalized Services
Banking sectors and other financial companies use data science and machine learning for process automation, stronger levels of overall security, and social media interactions with customers. The other personalized services may include [2]:
i. Daily/frequent transactions.
ii. Customer's past details.
iii. Debit/credit details.
iv. Past/present loan details.
v. Bank details.
Another Random Document on
Scribd Without Any Related Topics
payments must be paid within 60 days following each date on
which you prepare (or are legally required to prepare) your
periodic tax returns. Royalty payments should be clearly marked
as such and sent to the Project Gutenberg Literary Archive
Foundation at the address specified in Section 4, “Information
about donations to the Project Gutenberg Literary Archive
Foundation.”
• You provide a full refund of any money paid by a user who
notifies you in writing (or by e-mail) within 30 days of receipt
that s/he does not agree to the terms of the full Project
Gutenberg™ License. You must require such a user to return or
destroy all copies of the works possessed in a physical medium
and discontinue all use of and all access to other copies of
Project Gutenberg™ works.
• You provide, in accordance with paragraph 1.F.3, a full refund of
any money paid for a work or a replacement copy, if a defect in
the electronic work is discovered and reported to you within 90
days of receipt of the work.
• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.
1.E.9. If you wish to charge a fee or distribute a Project
Gutenberg™ electronic work or group of works on different
terms than are set forth in this agreement, you must obtain
permission in writing from the Project Gutenberg Literary
Archive Foundation, the manager of the Project Gutenberg™
trademark. Contact the Foundation as set forth in Section 3
below.
1.F.
1.F.1. Project Gutenberg volunteers and employees expend
considerable effort to identify, do copyright research on,
transcribe and proofread works not protected by U.S. copyright
law in creating the Project Gutenberg™ collection. Despite these
efforts, Project Gutenberg™ electronic works, and the medium
on which they may be stored, may contain “Defects,” such as,
but not limited to, incomplete, inaccurate or corrupt data,
transcription errors, a copyright or other intellectual property
infringement, a defective or damaged disk or other medium, a
computer virus, or computer codes that damage or cannot be
read by your equipment.
1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except
for the “Right of Replacement or Refund” described in
paragraph 1.F.3, the Project Gutenberg Literary Archive
Foundation, the owner of the Project Gutenberg™ trademark,
and any other party distributing a Project Gutenberg™ electronic
work under this agreement, disclaim all liability to you for
damages, costs and expenses, including legal fees. YOU AGREE
THAT YOU HAVE NO REMEDIES FOR NEGLIGENCE, STRICT
LIABILITY, BREACH OF WARRANTY OR BREACH OF CONTRACT
EXCEPT THOSE PROVIDED IN PARAGRAPH 1.F.3. YOU AGREE
THAT THE FOUNDATION, THE TRADEMARK OWNER, AND ANY
DISTRIBUTOR UNDER THIS AGREEMENT WILL NOT BE LIABLE
TO YOU FOR ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL,
PUNITIVE OR INCIDENTAL DAMAGES EVEN IF YOU GIVE
NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.
1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you
discover a defect in this electronic work within 90 days of
receiving it, you can receive a refund of the money (if any) you
paid for it by sending a written explanation to the person you
received the work from. If you received the work on a physical
medium, you must return the medium with your written
explanation. The person or entity that provided you with the
defective work may elect to provide a replacement copy in lieu
of a refund. If you received the work electronically, the person
or entity providing it to you may choose to give you a second
opportunity to receive the work electronically in lieu of a refund.
If the second copy is also defective, you may demand a refund
in writing without further opportunities to fix the problem.
1.F.4. Except for the limited right of replacement or refund set
forth in paragraph 1.F.3, this work is provided to you ‘AS-IS’,
WITH NO OTHER WARRANTIES OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.
1.F.5. Some states do not allow disclaimers of certain implied
warranties or the exclusion or limitation of certain types of
damages. If any disclaimer or limitation set forth in this
agreement violates the law of the state applicable to this
agreement, the agreement shall be interpreted to make the
maximum disclaimer or limitation permitted by the applicable
state law. The invalidity or unenforceability of any provision of
this agreement shall not void the remaining provisions.
1.F.6. INDEMNITY - You agree to indemnify and hold the
Foundation, the trademark owner, any agent or employee of the
Foundation, anyone providing copies of Project Gutenberg™
electronic works in accordance with this agreement, and any
volunteers associated with the production, promotion and
distribution of Project Gutenberg™ electronic works, harmless
from all liability, costs and expenses, including legal fees, that
arise directly or indirectly from any of the following which you
do or cause to occur: (a) distribution of this or any Project
Gutenberg™ work, (b) alteration, modification, or additions or
deletions to any Project Gutenberg™ work, and (c) any Defect
you cause.
Section 2. Information about the Mission
of Project Gutenberg™
Project Gutenberg™ is synonymous with the free distribution of
electronic works in formats readable by the widest variety of
computers including obsolete, old, middle-aged and new
computers. It exists because of the efforts of hundreds of
volunteers and donations from people in all walks of life.
Volunteers and financial support to provide volunteers with the
assistance they need are critical to reaching Project
Gutenberg™’s goals and ensuring that the Project Gutenberg™
collection will remain freely available for generations to come. In
2001, the Project Gutenberg Literary Archive Foundation was
created to provide a secure and permanent future for Project
Gutenberg™ and future generations. To learn more about the
Project Gutenberg Literary Archive Foundation and how your
efforts and donations can help, see Sections 3 and 4 and the
Foundation information page at www.gutenberg.org.
Section 3. Information about the Project
Gutenberg Literary Archive Foundation
The Project Gutenberg Literary Archive Foundation is a non-
profit 501(c)(3) educational corporation organized under the
laws of the state of Mississippi and granted tax exempt status
by the Internal Revenue Service. The Foundation’s EIN or
federal tax identification number is 64-6221541. Contributions
to the Project Gutenberg Literary Archive Foundation are tax
deductible to the full extent permitted by U.S. federal laws and
your state’s laws.
The Foundation’s business office is located at 809 North 1500
West, Salt Lake City, UT 84116, (801) 596-1887. Email contact
links and up to date contact information can be found at the
Foundation’s website and official page at
www.gutenberg.org/contact
Section 4. Information about Donations to
the Project Gutenberg Literary Archive
Foundation
Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission
of increasing the number of public domain and licensed works
that can be freely distributed in machine-readable form
accessible by the widest array of equipment including outdated
equipment. Many small donations ($1 to $5,000) are particularly
important to maintaining tax exempt status with the IRS.
The Foundation is committed to complying with the laws
regulating charities and charitable donations in all 50 states of
the United States. Compliance requirements are not uniform
and it takes a considerable effort, much paperwork and many
fees to meet and keep up with these requirements. We do not
solicit donations in locations where we have not received written
confirmation of compliance. To SEND DONATIONS or determine
the status of compliance for any particular state visit
www.gutenberg.org/donate.
While we cannot and do not solicit contributions from states
where we have not met the solicitation requirements, we know
of no prohibition against accepting unsolicited donations from
donors in such states who approach us with offers to donate.
International donations are gratefully accepted, but we cannot
make any statements concerning tax treatment of donations
received from outside the United States. U.S. laws alone swamp
our small staff.
Please check the Project Gutenberg web pages for current
donation methods and addresses. Donations are accepted in a
number of other ways including checks, online payments and
credit card donations. To donate, please visit:
www.gutenberg.org/donate.
Section 5. General Information About
Project Gutenberg™ electronic works
Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could
be freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose
network of volunteer support.
Project Gutenberg™ eBooks are often created from several
printed editions, all of which are confirmed as not protected by
copyright in the U.S. unless a copyright notice is included. Thus,
we do not necessarily keep eBooks in compliance with any
particular paper edition.
Most people start at our website which has the main PG search
facility: www.gutenberg.org.
This website includes information about Project Gutenberg™,
including how to make donations to the Project Gutenberg
Literary Archive Foundation, how to help produce our new
eBooks, and how to subscribe to our email newsletter to hear
about new eBooks.
back
back
back
Data Science Techniques And Intelligent Applications 1st Edition Pallavi Vijay Chavan
back
Data Science Techniques And Intelligent Applications 1st Edition Pallavi Vijay Chavan
back
back
back
back
back
back
back
Welcome to our website – the perfect destination for book lovers and
knowledge seekers. We believe that every book holds a new world,
offering opportunities for learning, discovery, and personal growth.
That’s why we are dedicated to bringing you a diverse collection of
books, ranging from classic literature and specialized publications to
self-development guides and children's books.
More than just a book-buying platform, we strive to be a bridge
connecting you with timeless cultural and intellectual values. With an
elegant, user-friendly interface and a smart search system, you can
quickly find the books that best suit your interests. Additionally,
our special promotions and home delivery services help you save time
and fully enjoy the joy of reading.
Join us on a journey of knowledge exploration, passion nurturing, and
personal growth every day!
ebookbell.com

More Related Content

PDF
Artificial Intelligence For Data Science In Theory And Practice Mohamed Allog...
PDF
Frontiers in Data Science 1st Edition Matthias Dehmer
PDF
Data Science And Big Data Analytics Proceedings Of Idba 2023 2024th Edition D...
PDF
Data Science 1st Edition Robert Stahlbock Gary M Weiss Mahmoud Abounasr
PDF
Frontiers In Data Science 1st Edition Matthias Dehmer Frank Emmertstreib
PDF
Data Science In Societal Applications Siddharth Swarup Rautaray
PDF
Data Science In Societal Applications Siddharth Swarup Rautaray
PDF
Data Science Unit 01 PPT - SPPU Sem 6.pdf
Artificial Intelligence For Data Science In Theory And Practice Mohamed Allog...
Frontiers in Data Science 1st Edition Matthias Dehmer
Data Science And Big Data Analytics Proceedings Of Idba 2023 2024th Edition D...
Data Science 1st Edition Robert Stahlbock Gary M Weiss Mahmoud Abounasr
Frontiers In Data Science 1st Edition Matthias Dehmer Frank Emmertstreib
Data Science In Societal Applications Siddharth Swarup Rautaray
Data Science In Societal Applications Siddharth Swarup Rautaray
Data Science Unit 01 PPT - SPPU Sem 6.pdf

Similar to Data Science Techniques And Intelligent Applications 1st Edition Pallavi Vijay Chavan (20)

PDF
Data Science And Big Data An Environment Of Computational Intelligence 1st Ed...
PDF
Test-Driven Development_ A Paradigm Shift in Software Engineering (1).pdf
PDF
Practical Data Science_ Tools and Technique.pdf
PDF
Artificial Intelligence For Cognitive Modeling Theory And Practice Pijush Dutta
PPTX
Unit 1-FDS. .pptx
PDF
Cognitive Computing Systems Applications And Technological Advancements Visha...
PDF
Data Science - NXT Level_Dr.Arun.pdf
PDF

Data Science Techniques And Intelligent Applications 1st Edition Pallavi Vijay Chavan

  • 6. Data Science This book covers the topic of data science in a comprehensive manner and synthesizes both fundamental and advanced topics of a research area that has now reached its maturity. The book starts with the basic concepts of data science. It highlights the types of data and their use and importance, followed by a discussion on a wide range of applications of data science and widely used techniques in data science. Key Features • Provides an internationally respected collection of scientific research methods, technologies and applications in the area of data science. • Presents predictive outcomes by applying data science techniques to real-life applications. • Provides readers with the tools, techniques and cases required to excel with modern artificial intelligence methods. • Gives the reader a variety of intelligent applications that can be designed using data science and its allied fields. The book is aimed primarily at advanced undergraduates and graduates studying machine learning and data science. Researchers and professionals will also find this book useful.
  • 8. Data Science Techniques and Intelligent Applications Edited by Pallavi Vijay Chavan Parikshit N Mahalle Ramchandra Mangrulkar Idongesit Williams
  • 9. First edition published 2023 by CRC Press 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742 and by CRC Press 4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN CRC Press is an imprint of Taylor & Francis Group, LLC © 2023 selection and editorial matter, [Pallavi Vijay Chavan, Parikshit N Mahalle, Ramchandra Sharad Mangrulkar and Idongesit Williams]; individual chapters, the contributors Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint. Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe. Library of Congress Cataloging‑in‑Publication Data Names: Chavan, Pallavi, editor. | Mahalle, Parikshit N, editor. | Mangrulkar, Ramchandra, editor. | Williams, Idongesit, editor. Title: Data science : techniques and intelligent applications / edited by Pallavi Vijay Chavan, Parikshit N Mahalle, Ramchandra Mangrulkar, Idongesit Williams. Description: First edition. | Boca Raton, FL : Chapman & Hall/CRC Press, 2023. | Includes bibliographical references and index. | Summary: “The proposed book covers the topic of data science in a very comprehensive manner and synthesizes both fundamental and advanced topics of a research area that has now reached maturity. The book starts from the basic concepts of data science; it highlights the types of data, its use and its importance, followed by discussion on a wide range of applications of data science and widely used techniques in data science. Key features: provides an internationally respected collection of scientific research methods, technologies and appli- cations in the area of data science, presents predictive outcomes by applying data science techniques on real life appli- cations, provides readers with the tools, techniques and cases required to excel with modern artificial intelligence methods, and gives the reader variety of intelligent applications that can be designed using data science and its allied fields. The book is aimed primarily at advanced undergraduates and graduates studying machine learning and data science. Researchers and professionals will also find this book useful”— Provided by publisher. Identifiers: LCCN 2022002189 (print) | LCCN 2022002190 (ebook) | ISBN 9781032254494 (hbk) | ISBN 9781032254517 (pbk) | ISBN 9781003283249 (ebk) Subjects: LCSH: Big data. 
| Electronic data processing. | Information technology. | Information science. Classification: LCC QA76.9.B45 D394 2023 (print) | LCC QA76.9.B45 (ebook) | DDC 005.7—dc23/eng/20220416 LC record available at https://lccn.loc.gov/2022002189 LC ebook record available at https://lccn.loc.gov/2022002190 ISBN: 978-1-032-25449-4 (hbk) ISBN: 978-1-032-25451-7 (pbk) ISBN: 978-1-003-28324-9 (ebk) DOI: 10.1201/9781003283249 Typeset in Palatino by codeMantra
  • 10. v Contents Preface............................................................................................................................................ vii Editors..............................................................................................................................................ix Contributors....................................................................................................................................xi 1. Instigation and Development of Data Science .................................................................1 Priyali Sakhare, Pallavi Vijay Chavan, Pournima Kulkarni, and Ashwini Sarode 2. Role of Statistical Methods in Data Science ...................................................................21 Deepa Mane and Sachin Shelke 3. Real-World Applications of Data Science........................................................................33 Baisa L. Gunjal 4. HDWR_SmartNet: A Smart Handwritten Devanagari Word Recognition System Using Deep ResNet-Based on Scan Profile Method .......................................57 Shalaka Prasad Deore 5. Safe Social Distance Monitoring and Face Mask Detection for Controlling COVID-19 Spread.................................................................................................................79 Nikhil Ingale, Onkar Rane, Piyush Kadam, and Vivek Kumar Singh 6. Real-Time Virtual Fitness Tracker and Exercise Posture Correction ........................91 Tejas Kachare, Manisha Sinha, Siddharth Kakade, Aryan Kakade, and Siddharth Nigade 7. Role of Data Science in Revolutionizing Healthcare.................................................. 103 Yashsingh Manral, Siddhesh Unhavane, and Jyoti Kundale 8. Application of Artificial Intelligence Techniques in the Early-Stage Detection of Chronic Kidney Disease............................................................................ 119 Anindita A. Khade and Amarsinh V. Vidhate 9. Multi-Optimal Deep Learning Technique for Detection and Classification of Breast Cancer....................................................................................................................... 131 Pravin Malve and Vijay S. Gulhane 10. Realizing Mother’s Features Influential on Childbirth Experience, towards Creation of a Dataset.......................................................................................................... 143 Himani Deshpande and Leena Ragha 11. BERT- and FastText-Based Research Paper Recommender System.......................... 169 Nemil Shah, Yash Goda, Naitik Rathod, Vatsal Khandor, Pankaj Kulkarni, and Ramchandra Mangrulkar ­
  • 11. vi Contents 12. Analysis and Prediction of Crime Rate against Women Using Classification and Regression Trees..........................................................................................................185 P. Tamilarasi and R. Uma Rani 13. Data Analysis for Technical Business Incubation Performance Improvement...... 195 Swati Shekapure and Nitin Shekapure 14. Satellite ­ Imagery-Based Wildfire Detection Using Deep Learning......................... 213 Anant Kaulage, Sagar Rane, and Sunil Dhore 15. ­ Low-Resource Language Document Summarization: A Challenge.........................221 Pranjali Deshpande and Sunita Jahirabadkar 16. Eclectic Analysis of Classifiers for Fake News Detection...........................................227 Vatsal Khandor, Yashvi Shah, Krutik Shah, Nitin Goje, and Ramchandra Mangrulkar 17. Data Science and Machine Learning Applications for Mental Health....................249 Dhruvi Khankhoje, Pruthav Jhaveri, and Narendra M. Shekokar 18. Analysis of Ancient and Modern Meditation Techniques on Human Mind and Body and Their Effectiveness in ­ COVID-19 Pandemic....................................... 267 Abhijeet Kushwah, Tirth Pandya, Prarthana Dhok, Prateek Koul, Pallavi Vijay Chavan, Shreeraj Vijayan, and Vishal Shrivastava Index..............................................................................................................................................299
  • 12. vii Preface We are in the ­ data-driven age, and the data science field has gained attention of people not only from the technical field but also from all the areas of society. There has been tremendous growth in data every day, and we are equipping ourselves with the best set of tools and technologies to handle those data. Data science has become a core capabil- ity everywhere, more specifically in the industrial sector. Many ­ e-commerce businesses are data driven. Every day, these ­ e-commerce businesses collect a huge amount of data, analyze the data and decide on some policy to improve the business. The fourth indus- trial revolution we are observing all over the globe is based on data science and its allied fields. Machine learning, deep learning and artificial intelligence are the key elements in data science. Data science with its scientific methods gives conclusions and analysis, while machine learning and deep learning algorithms help to have predictive models. There are several algorithms available with machine learning such as supervised learning, rein- forcement learning and unsupervised learning. Classification is the most popular category of machine learning algorithms. The researchers and practitioners use machine learning and deep learning algorithms as per their requirements and suitability of the algorithms to their problem statement. Deep learning techniques also play a crucial role in the design of intelligent applications. Deep learning algorithms are a family of artificial neural net- works. These networks learn themselves and get trained for a given set of data. Validation testing in deep learning exhibits accuracy in learning. Artificial intelligence also plays a significant role in the design of intelligent applications with data science. Artificial intel- ligence helps in designing ­ rule-based systems. Such ­ rule-based systems include facts and rules of inferences. On the other side, there are the technologies such as soft computing, fuzzy logic and genetic algorithm. These technologies have a significant contribution to the development of intelligent applications. Today, intelligent applications are used almost in all the sectors such as healthcare, education, agriculture, finance, games and sports, and entertainment. Intelligent applications in healthcare include diabetic retinopathy, diabetes detection, expert systems for medical diagnosis and many more. In the agriculture field, the popular intelligent applications includes the prediction of rain, weather forecasting, crop prediction, and soil quality determination and prediction. In finance, applications like stock price prediction and gold rate prediction are most popular. This book aims to provide a platform for researchers to promote intelligent applications and their design for the betterment of the technology and the society in turn. This volume comprises 18 chapters. The technical contents begin with the data science fundamentals; the book further includes the chapters based on tools, methods and the techniques used in data science. The chapters focus on the fundamentals of data science and the applications from domains such as agriculture, finance, healthcare, real estate and crime. ­ Chapter 1 states the basic building blocks of data science with different tools and techniques used in data analysis. ­ Chapter 2 discusses the role of statistical methods in data science. These methods are important in analytical processing. 
­ Chapter 3 describes broad application areas of data science in the real world. Some of them are agriculture, healthcare, education, entertainment, etc. ­ Chapter 4 discusses a smart Handwritten Devanagari Word Recognition system using Deep ResNet based on the Scan Profile method. This chapter exhibits the analysis of different classifiers for ­ speech-based emotion recognition systems using linear and nonlinear features. ­ Chapter 5 presents an intelligent application to data
  • 13. viii Preface science stating safe social distance monitoring and face mask detection mechanism to con- trol the ­ COVID-19 spread. ­ Chapter 6 presents another intelligent application of data sci- ence that describes ­ real-time virtual fitness tracker and exercise posture correction. This mechanism implements the concept of machine learning. ­ Chapter 7 discusses the role of data science in revolutionizing the healthcare sector, yet another intelligent application of data science. ­ Chapter 8 presents another intelligent application of data science stating the ­ early-stage detection of chronic kidney disease. For development of this application, artifi- cial intelligence techniques are experimented. ­ Multi-optimal deep learning technique for detection and classification of breast cancer from histopathological images is presented in ­ Chapter 9. ­ Chapter 10 talks about influence of lifestyle on pregnancy complications. This is conducted in ­ real-world environment, and the results are presented. ­ Chapter 11 proposes a research paper recommendation system. ­ Chapter 12 shows an analysis and prediction of crime rate against women using classification and regression. ­ Chapter 13 talks about data analysis for technical business incubation performance improvement. ­ Chapter 14 shows satellite ­ imagery-based wildfire detection. This chapter demonstrates the usage of deep learning techniques with image data for detection. ­ Chapter 15 demonstrates low resource language document summarization. ­ Chapter 16 describes eclectic analysis of classifiers for fake news detection. ­ Chapter 17 talks about data science and machine learning applica- tions for mental health. Finally, ­ Chapter 18 presents the contribution by authors in artifi- cial intelligence stating analysis of ancient and modern meditation techniques on human mind and body and its effectiveness in the ­ COVID-19 pandemic. We would like to express our sincere thanks and gratitude to the authors for their out- standing contributions. Without their contributions, it would have become very difficult to submit this proposal. As the editors, we hope that this book will help and stimulate the researchers to pursue research in this domain. Also, we expect that the researchers will get benefited from the huge set of applications stated in this book. Special thanks go to our publisher, CRC Press/­ Taylor and Francis group. We hope that this book will present promising ideas and outstanding research contribu- tions and support further development of data science and intelligent application.
  • 14. ix Editors Dr. Pallavi Vijay Chavan is Associate Professor at Ramrao Adik Institute of Technology, D Y Patil Deemed to be University, Navi Mumbai, Maharashtra, India. She has been in academic for 16years, working in the area of data science and network security. In her academic journey, she published research work in the data science and security domain with reputable publishers including Springer, Elsevier, CRC Press and Inderscience. She has published 1 book, 7+ book chap- ters, 10+ international journal papers and 30+ international conference papers. Presently she is guiding five Ph.D. research scholars working in a similar domain. She com- pleted her Ph.D. from Rashtrasant Tukadoji Maharaj Nagpur University, Nagpur, Maharashtra, India, in 2017. She secured the first merit position at Nagpur University for the degree of B.E. in Computer Engineering in 2003. She is the recipient of research grants from UGC, CSIR and the University of Mumbai. She is a reviewer for Elsevier and Inderscience journals. Her firm belief is “­ Teaching is a mission”. Dr. Parikshit N Mahalle is a senior member of IEEE and a Professor and Head of the Department of Artificial Intelligence and Data Science at Vishwakarma Institute of Information Technology, Pune, India. He completed his Ph.D. from Aalborg University, Denmark, and continued as a postdoctoral researcher. He has 21+ years of teaching and research experience. He is a member of Board of Studies in Computer Engineering, Savitri Bai Phule University and var- ious universities. He has 9 patents and 200+ research publi- cations (­ citations: 1830, ­ H-index:18) and authored/­ edited 30+ books with Springer, CRC Press, Cambridge University Press, etc. He is ­ editor-­ in-chief for IGI ­ Global – International Journal of Rough Sets and Data Analysis, Associate Editor for IGI ­ Global – International Journal of Synthetic Emotions, ­Inter-­science – International Journal of Grid and Utility Computing, and member of Editorial Review Board for IGI ­ Global – International Journal of Ambient Computing and Intelligence. His research interests are Algorithms, Internet of Things, Identity Management and Security. He has delivered more than 200 lectures at national and international levels.
  • 15. x Editors Dr. Ramchandra Mangrulkar, a ­ post-graduate from the National Institute of Technology, Rourkela, received his Ph.D. in computer science and engineering from SGB Amravati University, Amravati in 2016. At present, he is working as an Associate Professor in the Department of Computer Engineering at SVKM’s Dwarkadas J. Sanghvi College of Engineering, Mumbai (­ autonomous college affili- ated to the University of Mumbai), Maharashtra, India. Dr. Ramchandra Mangrulkar has published 50 papers and 24 book chapters with Taylor and Francis, Springer and IGI Global in the field of interest. He has presented 54 papers in national and international conferences. He has also edited five books with CRC Press, Taylor and Francis Series. He has also worked as a reviewer in many international journals and also reviewed many book proposals submitted to publishers of an international reputation. He has also chaired many sessions in national and international conferences. He has also shown keen interest in conducting and organizing workshops on Artificial Intelligence Bot in Education, Network Simulator 2, Innovative tools for Research and LaTeX Overleaf. He has also received certification of appreciation from DIG Special Crime Branch Pune and Superintendent of Police and broadcasting media gives wide publicity for the project work guided by him on the topic “­ Face Recognition System”. He has also received a ­ grant-­ in-aid of rupees three lacs and fifty thousand under the Research Promotion Scheme of AICTE, New Delhi for the project “­ Secured Energy Efficient Routing Protocol for Delay Tolerant Hybrid Network”. He is also working as an internal thesis advisor at NMIMS’s MPSTE Mumbai and DY Patil’s RAIT, Navi Mumbai. He also worked as an external referee for Ph.D. thesis evaluation at SGB Amravati University and RTM Nagpur University. He is an active member of the Board of Studies in various universities and autonomous institutes in India. Dr. Idongesit Williams is an Assistant Professor at Aalborg University Copenhagen. He holds a bachelor’s degree in Physics, a master’s degree in Information and Communications TechnologiesandaPh.D.Hisresearchareasare­socio-economic, ­ socio-technical related to Information and Communications Technologies. His research areas include the following. The facilitation of telecom and ICT infrastructure using ­ public– private partnerships; the development and the sustenance of ­ community-based networks, ­ e-government implementation; science and technology studies; gender adoption of ICTs; organizational adoption of ICTs; and user experience with ICTs and organizational learning. He has authored more than 60 research publications, including journal papers, books, book chapters, conference papers and magazine articles. He is the ­ co-editor of the book The African Mobile Story. He has delivered presentations at conferences and also helped in organizing conferences such as the CMI annual conference and the CMI/­ GTUC conferences.
  • 16. xi Contributors Pallavi Vijay Chavan Department of Information Technology Ramrao Adik Institute of Technology D Y Patil Deemed to be University Navi Mumbai, India Shalaka Prasad Deore Department of Computer Engineering M.E.S. College of Engineering, Pune, S.P. Pune University Pune, India Himani Deshpande Department of Computer Engineering Ramrao Adik Institute of Technology D Y Patil Deemed to be University Navi Mumbai, India Pranjali Deshpande Department of Computer Engineering MKSSS’s Cummins College of Engineering for Women Pune, India Prarthana Dhok Department of Information Technology Ramrao Adik Institute of Technology Navi Mumbai, India Sunil Dhore Department of Computer Engineering Army Institute of Technology Pune, India Yash Goda Computer Engineering Department Dwarkadas J. Sanghvi College of Engineering Mumbai, India Nitin Goje Webster University Tashkent, Uzbekistan Vijay S. Gulhane Department of Information Technology Sipna College of Engineering Technology Amravati, India Baisa L. Gunjal Department of Information Technology Amrutvahini College of Engineering Sangamner, India Nikhil Ingale Department of Information Technology Ramrao Adik Institute of Technology D Y Patil Deemed to be University Navi Mumbai, India Sunita Jahirabadkar Department of Computer Engineering MKSSS’s Cummins College of Engineering for Women Pune, India Pruthav Jhaveri Computer Engineering Department Dwarkadas J. Sanghvi College of Engineering Mumbai, India Tejas Kachare Department of Information Technology Vishwakarma Institute of Technology Pune, India Piyush Kadam Department of Information Technology Ramrao Adik Institute of Technology D Y Patil Deemed to be University Navi Mumbai, India Aryan Kakade Department of Computer Science Sinhgad College of Engineering Pune, India
  • 17. xii Contributors Siddharth Kakade Department of Computer Science Vishwakarma Institute of Technology Pune, India Anant Kaulage Army Institute of Technology Pune, India Anindita A. Khade Department of Computer Engineering Ramrao Adik Institute of Technology D Y Patil Deemed to be University Navi Mumbai, India and SIES Graduate School of Technology Navi Mumbai, India Vatsal Khandor Computer Engineering Department Dwarkadas J. Sanghvi College of Engineering Mumbai, India Dhruvi Khankhoje Computer Engineering Department Dwarkadas J. Sanghvi College of Engineering Mumbai, India Prateek Koul Department of Information Technology Ramrao Adik Institute of Technology Navi Mumbai, India Pankaj Kulkarni Application Architect Deloitte Service LLP Franklin, Tennessee Pournima Kulkarni Lead Quality Analyst Healthstream, Inc. Nashville, Tennessee Jyoti Kundale Department of Information Technology Ramrao Adik Institute of Technology D Y Patil Deemed to be University Navi Mumbai, India Abhijeet Kushwah Department of Information Technology Ramrao Adik Institute of Technology Navi Mumbai, India Pravin Malve Department of Computer Engineering Government Polytechnic Wardha, India Deepa Mane IT Department Smt. Kashibai Navale College of Engineering Pune, India Ramchandra Mangrulkar Computer Engineering Department Dwarkadas J. Sanghvi College of Engineering Mumbai, India Yashsingh Manral Department of Information Technology Ramrao Adik Institute of Technology Navi Mumbai, India Siddharth Nigade Department of Computer Science Vishwakarma Institute of Technology Pune, India Tirth Pandya Department of Information Technology Ramrao Adik Institute of Technology Navi Mumbai, India Leena Ragha Department of Computer Engineering Ramrao Adik Institute of Technology D Y Patil Deemed to be University Navi Mumbai, India
  • 18. xiii Contributors Onkar Rane Department of Information Technology Ramrao Adik Institute of Technology D Y Patil Deemed to be University Navi Mumbai, India Sagar Rane Army Institute of Technology Pune, India Naitik Rathod Computer Engineering Department Dwarkadas J. Sanghvi College of Engineering Mumbai, India Priyali Sakhare Department of Information Technology Ramrao Adik Institute of Technology D Y Patil Deemed to be University Navi Mumbai, India Ashwini Sarode Department of Applied Computer Science SRH Heidelberg University Heidelberg, Germany Krutik Shah Computer Engineering Department Dwarkadas J. Sanghvi College of Engineering Mumbai, India Nemil Shah Computer Engineering Department Dwarkadas J. Sanghvi College of Engineering Mumbai, India Yashvi Shah Computer Engineering Department Dwarkadas J. Sanghvi College of Engineering Mumbai, India Nitin Shekapure Production Engineering Department All India Shri Shivaji Memorial Society College of Engineering Pune, India Swati Shekapure Computer Engineering Department Marathwada Mitra Mandal’s College of Engineering Pune, India Narendra M. Shekokar Computer Engineering Department Dwarkadas J. Sanghvi College of Engineering Mumbai, India Sachin Shelke IT Department Pune Institute of Computer Technology Pune, India Vishal Shrivastava CRG Solutions Singapore Pte Ltd. Bangalore, India Vivek Kumar Singh Department of CSE Bharat Institute of Engineering and Technology Hyderabad, India Manisha Sinha Department of Electronics and Communication Engineering University of Engineering and Management Jaipur Jaipur, India P. Tamilarasi Department of Computer Science Sri Sarada College for Women (­Autonomous) Salem, India
  • 19. xiv Contributors R. Uma Rani Sri Sarada College for Women (­Autonomous) Salem, India Siddhesh Unhavane University of Illinois at ­Urbana-Champaign School of Information Sciences Champaign, Illinois and Department of Information Technology Ramrao Adik Institute of Technology Navi Mumbai, India Amarsinh V. Vidhate Department of Computer Engineering Ramrao Adik Institute of Technology D Y Patil Deemed to be University Navi Mumbai, India Shreeraj Vijayan PublicisSapient Toronto, Canada
  • 20. 1 DOI: 10.1201/9781003283249-1 1 Instigation and Development of Data Science Priyali Sakhare and Pallavi Vijay Chavan Ramrao Adik Institute of Technology, D.Y. Patil Deemed to be University Nerul, Navi Mumbai Pournima Kulkarni Healthstream, Inc. Ashwini Sarode SRH Heidelberg University 1.1 Data Science Data is the basis or groundwork of data science; it is the matter on which all the analyses are based on processing. In the context of data science, data can be bifurcated into two types: traditional and big data. CONTENTS 1.1 Data Science.............................................................................................................................1 1.1.1 Existence of Data Science...........................................................................................3 1.1.2 Data Science Process...................................................................................................5 1.1.2.1 Setting the Research Goal...........................................................................6 1.1.2.2 Retrieving Data............................................................................................7 1.1.2.3 Data Preparation..........................................................................................8 1.1.2.4 Data Exploration......................................................................................... 10 1.1.2.5 Data Modeling............................................................................................ 11 1.1.2.6 Presentation and Automation..................................................................12 1.1.3 Life ­ Cycle – Data Science.........................................................................................12 1.2 Relation between Data Science and Machine Learning..................................................13 1.2.1 Where Do We See Machine Learning in Data Science?......................................13 1.2.2 Which Machine Algorithms are used in Data Science?...................................... 14 1.2.2.1 Linear Regression Algorithm................................................................... 14 1.2.2.2 Decision Tree..............................................................................................15 1.2.2.3 ­K-Means Clustering...................................................................................15 1.2.3 Application of Machine Learning in Data Science..............................................15 1.3 Tools for Data science...........................................................................................................17 1.3.1 R Programming........................................................................................................17 1.3.2 Python........................................................................................................................17 1.4 Benefits and Applications....................................................................................................18 1.5 Conclusion.............................................................................................................................18 References........................................................................................................................................18
  • 21. 2 Data Science Traditional Data: In the context of data science, traditional data is in a structured format and can be stored in databases. It is mainly maintained by all types of businesses start- ing from very small to big organizations. Traditional data is stored in relational database management systems (­­Figure 1.1). For traditional data to be processed, all data goes through the preprocessing stage. This operation is necessary where the raw data is converted into a format which is more under- standable. Few processes are undertaken for traditional data. They are as follows: i. Collection of raw data and storing it on a server ii. Class labeling iii. Data cleansing iv. Data balancing v. Data shuffling Big Data: In the context of data science, big data is larger than traditional data where it deals with complex datasets which is difficult to be managed in traditional data. Big data is usually distributed across a large network which varies by variety (­ number, text, audio, video), velocity (­ retrieved, computed), and volume (­­ tera-, ­ peta-, exabytes) (­­ Figure 1.2). In big data, as the complexity of the data is very large, the preprocessing stage becomes crucial. Some of the stages of big data are quite similar to the traditional data [1,2]. The process for big data is as follows: i. Collection of the data ii. Class labeling iii. Data cleansing iv. Data masking ­FIGURE 1.1 Traditional data processing.
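As a hedged illustration of these preprocessing steps, the short Python sketch below walks through collection, class labeling, cleansing, balancing and shuffling of a tabular file. It is not the chapter's own code: the file name customers.csv, the spend column and the labeling rule are hypothetical stand-ins.

    import pandas as pd
    from sklearn.utils import resample, shuffle

    # i. Collection: load raw data that has already been gathered on a server
    raw = pd.read_csv("customers.csv")  # hypothetical file

    # ii. Class labeling: attach a class to every record (rule is illustrative only)
    raw["label"] = (raw["spend"] > 1000).astype(int)

    # iii. Data cleansing: remove duplicate rows and rows with missing values
    clean = raw.drop_duplicates().dropna()

    # iv. Data balancing: upsample the minority class to the size of the majority class
    majority = clean[clean["label"] == 0]
    minority = clean[clean["label"] == 1]
    minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=0)
    balanced = pd.concat([majority, minority_up])

    # v. Data shuffling: randomize the row order before analysis or model training
    prepared = shuffle(balanced, random_state=0)

The same sequence applies to big data, with additional steps such as data masking, except that the work is usually distributed across a large network of machines rather than run on a single server.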
  • 22. 3 Instigation and Development In data science, the term “­ science” means the knowledge which is gained through system- atic study. Basically, it is a systematic plan that builds and organizes the knowledge which is in a form that is easily testable in explanations and predictions [3]. 1.1.1  Existence of Data Science Data science is a combination of two or more fields that uses different kinds of math and statistics, scientific methods, specialized programming, artificial intelligence, data analy- sis, algorithms, and systems for the extraction of knowledge from the data. The data can be structured and unstructured. The data science field prepares data for analysis, expands data science problems, makes ­ data-driven solutions, analyzes data, and searches the ­ high-level decisions in a broad range of application domains. Data science also is related to big data, machine learning, data mining, etc. It is a study of the massive amount of data that includes extraction of meaningful insight from structured and unstructured data which is completed using different algorithms and scientific methods [4]. ­ Figure 1.3 shows the diagram of data science. Data science is the discovery of knowledge through the analysis of data which is the statistics extension that is capable of dealing with huge amounts of data. In data science, the past data is being analyzed for prediction of future analysis. Data science usually works with dynamic unstructured data. The skills required for data science are statistics, visualization, and machine learning. Data science has several current viewpoints. They are as follows: i. Data science is about studying scientific and business data. ii. Data science is an integration of computing technology, statistics, and artificial intelligence. iii. The purpose of data science is to solve scientific as well as business problems by the extraction of knowledge from data [5,6]. ­FIGURE 1.2 Big data processing.
  • 23. 4 Data Science Data science uses powerful hardware, programming systems, and desired algorithms to solve the issues that arise in data. Data can be analyzed in a simple form. Refer to ­ Figure 1.4 for a few simple steps of data science. In short, data science is all about: i. Analyzing and questioning the raw data. ii. Modeling the data using the desired algorithm. iii. Visualize the data to achieve correct outcomes. iv. ­ Decision-making and finding the final results [7]. ­FIGURE 1.3 Existence of data science. ­FIGURE 1.4 Simple steps of data science.
  • 24. 5 Instigation and Development 1.1.2  Data Science Process Data science is a multidisciplinary field, and it allows us to extract information from both structured and unstructured data. Data science helps us to raise the chances of success in data science projects at the lowest price by using the standard approach. It also makes it possible to take a project by making teams, with each team member focusing on a particu- lar center of interest areas [8]. The data science process passes through various phases. The data science process con- sists of six important steps (­­ Figure 1.5). This summarizes the process of data science and the following list is a short introduc- tion to the process, each step will be discussed in depth further: i. Setting the Research Goal: The first phase of the process is setting a research goal. The main aim of this is to make sure that all the people of the company understand the purpose of the project. ii. Retrieving Data: The second phase of data science is used for retrieving data where data is available for analysis purposes, and it also includes searching for suitable data and getting access to data from the owner of the data. The data will be in the raw form which needs to be polished and transformed before it becomes usable. iii. Data Preparation: After processing the data in the second phase, i.e., retrieving data, we get raw data, so it’s time to prepare it. In the third phase of data science, data preparation is undertaken for processing this raw data. This includes the transformation of raw data into data that would be directly usable. For achieving the usable form of data, we have to detect and correct various errors of data and combine data from different data sources. Then it transforms those data. Once the data preparation is successfully completed, one can progress to data visualization and modeling. iv. Data Exploration: The fourth phase of the data science process is data exploration where data is explored in a deep understandable form. This phase will enable us to start the modeling phase. v. Data Modeling: The fifth phase of the data science process is data modeling which is also known as model building. Here, we have to select a technique from differ- ent fields like machine learning, statistics, and operation search. Data modeling is the iterative process which involves a selection of different variables for a model, execution of the model, and diagnosis of the model. ­FIGURE 1.5 Process of data science.
  • 25. 6 Data Science vi. Presentation and Automation: The last phase of the data science process is pre- sentation and automation. This phase is processed, only if needed. The impor- tance of this step is more clearly visible or understood in projects on a tactical and strategic scale. Some projects need to be performed again so that automating the project will save time [9]. Following these six steps pays off, higher project success rate and also increased research results. This process makes sure that we have a ­ well-designed research plan, clear deliv- erables, and a good understanding of the business query before you even start looking at data. Now, we will look into the process in detail. 1.1.2.1  Setting the Research Goal Every project starts by understanding the purpose of why, what, and how the project is established. Refer to ­ Figure 1.6 for understanding the setting of the research goal. For defining research goals and creating a project charter, we need to understand what does the company expects from us? And why does management place such a value on research? And is it part of a bigger picture of a project originating from an opportunity someone detected? So, answering these what, why, and how is the goal of the first phase of the process. This phase aims to have a clear research goal, a good understanding of contents, and proper planning of action with a timetable. This can help to create the project charter. Creating a project charter refers to the short formal document with a clear objective state- ment of the project. 1.1.2.1.1  Spend Time Understanding the Goals of Research Understanding the research goals is an essential outcome which usually states the purpose of the project in a focused and clear manner. This would be slightly critical for the success of the project. Constantly querying until the doubts are resolved and until the expecta- tions of the project are cleared, it takes months’ time to resolve the research problems. 1.1.2.1.2  Creation of Project Charter In the project charter, clients (­ users) want to know about what they are paying for, so we need to efficiently understand the problem and need to try to get an agreement on the deliverables. The project charter is mandatory for creating any project. ­FIGURE 1.6 Setting the research goal.
  • 26. 7 Instigation and Development A project charter needs teamwork and the inputs should consist of the following points: i. Research goal should be clear. ii. Project mission should be clear. iii. What resources are going to be used? iv. How is the project going to be analyzed? v. Project should be deliverable and successful. vi. Project should be completed in the timeline. Clients can use all of this information so that they can make the estimation of cost for the project and the people who are specialized are required for that project to make it successful. 1.1.2.2 Retrieving Data After setting the research goals, we need to collect the data. In this phase, we have to ensure whether there is sufficient quality of data and there is proper access to the data for further use of it. The data can be in various forms, i.e., it can be from an excel spreadsheet to many different types of databases. ­ Figure 1.7 shows the retrieval of data. In this phase, the required data is retrieved for processing. Occasionally, we need to design the collection of data by ourselves, but mostly we won’t get involved in this phase. Many companies already have collections of the data that are stored for us, but in rare cases if data is not available then it can be bought from third parties. ­FIGURE 1.7 Retrieving data.
  • 27. 8 Data Science 1.1.2.2.1  Data Stored within the Company Firstly, we need to check whether there is access for the data to the company. Then, we have to check the quality of the data that is available in the company. Many companies have the habit of keeping the key data, so cleaning of data can be already done. Mainly, the data can be stored in data warehouses, data marts, databases, etc. Data warehouse is a system where it combines the data from different sources into a central repository to store the data and support data mining, machine learning, and business intelligence. Data mart is a subset of a data warehouse where it focuses on a specific area which only allows the authorized user to quickly access critical data without wasting time over finding through an entire data warehouse. A database is used to store the data. Sometimes finding the data within the company becomes more challenging. As the company grows, the data gets scattered in many different places. Due to changes in the position of people in the company, many of them leave the company, so the knowledge of data may disappear. So, we need to develop some skills to recover the lost data. Another difficult task is to get proper access to data. To get access to the data is ­ time-consuming, and company politics are also involved in it. 1.1.2.2.2  Don’t Be Afraid of Buying Outside Data If the data is not available in the company, one can look outside your organization for buy- ing it. The outside data can be of absolute quality, and it relies on the organization that creates and manages it. There are many ­ open-data providers, as shown in ­ Table 1.1. 1.1.2.2.3  Check the Quality of Data to Avoid Problems The retrieval of data is the first time where data is seen in the process of data science. Most errors are encountered while gathering information and are easily spotted. But it may take many hours to solve these errors. So, checking the quality of data is very important for avoiding problems. 1.1.2.3 Data Preparation In the data preparation phase, we will look at how the data quality is enhanced for further processing. The data preparation phase is divided into three ­ sub-phases: data cleansing, data inte- gration, and data transformation (­­ Figure 1.8). Similar actions have been taken during the data cleansing, transformation, and integra- tion phases (­­ Figure 1.9). This looks a bit abstract to understand, but it is easier to learn once we will see each in detail. Now, we will learn detailed information about these ­sub-phases. ­TABLE 1.1 ­Open-Data Providers Open Data Site Information Freebase.org An open database that retrieves its information from sites like Wikipedia and MusicBrainz. Aiddata.org Open data for international development. ­open-data.europa.eu The home of the European Commission’s open data.
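A hedged sketch of the retrieval step is shown below: it pulls a table from a database stored within the company and runs a few quick quality checks before any analysis starts. The database file company.db and the table sales are invented for the example, and sqlite3 with pandas is only one possible toolset.

    import sqlite3
    import pandas as pd

    # Retrieve data that is already stored inside the company
    conn = sqlite3.connect("company.db")                   # hypothetical database
    df = pd.read_sql_query("SELECT * FROM sales", conn)    # hypothetical table
    conn.close()

    # Early quality checks avoid losing hours on errors later in the process
    print(df.shape)               # how many rows and columns arrived
    print(df.dtypes)              # are the column types what we expect?
    print(df.isna().sum())        # missing values per column
    print(df.duplicated().sum())  # number of fully duplicated rows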
  • 28. 9 Instigation and Development 1.1.2.3.1 Data Cleansing Data cleansing is a ­ sub-process of the data preparation phase in the process of data science which focuses on removing the errors from the data that becomes a consistent and true representation of the process from which it originated. Errors are generated in two types: interpretation error and inconsistency error. Interpretation error occurs when we take the value of data for granted, whereas inconsistency error occurs when there are many incon- sistencies between the data sources or against the company’s standard value. Different types of errors can be detected by an easy check method. ­ Table 1.2 represents the overview of common errors which are detected commonly. We need to solve the prob- lem as soon as possible in the acquisition chain or else we need to fix it in the program. Later on, it will cause issues during further execution. ­FIGURE 1.8 Data preparation. ­FIGURE 1.9 Common actions taken during data cleansing, data transformation, and data integration.
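The common data-cleansing errors described above can each be addressed with a few lines of code. The sketch below is a minimal example, assuming a hypothetical data set with city, age, height and height_unit columns; it shows string functions for redundant white space, manual overrules for impossible values, simple imputation for missing values and recalculation for mixed units.

    import numpy as np
    import pandas as pd

    df = pd.read_csv("raw_records.csv")  # hypothetical input file

    # Redundant white space: clean string values with string functions
    df["city"] = df["city"].str.strip().str.title()

    # Impossible values: manually overrule ages outside a plausible range
    df.loc[(df["age"] < 0) | (df["age"] > 120), "age"] = np.nan

    # Missing values: remove them, or impute a simple statistic as done here
    df["age"] = df["age"].fillna(df["age"].median())

    # Different units of measurement: recalculate so that every row uses centimetres
    inches = df["height_unit"] == "in"
    df.loc[inches, "height"] = df.loc[inches, "height"] * 2.54
    df["height_unit"] = "cm"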
  • 29. 10 Data Science 1.1.2.3.2 Data Integration Data integration is the process of merging various data from different sources into a single source for efficient data management. The data comes from several places and are inte- grated into one source. Data may vary in structure, size, and type ranging from various excel files and databases to text documents. We can create views and can also set operators in it. There are different ways of combining data. Two majorly used combine operations are joining and tacking. The first is joining which allows us to combine the information from one table with the information that we find in another table. The second is appending effectively adding observations from one table to another table. 1.1.2.3.3 Data Transformation Some models require the data to be in different shapes. Now, as we have ­ well-cleaned and integrated the data, this is the next task that we have to perform. Data transformation helps us in transforming the data so that it can take a suitable form for modeling the data. For transforming the data, we need to reduce the number of variables and we can also turn the variables into dummy variables. Many times we have many variables that need to be reduced as they don’t add any new information. If there are many variables in the model, it will become difficult for handling, and some techniques don’t perform well when they are overloaded with many input variables. So, it is very important to reduce the num- ber of variables. Meanwhile, we can also turn the variables into dummy variables where we can assign dummy values as 1 or 0, i.e., true=1 and false=0. Turning variables into dummies splits a variable that has multiple classes into multiple variables. 1.1.2.4 Data Exploration Data exploration is the fourth phase of the data science process. It is all about building a deep understanding of the data. It helps us to understand the interaction of variables within each other. For achieving this, we need to use visual techniques and statistics. Exploratory data analysis is another name for data exploration. During data exploratory, we take a deep look into the data (­­ Figure 1.10). We usually understand the information when it is in the picture or image form. Therefore, we use graphical techniques for understanding the data and the interactions between vari- ables. The visualization techniques we can use are simple graphs or combined graphs represented by connections between them. ­TABLE 1.2 Overview of Common Errors Error Description Possible Solution Errors Pointing to False Values within Single Data Set Impossible values Manual overrules Missing values Remove values Redundant white space Usage of string functions Errors Pointing to Inconsistencies between Data Sets Different units of measurements Recalculation Different levels aggregation The same level of measurement by aggregation or extrapolation
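Below is a hedged sketch of the joining, appending and dummy-variable operations described in the integration and transformation steps. The orders and profiles tables and their customer_id and region columns are hypothetical, and pandas is simply one convenient way to express the operations.

    import pandas as pd

    orders = pd.DataFrame({"customer_id": [1, 2, 3], "amount": [250, 90, 40]})
    profiles = pd.DataFrame({"customer_id": [1, 2, 3], "region": ["North", "South", "North"]})
    new_orders = pd.DataFrame({"customer_id": [4], "amount": [120]})

    # Joining: enrich one table with the information found in another table
    joined = orders.merge(profiles, on="customer_id", how="left")

    # Appending: add the observations of one table underneath another table
    all_orders = pd.concat([orders, new_orders], ignore_index=True)

    # Transformation: split the multi-class variable 'region' into dummy (0/1) variables
    with_dummies = pd.get_dummies(joined, columns=["region"])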
  • 30. 11 Instigation and Development 1.1.2.5 Data Modeling Data modeling is the fifth phase in the data science process which uses domain knowl- edge about the data that is found performing previous phases. In this phase, we have to choose a technique from the fields of operations research, machine learning, and statistics. Building a model is an iterative process which includes selecting a variable for the model then executing it and finally diagnosing the model. As per ­ Figure 1.11, data modeling is further divided into the following ways: i. Model and variable selection ii. Model execution iii. Model diagnostic and model comparison 1.1.2.5.1  Model and Variable Selection Here, we need to choose the variable that we want to add to the model. There are many mod- eling techniques available, so choosing the right model is up to us. We have to consider the model performance and whether the project also meets all the requirements for the model. 1.1.2.5.2 Model Execution Once we have selected the model and suitable variables for it, we will have to execute it. So, model execution is done for implementation of the code. Different software platforms ­FIGURE 1.10 Data exploration. FIGURE 1.11 Data modeling.
  • 31. 12 Data Science can be used for execution of the code. We can use python, R programming, and various suitable programming languages for implementation of models. 1.1.2.5.3  Model Diagnostic and Model Comparison In this, we will be making multiple models from which we can then select the best one based on multiple criteria. Many various models make high assumptions for freedom of the inputs, and we have to verify that these assumptions meet the expectations. This is known as model diagnostics. 1.1.2.6  Presentation and Automation After successful analysis of data and building a ­ well-performing model, the project is ready to be presented in front of the external people. Presentation and automation are the last stages of the data science process where the soft skills are most useful and are extremely important for the presentation of the project (­­ Figure 1.12). Automating data analysis is used for automating the projects that are needed to be executed again and again [9]. 1.1.3  Life ­ Cycle – Data Science Life cycle of data science is also known as the data science pipeline. Life cycle of data sci- ence consists of five important stages. ­ Figure 1.13 shows the different stages of the life cycle of data science as follows: i. Capture: The structured and unstructured data which are raw are gathered from all the relevant sources to capture data from devices in real time. ii. Prepare and Maintain: This stage inculcates the raw data into the format which is consistent for machine learning and analytics. iii. Process: Data scientist examines the patterns and range of values within the data to determine whether the data is sustainable for use with machine learning or predictive analytics. iv. Analyze: Discovery is done in this stage where the data scientist does the actual performance of machine learning, regression, and deep learning algorithms from the ­ well-prepared data. FIGURE 1.12 Presentation and automation.
  • 32. 13 Instigation and Development v. Communicate: Finally, the projects are presented as charts, reports, and in the form of other data visualizations, which is easy for ­ decision-makers to understand [10]. 1.2  Relation between Data Science and Machine Learning Machine learning gives the ability to computers without being explicitly programmed. It is also referred to as a process in which a computer can work accurately, as it collects and learns from the data it is given. Data science and machine learning are related to each other but have different goals and functionalities. Machine learning is a critical part of data science where it effectively uses different statistical algorithms for analyzing the data from multiple resources [9,11]. 1.2.1  Where Do We See Machine Learning in Data Science? Although machine learning is mainly linked to the ­ data-modeling step of the data science process, it can be also used in most of the steps in the data science process (­­ Figure 1.14). Machine learning is used in the data science process. The ­ data-modeling phase cannot be started until we understand the qualitative raw data. But before that, the data prepara- tion phase have significant benefits through machine learning. If we take an example for cleansing the text strings, machine learning helps group the same strings so that it can become easy to correct spelling errors. FIGURE 1.13 Life cycle of data science.
  • 33. 14 Data Science Machine learning is also helpful in exploring the data. Algorithms can recognize the underlying patterns in the data, and the results are represented through various visualization tools including pie charts, box plots, histograms, density plots, bar charts and line graphs [9]. 1.2.2 Which Machine Algorithms are used in Data Science? Data scientists should be aware of machine learning and its algorithms, as many machine learning algorithms are broadly used in data science. The machine learning algorithms most commonly used in data science are: i. Linear regression algorithm ii. Decision tree iii. K-means clustering 1.2.2.1 Linear Regression Algorithm Linear regression is a popular machine learning algorithm based on supervised learning. It is mostly used for prediction and forecasting purposes. Linear regression works on simple regression: regression is a method of modeling a target value based on independent variables. It represents a linear equation that connects the input set to the predicted output. Since it determines a linear relationship between the input and output variables, it is known as linear regression. As per Figure 1.15, the equation for linear regression is Y = mx + c, where Y is the dependent variable, x is the independent variable, m is the slope, and c is the intercept. FIGURE 1.14 Machine learning used in process of data science.
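To make the equation concrete, the hedged sketch below recovers the slope m and the intercept c from a handful of observations; the numbers are invented for illustration, and scikit-learn is only one of several libraries that could be used.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Hypothetical observations of an independent variable x and a dependent variable Y
    x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
    Y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])  # roughly Y = 2x + 1

    model = LinearRegression().fit(x, Y)
    m, c = model.coef_[0], model.intercept_   # slope and intercept of the fitted line
    print(f"Y = {m:.2f}x + {c:.2f}")
    print(model.predict([[6.0]]))             # forecast for an unseen value of x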
  • 34. 15 Instigation and Development 1.2.2.2 Decision Tree Decision tree is based on supervised learning. This machine learning algorithm is ­ well-known because it can be used in both classification and regression problems. In the decision tree algorithm, we can resolve the issue by using tree representation in which each node represents features, each branch shows a decision, and each leaf repre- sents the outcomes (­­ Figure 1.16). In the decision tree, we start searching from the root node of the tree and compare the values of root with the record attribute. We continue compar- ing the nodes and lastly, we come to the outcome that is to the leaf node. 1.2.2.3 ­K-Means Clustering ­ K-means clustering is the machine learning algorithm which belongs to the unsuper- vised learning algorithm. ­ K-means clustering solves the problems related to the cluster- ing. Clustering is referred to as a small set of data which are formed by breaking a large dataset. This small set of data is known as clusters. The main aim of this algorithm is to minimize the sum of distances between the clusters and data points [7,12]. ­ Figure 1.17 shows ­ k-means clustering. 1.2.3  Application of Machine Learning in Data Science Machine learning algorithms have wide range of applications with data science, some of the application domains are listed below [9]: FIGURE 1.15 Linear regression algorithm.
  • 35. 16 Data Science i. Automation and control ii. Education iii. Finance industry iv. Computer vision and Intelligence v. Customer segmentation FIGURE 1.16 Decision tree. FIGURE 1.17 K-means clustering.
vi. Weather forecasting
vii. Disease prediction mechanisms
viii. Price prediction in real estate

1.3 Tools for Data Science
Data scientists use many different types of tools for performing data science, since they have to build and run code for the creation of models. Many open-source tools are available for these operations. Data science tools fall into several categories, namely data analysis tools, data visualization tools, data warehousing tools, and machine learning tools:
i. Data analysis tools – R, Python, MATLAB, Excel, Jupyter.
ii. Data visualization tools – Tableau, Cognos.
iii. Data warehousing tools – SQL, ETL, Hadoop, AWS Redshift.
iv. Machine learning tools – Spark, Mahout [7,11].

1.3.1 R Programming
R is a data analysis tool used for performing data science operations. R is an open-source programming language with a suitable environment for graphics and statistical computing, and it is one of the most popular languages in the data science community. R provides a variety of tools and libraries for creating visualizations, cleansing data, and evaluating machine learning and deep learning algorithms. R also offers many features for the statistical analysis of large datasets, making it easy to perform tasks such as:
i. Linear and non-linear modeling.
ii. Time-series analysis.
iii. Clustering.
R has an integrated development environment available in RStudio. Packages available in R include caret, randomForest, e1071, rpart, and many more. Data science projects that use R include Twitter, T-Mobile, Google Analytics, etc.

1.3.2 Python
Python is an open-source, object-oriented, high-level, interpreted programming language that emphasizes code readability. It is a data analysis tool widely used in data science and helps in developing strong data applications. The main reason Python is so widely used in the scientific and research communities is that it is easy to use, with a simple syntax that is approachable even for people who are not from an engineering background, as the short sketch below illustrates.
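The following is a minimal, illustrative sketch (not taken from the book) of how the decision tree and k-means algorithms from Section 1.2.2 look in Python; it assumes the scikit-learn library is installed, and the toy data are invented for the example.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

# Toy feature matrix (two features per sample) and class labels -- invented values
X = np.array([[1, 2], [2, 1], [8, 9], [9, 8], [1, 1], [9, 9]])
y = np.array([0, 0, 1, 1, 0, 1])

# Decision tree: supervised classification
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(tree.predict([[2, 2], [8, 8]]))    # expected: [0 1]

# K-means: unsupervised clustering into two clusters
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                        # cluster index assigned to each sample

Both estimators follow scikit-learn's common fit/predict interface, which is one reason Python is convenient for the data-modeling step of the process.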
  • 37. 18 Data Science A large number of python libraries support data science tasks, including Pandas for data manipulation, NumPy for handling large dimensional arrays, Matplotlib for building data visualizations, and SciPy for providing functionalities for mathematics and computing [13]. 1.4  Benefits and Applications i. Data science helps drive innovation by enabling teams to share results, codes, and reports, and reduces redundancy. ii. It helps us to understand the project in an efficient and easy manner. iii. It makes data scientists more productive by helping them to accelerate and deliver models extra faster with less error. iv. It facilitates data scientists to work with large volumes and large varieties of data [11,14,15]. The following are the trending applications: i. Data science is used for image and speech recognition (­ e.g., Siri, Alexa, Cortana). ii. Data science is used in healthcare for detection of tumors, medical image analysis, and recognizing other diseases. iii. Data science is also used for detection of risk in finance industries. iv. Data science is used for designing ­ self-driving cars where automation is done within the car so that there would be a reduction in the number of accidents [7,16]. 1.5 Conclusion The aim of this chapter is to understand the deep theory of data science. The authors have explained the process of data science with respect to setting the goals, retrieving data, data preparation, data exploration, data modeling, presentation, and automation. Further, the data science life cycle is described, and this life cycle is also known as the data sci- ence pipeline. In the data science lifeline, there are six stages: capture, prepare, maintain, process, analyze, and communicate. The authors have also introduced the relationship between data science and machine learning. Machine learning is used in the process for solving data science problems in different ways by using certain algorithms. The authors have also given detailed information about tools that are used for performing data science operations. Finally, the author has explained large benefits of data science in various man- ners and applications of data science. References 1. https://­www.kdnuggets.com/­2018/­06/­­what-­where-­how-­data-science.html.
  • 38. 19 Instigation and Development 2. Haruki Nakamura. Big data science at AMED-BINDS. Japan Biophysical Review, 12(­ 2): 221–224. 2020. doi: 10.1007/­­s12551-­020-­00628-1. 3. https://­cacm.acm.org/­magazines/­2013/­12/­­169933-­data-­science-­and-prediction/­fulltext. 4. Dataflair Team. (­ 2019). What is data science? Complete data science tutorial for beginners. Retrieved 8.10.2019. 5. Alisa Bilal. Data Science: Fundamental Principles, Croatia, 2019. 6. Yangyong Zhu and Yun Xiong. Defining Data Science. Beyond the Study of the Rules of the Natural World as Reflected by Data, School of Computer Science, Fudan University, Shanghai, China Shanghai Key Laboratory of Data Science, Fudan University, China. 7. https://­ www.javatpoint.com/­ data-science. What is Data Science: Tutorial, Components, Life cycle. 8. Wikipedia. https://­en.wikipedia.org/­wiki/­Data_science. 9. Davy Cielen, Arno D. B. Meysman, Mohamed Ali. Introducing Data Science. Big Data, Machine Learning, and More, Using Python Tools. Manning Publications. 10. https://­www.ibm.com/­cloud/­learn/­data-science-introduction. IBM Cloud Learn Hub. 11. https://­www.oracle.com/­in/­data-science/­what-is-data-science/. Oracle India. 12. Mine Çetinkaya-Rundel, Victoria Ellison. A Fresh Look at Introductory Data Science. Journal of Statistics and Data Science Education, 29(­ sup1), 2021. 13. Analysis of US Population using Data Analytics and Data Science Tools. Computer Science and Information Technology, 9(­ 7), 2020. 14. Steven Perkins, Hazel Davis, Valerie du Preez. “­ Practical Data Science for Actuarial Task”. A practical example of data science considerations by Modelling, Analytics and Insights in Data working ­ party – New approaches to current actuarial work. 2020. 15. Harshil Jhaveri, Himanshu Ashar and Ramchandra Mangrulkar. Leveraging Data Science in Cyber Physical Systems to Overcome Covid-19. Journal of University of Shanghai for Science and Technology, 22(­ 10): 1993–2015, 2020. 16. Sanyukta Shrestha. A Deep Dissertation of Data Science: Related Issues and its Applications. Department of Information Technology, Amity University Uttar Pradesh, Noida, IEEE, 2019.
  • 40. 21 DOI: 10.1201/9781003283249-2 2 Role of Statistical Methods in Data Science Deepa Mane and Sachin Shelke SPPU 2.1 Introduction Data science as a ­ research-oriented discipline is majorly impacted by IR, computer sci- ence, mathematics, statistics, operations research, and the applied sciences too. Though data science was derived by statisticians, the significance of computer science and its role in business applications is often much more elaborated, particularly in the century of big data. John Tukey already changed the focus of statistics entirely from mathematics, statis- tical testing, to deriving hypotheses from data that try to understand the data before get- ting hypotheses. One more data mining technique is Knowledge Discovery in Databases (­ KDD) with its subtypes in data mining which is the knowledge data discovery combines CONTENTS 2.1 Introduction...........................................................................................................................21 2.2 Data Science and Statistics Terminologies........................................................................22 2.3 Types of Statistics..................................................................................................................22 2.3.1 Descriptive.................................................................................................................23 2.3.2 Inferential...................................................................................................................23 2.4 How to Describe a Single Set of Data................................................................................23 2.5 Statistical Analysis................................................................................................................24 2.5.1 Quantitative Analysis...............................................................................................24 2.5.2 Qualitative Analysis.................................................................................................24 2.5.3 Measures of the Central Tendency.........................................................................24 2.5.4 Measures of Dispersion...........................................................................................25 2.6 Tools to Measure Relationships..........................................................................................26 2.6.1 Covariance.................................................................................................................26 2.6.2 Correlation.................................................................................................................26 2.7 Probability Distribution Function......................................................................................27 2.7.1 Cumulative Density Function.................................................................................27 2.7.2 Continuous Data Distributions...............................................................................28 2.7.3 Conditional Probability...........................................................................................28 2.7.4 Bayes’ Theorem.........................................................................................................29 2.8 Hypothesis Testing...............................................................................................................29 2.9 
Conclusion.............................................................................................................................30 References........................................................................................................................................30
  • 41. 22 Data Science together many different approaches to knowledge edge discovery. So, KDD promotes the interaction between different fields for identifying underlying knowledge in data [­ 1–3]. Maths and statistics for data science are essential because these disciplines build the base for all the machine learning algorithms. In fact, mathematics exists around us, starting from shapes, patterns, and colors, to the count of colors in the diagram. Mathematics is embedded in each part of our lives. Math, data science, and statistics are the building blocks of machine learning algorithms. Now the question is, what statistics is and how to use it? [4]. Statistics is a basic building block for mathematical science that pertains to data extrac- tion, data collection, data analysis, data interpretation, and data presentation. Statistics is used to solve the complex problems in the actual world around us so that data scientists [­ 5–7] and business people can get insights from this which can lead to meaningful trends and patterns in data. Statistics can be used to extract underlying important information from the data by applying mathematical computations to it. Python is very popular for this kind of analysis and possesses a large number of stan- dard libraries which can help in analyzing and representing the data. Data scientists need to work on big amounts of data. With little knowledge about Python, anybody can build better analytical tools which will help in developing machine learning, big data, cluster- ing, artificial intelligence, etc. Several statistical methods and techniques are executed to analyze raw data, model sta- tistical analysis [8,9], and reach the result. The field of statistics has touched every part of life, such as weather forecasting, life sciences, ­ e-commerce, retail, insurance, and educa- tion [10,11]. 2.2  Data Science and Statistics Terminologies We must be aware of some of the important statistical terminologies while dealing with statistics and data science. Following are the terminologies in statistics: 1. Population is something from where data is collected. 2. A ­ sample – a subset of the population. 3. A ­ variable – quantity or number that is countable. or you can say data item. 4. A statistical parameter is a number that guides probability distributions, such as mean, median, mode, correlation, and covariance of the population. 2.3  Types of Statistics Statistical data analysis is broadly categorized into two main groups: descriptive statistics and inferential statistics. The first category talks about the description of data including
central tendency, the variation in the data, and the dispersion in the data, while inferential statistics talks about the inferences and conclusions extracted from the data.

2.3.1 Descriptive
Descriptive statistics uses the data to describe the population through numerous computations, reports, or graphs [12]. Descriptive statistics organizes the data and highlights the qualities of the data, so that its different attributes can be considered.

2.3.2 Inferential
Inferential statistics generates predictions and inferences about a given population based on sample data taken from that population [13]. It infers from a large dataset, applies probability theory to extract results, and builds a model based on them.

2.4 How to Describe a Single Set of Data
It is easy to produce a single set of data; the problem is how to describe it. The most obvious way to describe any dataset is simply the data itself:

test = [65, 12, 78, 43, 32]   # ... and so on

Small datasets are easy to describe, but for large datasets this is clumsy and obscures more than it reveals, so statistics is used to refine and communicate the characteristics of the data. The following sample code (the original listing, restored here as runnable Python) generates random values, counts them, and plots a histogram:

from collections import Counter
from random import randint
import matplotlib.pyplot as plt

test = [100, 49, 41, 40, 25]
for _ in range(1000):
    value = randint(0, 100)
    test.append(value)

counts = Counter(test)
print(counts)

x1 = range(101)                        # 100 is the largest value a data point can take
y1 = [counts[x] for x in x1]
plt.bar(x1, y1)                        # bar chart of value counts
plt.axis([0, 80, 0, 24])               # axis limits as given in the original listing
plt.title("Histogram of Number Counts")
plt.xlabel("test")
plt.ylabel("data")
plt.show()

This will generate a histogram as shown in Figure 2.1. Probably the simplest statistic is the number of data points:

num_points = len(test)
The largest and smallest values can be computed by:

largest_value = max(test)    # 100
smallest_value = min(test)   # 1

2.5 Statistical Analysis
In order to perform statistical analysis, we use standard methods that analyze the quality of the data as well as the quantity of the data. Statistics provides one set of tools to analyze the quantity of data and, on a similar note, tools to measure its quality.

2.5.1 Quantitative Analysis
Quantitative analysis is the statistical analysis in which quantities are taken into consideration. Quantitative measures such as numbers and graphs are used to find underlying patterns and trends [14].

2.5.2 Qualitative Analysis
Qualitative analysis is a statistical analysis that uses qualitative measures such as color, intensity, and other descriptive attributes. Consider the following example: if you have apples in a basket, they may be small, medium, or big. This describes the quality of the apples and falls under qualitative analysis, whereas counting how many small, medium, and big apples there are is quantitative analysis.

2.5.3 Measures of the Central Tendency
Mean: The mean is a statistical way of calculating the average of all data point values in the dataset under consideration. It is the sum of the values of all data

FIGURE 2.1 Histogram of number counts.
points divided by the count of data points. If there are two data points, the mean is half the sum of their values; as more data points are added, the mean changes depending on the value of each new point. The following is a function to calculate the mean in Python:

def mean(x1):
    return sum(x1) / len(x1)

mean(test)   # ans. 50.842786069651744

Median: The median divides the data points into two halves. It is the value of the central data point if the number of data points is odd, and the average of the two middle data points if the number of data points is even. The median function is a bit more complicated, mostly because of the "even" case:

def median(v):
    number = len(v)
    sort = sorted(v)
    mid = number // 2
    if number % 2 == 1:
        return sort[mid]
    else:
        low = mid - 1
        high = mid
        return (sort[low] + sort[high]) / 2

median(test)

Mode: The mode is the value of the most frequently repeated data point in the dataset.

def mode(x1):
    counts = Counter(x1)
    maximum_count = max(counts.values())
    return [x for x, count in counts.items() if count == maximum_count]

mode(test)   # ans. e.g. value 21, which occurs 18 times

With the help of descriptive analysis, we can analyze each of the variables in the dataset.

2.5.4 Measures of Dispersion
Just as we have measures of central tendency, we have measures to calculate the spread of the data points, as follows:
Range: The range is a measure of how widely the data points in a dataset are distributed; a value near zero denotes no spread at all, while a large value signifies a very large spread of data points [15,16]. The range is an easy measure: it is the difference between the biggest and the smallest data point values.

def data_range(x1):
    return max(x1) - min(x1)

data_range(test)
Interquartile Range: The interquartile range describes the spread of the middle portion of the data points; it is based on dividing a dataset into quartiles [17].
Variance: The variance tells us how much each data point differs from the mean. It requires the following computations: the deviation is the difference between the value of each data element and the mean of the data points in the dataset, and the variance is the average of the squared deviations.

def variance(x1):
    n = len(x1)
    x_bar = mean(x1)
    deviations = [x - x_bar for x in x1]               # deviation of each value from the mean
    return sum(d ** 2 for d in deviations) / (n - 1)   # average of the squared deviations

variance(test)   # ans. 853.0877215910497

Standard Deviation: The standard deviation computes the dispersion of a set of data values from their statistical mean. Whatever data we have (e.g., "test"), the following is the function to calculate the standard deviation in Python:

def std_dev(x1):
    return math.sqrt(variance(x1))

std_dev(test)   # ans. 29.207665459448307

2.6 Tools to Measure Relationships
Covariance and correlation are the most common tools used to measure the relationship between data points. The relationship between the data points is measured as follows.

2.6.1 Covariance
Covariance is a measure of the joint variability of two random variables. If the covariance is positive, the variables move in the same direction; if it is negative, they move in opposite directions; and if it is zero, they have no linear relation with each other.

def covariance(x1, y1):
    n = len(x1)
    x_bar, y_bar = mean(x1), mean(y1)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(x1, y1)) / (n - 1)

covariance(test1, test2)

2.6.2 Correlation
Correlation is a measure of the association between two random variables that ranges from minus one to plus one; it is a normalized version of covariance. Generally, a correlation of about ±0.7 shows a strong association and about ±0.3 shows little or no association between the variables. Although both covariance and correlation are used to analyze relationships, the normalized form gives a clearer picture and hence is critical in the analysis.
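The chapter does not list code for the correlation itself, so the following is a small sketch of one way to compute it, reusing the mean, std_dev, and covariance functions defined above; the function name correlation, the placeholder series test1 and test2, and the guard against zero spread are my own assumptions.

def correlation(x1, y1):
    sx, sy = std_dev(x1), std_dev(y1)
    if sx == 0 or sy == 0:
        return 0                              # with no spread the correlation is undefined; return 0 here
    return covariance(x1, y1) / (sx * sy)     # normalizes covariance into the range [-1, 1]

correlation(test1, test2)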
2.7 Probability Distribution Function
The probability density function is a probability function for continuous data: at any point of the sample space it gives the relative likelihood that the value of the random variable would be close to that point.

def uniformpdf(x1):
    return 1 if 0 <= x1 < 1 else 0

def normalpdf(x1, mu=0, sigma=1):
    sqrt_two_pi = math.sqrt(2 * math.pi)
    return math.exp(-(x1 - mu) ** 2 / (2 * sigma ** 2)) / (sqrt_two_pi * sigma)

x1 = [x / 10.0 for x in range(-100, 100)]
plt.plot(x1, [normalpdf(x, sigma=1) for x in x1], '-', label='mu=0,sigma=1')
plt.plot(x1, [normalpdf(x, sigma=2) for x in x1], '--', label='mu=0,sigma=2')
plt.plot(x1, [normalpdf(x, sigma=0.5) for x in x1], ':', label='mu=0,sigma=0.5')
plt.plot(x1, [normalpdf(x, mu=-1) for x in x1], '-.', label='mu=-1,sigma=1')
plt.title("Various Normal pdfs")
plt.legend()
plt.show()

We can see the graph of the probability distribution function in Figure 2.2.

2.7.1 Cumulative Density Function
It is a probability function that tells us the probability that a random variable is less than or equal to a particular value; it is the integral of the probability density function.

def uniformcdf(x1):
    if x1 < 0:
        return 0
    elif x1 < 1:
        return x1
    else:
        return 1

FIGURE 2.2 Normal probability density function.
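The listing above gives a CDF only for the uniform case; as a complementary sketch (not from the chapter), the normal CDF can be written with the error function from Python's standard math module, which is assumed to be imported as in the std_dev example. The function name normalcdf is my own choice.

def normalcdf(x1, mu=0, sigma=1):
    # The normal CDF has no simple closed form, but math.erf expresses it exactly
    return (1 + math.erf((x1 - mu) / (sigma * math.sqrt(2)))) / 2

normalcdf(0)      # 0.5: half the probability mass lies below the mean
normalcdf(1.96)   # about 0.975 for the standard normal distribution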
2.7.2 Continuous Data Distributions
A continuous distribution in which all outcomes are equally likely is called a uniform distribution [1]. The normal distribution, by contrast, has a bell shape and is associated with the central limit theorem; in its standard form it has a mean of zero and a standard deviation of 1. Figure 2.3 represents the graph of the normal distribution, and the normal distribution with several parameter values is plotted in Figure 2.4.
Probability is the likelihood that an event occurs. Data science would not be complete without using some probability and mathematics. The universal set consists of all possible outcomes, and a subset of these outcomes is an event. We'll use probability theory to evaluate models [7].

2.7.3 Conditional Probability
Conditional probability, Prob(A|B), is the probability that an event will occur given that an earlier event has occurred.

FIGURE 2.3 Normal distribution.
FIGURE 2.4 Normal distribution of number values.
The classic two-child example illustrates this; the original listing is restored here as runnable Python, with the counters initialized and the child labels quoted ("b" for boy, "g" for girl):

import random

def random_num_generator():
    return random.choice(["b", "g"])

random.seed(100)
og = bg = eg = 0                       # counts: older is a girl, both are girls, either is a girl
for j in range(1000):
    y = random_num_generator()         # younger child
    o = random_num_generator()         # older child
    if o == "g":
        og += 1
    if (o == "g") and (y == "g"):
        bg += 1
    if o == "g" or y == "g":
        eg += 1

print("P(both | older):", bg / og)
print("P(both | either):", bg / eg)

ans. P(both | older): 0.5007089325501317
P(both | either): 0.3311897106109325

Independent events are events whose outcomes do not depend on the outcome of the other event: Prob(X|Y) = Prob(X). Mutually exclusive events are events that cannot occur simultaneously: Prob(X|Y) = 0.

2.7.4 Bayes' Theorem
Bayes' theorem is an important mathematical formula for determining conditional probability [18]: the value of P(X|Y) is equal to P(Y|X) times P(X), divided by P(Y).

2.8 Hypothesis Testing
Hypothesis testing is the branch of statistics in which a statistician states an assumption and analyzes some parameter of the dataset against it. There are two standard hypotheses, the null hypothesis and the alternative hypothesis:
1. Null Hypothesis: the hypothesis that the observed data arise purely by chance.
2. Alternative Hypothesis: the hypothesis that the data under consideration are influenced by some extraneous, non-random effect.
Statistical Significance
P-value: The p-value is the probability, derived from the experimental outcome of a test under the assumption that the null hypothesis is correct, of seeing a result at least as extreme as the one observed. A smaller p-value indicates a greater chance that the alternative hypothesis will be accepted.
Alpha: Alpha is the significance level; it denotes the probability of rejecting the null hypothesis even though it is true, which is called a type 1 error (the null hypothesis is rejected when it should be accepted).
  • 49. 30 Data Science Beta: It is a type 2 error denoting the possibility of accepting the null hypothesis when is to be rejected. Following are the steps carried out for hypothesis: 1. Specify the null and alternative hypotheses 2. Compute sample space and size of the test 3. Calculate the test statistics and probability 4. Compare the observations to either accept or reject the null hypothesis. 2.9 Conclusion The role of statistics in data science has always been given less importance as compared to other fields like computer science. In the domains of data possession, analysis, and advanced modeling are needed for prediction. Statisticians are acquainted to carry out their tasks in this latest and highly appreciated domain of data science. By making use of mathematical models with statistics, particularly in big data, data mining, machine learn- ing will give rise to scientific outputs depending on suitable methods. Completely bal- anced use of mathematics, informatics, and statistics will lead to good solutions to the problems in data science. In this chapter, we reviewed the basics of statistical analysis and the features of Python which enriches the field of data science to extract the desired knowledge from the data available. This is possible because of Python’s vast library which makes it easy and fast to collect and analyze data. References 1. Claus Weihs, Katja Ickstadt. Data science: the impact of statistics. International Journal of Data Science and Analytics (­2018). 2. Weihs, C. Big data classification — aspects of many features. In: Michaelis, S., Piatkowski, N., Stolpe, M. (­ eds.) Solving Large Scale Learning Tasks: Challenges and Algorithms, Springer Lecture Notes in Artificial Intelligence, vol. 9580, p­ p. 139–147 (­ 2016). 3. Martin, R., Nagathil, A. Digital filters and spectral analysis, chap 4. In: Weihs, C., Jannach, D., Vatolkin, I., Rudolph, G. (­ eds.) Music Data Analysis—Foundations and Applications, p­p. 111–143. CRC Press, Boca Raton (­ 2017). 4. Bhattacharyya, G. K., Johnson, R. A. Statistical Concepts and Methods, John Wiley and Sons, New York (­1997). 5. Joel Grus, Data Science from Scratch First Principles with Python. O’Reilly Media, Inc., Sebastopol, CA (­2015). 6. Randy Paffenroth, Xiangnan Kong, Python in Data Science Research and Education. Proceedings of the 14th Python in Science Conference 2015. 7. Ramchandra Sharad Mangrulkar, Antonis Michalas, Pallavi Chavan. Design of intelligent applications using machine learning and deep learning techniques, 2021.
  • 50. 31 Role of Statistical Methods 8. Gregory Pietatsky. Python Eats Away at R: Top Software for Analytics, Data Science, Machine Learning in 2018: Trends and Analysis, KDnuggets, 2018. 9. Gabriel Moreira: Python for Data Science, The Developers Conference 2015. 10. Bischl, B., Schiffner, J., Weihs, C. Benchmarking local classification methods. Computational Statistics 28(­ 6), 2599–2619 (­ 2013) Press, G.: A Very Short History of Data Science. 2017. 11. Dyk, D.V., Fuentes, M., Jordan, M.I., Newton, M., Ray, B.K., Lang, D.T., Wickham, H. ASA Statement on the Role of Statistics in Data Science. http://­ magazine.amstat.org/­ blog/­ 2015/­ 10/­ 01/­­asastatement-­on-­the-­role-­of-­statistics-­in-­data-science/ (­2015). 12. Nilanjan Dey, Sanjeev Wagh, Parikshit N. Mahalle. Applied Machine Learning for Smart Data Analysis, 2019. 13. https://­www.edureka.co/­blog/­­statistics-­and-probability. 14. Shapiro, S. S. and Wilk, M. B. An analysis of variance test for normality (­ complete samples). Biometrika 52, ­ 591–611 (­ 1965). 15. Brown, M.S. Data Mining for Dummies. Wiley, London (­ 2014). 16. Kallol Bosu Roy Choudhuri, Ramchandra S. Mangrulkar, “­ Data Acquisition and Preparation for Artificial Intelligence and Machine Learning Applications”, in Design of Intelligent Applications Using Machine Learning and Deep Learning Techniques, CRC Press. https:// ­www.taylorfrancis.com/­chapters/­edit/­10.1201/­­9781003133681-1/­­data-­acquisition-­preparation-­ artificial-­intelligence-­machine-­learning-­applications-­kallol-­bosu-­roy-­choudhuri-­ramchandra- mangrulkar. 17. Maruti Techlabs: Is Python the most popular language for data science? 2018. 18. Waller, R. A., Duncan, D. B. A Bayes rule for the symmetric multiple comparison problem. Journal of the American Statistical Association 64(­1969).
  • 52. 33 DOI: 10.1201/9781003283249-3 3 ­ Real-World Applications of Data Science Baisa L. Gunjal Amrutvahini College of Engineering CONTENTS 3.1 Banking and Finance..........................................................................................................35 3.1.1 Customer Data Management...............................................................................35 3.1.2 ­Real-Time Analytics..............................................................................................35 3.1.3 Algorithmic Trading.............................................................................................35 3.1.4 Providing Personalized Services.........................................................................35 3.1.5 Fraud Detection......................................................................................................36 3.2 ­ E-commerce and Retail Industry......................................................................................36 3.2.1 Potential Customer Analysis................................................................................36 3.2.2 Customer Sentiment Analysis.............................................................................36 3.2.3 Optimizing Prices..................................................................................................37 3.2.4 Inventory Management........................................................................................37 3.2.5 Lifetime Value Prediction.....................................................................................37 3.3 Digital Marketing................................................................................................................38 3.3.1 Smarter Planning for Online Marketing............................................................38 3.3.2 Business Intelligence with Smarter ­ Decision-Making.....................................38 3.3.3 Managing Business Efficiently............................................................................39 3.3.4 Automating Recruitment Process.......................................................................39 3.4 Healthcare and Medical Diagnosis..................................................................................39 3.4.1 Managing and Monitoring Patient Health and Data.......................................39 3.4.2 Medical Image Analysis.......................................................................................40 3.4.3 Drug Research and Creation................................................................................40 3.4.4 Patient Diagnosis and Preventing Diseases......................................................40 3.4.5 Providing Medical Virtual Assistance...............................................................41 3.5 Manufacturing Industry....................................................................................................41 3.5.1 Automating Product Design and Development................................................41 3.5.2 Inventory Management and Demand Forecasting...........................................41 3.5.3 Monitoring of Manufacturing Units...................................................................42 3.5.4 ­ Real-Time Data of Performance and Quality.....................................................42 3.6 Education System................................................................................................................43 3.6.1 Monitoring Students’ and Teachers’ 
Requirements.........................................43 3.6.2 Measuring Students’ and Teachers’ Performance.............................................43 3.6.3 Innovating the Curriculum..................................................................................43 3.6.4 Automating ­ Outcome-Based Teaching and Learning Process.......................43
  • 53. 34 Data Science 3.7 Entertainment Industry......................................................................................................43 3.7.1 Predictive Analytics in the Film Industry.........................................................43 3.7.2 Tracking Progress of Movies................................................................................44 3.7.3 Generate Movie Revenue......................................................................................44 3.7.4 Improve ­ Post-production of Movies...................................................................44 3.8 Logistic Delivery and Transportation Units...................................................................45 3.8.1 Reducing Shipping Costs through Delivery Path Optimization....................45 3.8.2 Monitoring Traffic and Weather Data from Sensors........................................45 3.9 Shipping Sensitive Goods with Higher Quality.............................................................45 3.9.1 Automation of Warehouses and the Supply Chain..........................................45 3.10 Digital Advertising Systems..............................................................................................45 3.10.1 Price Comparison Websites..................................................................................45 3.10.2 Website Recommendation....................................................................................46 3.11 Internet Search Engines.....................................................................................................46 3.11.1 Proper Filtering......................................................................................................46 3.11.2 Autocomplete.........................................................................................................47 3.11.3 Recommendation Engines....................................................................................48 3.12 Airline Routing Planning...................................................................................................48 3.12.1 Predicting Flight Delays.......................................................................................48 3.12.2 Decide Route of Flight In Case of Emergency...................................................48 3.12.3 Running Customer Loyalty Programs Effectively...........................................49 3.13 Image and Speech Recognition Systems..........................................................................49 3.13.1 Image Recognition Systems.................................................................................49 3.13.2 Speech Recognition Systems................................................................................50 3.14 Gaming/­Sports....................................................................................................................50 3.14.1 Use Previous Gaming Experience to the Next Level.......................................50 3.14.2 Improve Player Moves Up to Higher Level.......................................................51 3.15 Social Life and Social Networking...................................................................................51 3.15.1 Building and Maintaining Social Relationship.................................................51 3.15.2 Maintaining Friend Circles through Social Media...........................................51 3.15.3 Building Human Network for Social Causes....................................................51 
3.16 Augmented Reality.............................................................................................................52 3.16.1 Operation Room Augmented with Remote Presence......................................52 3.16.2 Social Media with Augmented Reality..............................................................52 3.17 ­ Self-Driving Cars and Robots............................................................................................53 3.17.1 Intelligent Systems for ­ Self-Driving Cars...........................................................53 3.17.2 Robotics and Automation.....................................................................................53 3.18 Email Filtering and Character Recognitions...................................................................53 3.18.1 Email Spam Filtering............................................................................................53 3.18.2 Optical Character Recognitions..........................................................................54 3.19 Genetics and Genomics Research.....................................................................................54 3.19.1 Analyzing Impact of the DNA on the Health...................................................54 3.19.2 Analyzing Reaction of Genes to Various Medications.....................................54 3.19.3 Analyzing Set of Chromosomes in Humans, Animals...................................55 References........................................................................................................................................55
  • 54. 35 Real-World Applications 3.1  Banking and Finance 3.1.1  Customer Data Management The customer’s data is an invaluable asset for banking and ­ finance-related sectors in the digital era. Banks utilize customer data to enhance their productivity/­ services. Banking sectors use unstructured data from social media to publish their brand services and need customers’ views about their services. Data science helps banks in their transaction pro- cessing and segmenting of huge banking data as well as customer feedback obtained through social media platforms. 3.1.2 ­Real-Time Analytics Predictive analysis is an important part of the banking and financial sectors. Stock/­ share markets’ ­ decision-making process is based on ­ real-time analysis in the finance industry. The companies make their ­ decision-making about their money investment based on the current market scenario. They can analyze the state of the market and possible risks, predict profit, and accordingly plan their relevant buying and selling business strategies [1]. 3.1.3 Algorithmic Trading Algorithmic trading is a process that facilitates transaction ­ decision-making in financial markets using advanced mathematical tools. It utilizes automated and preprogrammed trading instructions to account for finance applications. Algorithmic trading is capable of capturing ­ profit-making opportunities happening in the market much before other human traders can even spot them. Algorithmic trading provides the following benefits: i. Trades are executed for the best possible and profitable market prices. ii. They provide automated checks on multiple market conditions. iii. This system reduces transaction costs. iv. It reduces the risk of manual errors while placing trades. v. It reduces possible emotional and psychological factors that happen by mistakes. 3.1.4  Providing Personalized Services Banking sectors and other financial companies use data science and machine learning for process automation, stronger levels of overall security, and social media interactions with customers. The other personalized services may include [2]: i. Daily/­frequent transactions. ii. Customer’s past details. iii. Debit/­credit details. iv. Past/­ present loans details. v. Bank details.
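To make the algorithmic trading idea from Section 3.1.3 slightly more concrete, here is a deliberately simplified sketch of a moving-average crossover signal; it is not a method described in this book, the price series is invented, and a production system would need execution, risk management, and transaction-cost handling on top of it.

# Hypothetical closing prices for one instrument (invented numbers)
prices = [101, 102, 104, 103, 105, 107, 110, 108, 111, 115, 114, 118]

def moving_average(series, window):
    # Average of the last `window` observations; None until enough data exists
    if len(series) < window:
        return None
    return sum(series[-window:]) / window

position = "flat"
for day in range(len(prices)):
    history = prices[: day + 1]
    fast = moving_average(history, 3)    # short-term trend
    slow = moving_average(history, 6)    # long-term trend
    if fast is None or slow is None:
        continue
    # Buy when the short-term average rises above the long-term average; sell on the reverse
    if fast > slow and position != "long":
        position = "long"
        print(f"day {day}: buy at {prices[day]}")
    elif fast < slow and position == "long":
        position = "flat"
        print(f"day {day}: sell at {prices[day]}")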