NASSCOM Future Skills Training
Course – Data Science & Analytics
Dhruv Saxena
Assistant Professor (TEQIP-NPIU)
Introduction to Data Science
Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 8
Introduction to Data Science and Analytics
OBJECTIVES
The objective of this course is to impart the necessary knowledge of the
mathematical foundations needed for data science and to develop the
programming skills required to build data science applications.
Duration – 60 Hours (40L + 20C)
LEARNING OUTCOMES
At the end of this course, the students will be able to:
● Demonstrate understanding of the mathematical foundations
needed for data science.
● Collect, explore, clean, munge and manipulate data.
● Implement models such as k-nearest neighbors, Naïve Bayes,
linear and logistic regression, decision trees, neural networks and
clustering.
● Build data science applications using Python-based toolkits.
Data, Big Data and Challenges
Data Science
◦ Introduction
◦ Why Data Science
Data Scientists
◦ What do they do?
Major/Concentration in Data Science
◦ What courses to take
Data All Around
Lots of data is being collected and warehoused:
◦ Web data, e-commerce
◦ Financial transactions, bank/credit transactions
◦ Online trading and purchasing
◦ Social networks
How Much Data Do We Have?
Google processes 20 PB a day (2008)
Facebook has 60 TB of daily logs
eBay has 6.5 PB of user data + 50 TB/day (May 2009)
1000 Genomes Project: 200 TB
Cost of 1 TB of disk: $35
Time to read a 1 TB disk: about 3 hrs (at 100 MB/s)
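The "about 3 hrs" figure follows from simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant sequential read speed; the 100-disk parallel case is a hypothetical illustration of why big data workloads are spread across many machines.

```python
# Back-of-the-envelope check, assuming decimal units and a fixed read speed.
TB = 10**12  # bytes
MB = 10**6   # bytes

def read_time_hours(size_bytes: int, throughput_bytes_per_s: float) -> float:
    """Hours needed to read size_bytes sequentially at a constant throughput."""
    return size_bytes / throughput_bytes_per_s / 3600

one_disk = read_time_hours(1 * TB, 100 * MB)
print(f"1 TB at 100 MB/s: {one_disk:.2f} h")  # 10,000 s, about 2.78 h

# Hypothetical: the same terabyte split across 100 disks read in parallel,
# so aggregate throughput is 100 x 100 MB/s.
parallel = read_time_hours(1 * TB, 100 * 100 * MB)
print(f"1 TB over 100 disks: {parallel * 60:.1f} min")
```

Reading 10^12 bytes at 10^8 bytes/s takes 10,000 seconds, which rounds to the slide's 3 hours.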
Big Data
Big Data is any data that is expensive to manage and hard to extract value from:
◦ Volume: the size of the data
◦ Velocity: the latency of data processing relative to the growing demand for interactivity
◦ Variety and Complexity: the diversity of sources, formats, quality, and structures
Big Data vs Data Science vs Data Analytics
What is Data Science?
Dealing with both unstructured and structured data, Data Science is a
field that comprises everything related to data cleansing,
preparation, and analysis.
Data Science is the combination of statistics, mathematics,
programming, problem-solving, capturing data in ingenious ways,
the ability to look at things differently, and the activity of cleansing,
preparing, and aligning the data.
In simple terms, it is the umbrella of techniques used when trying
to extract insights and information from data.
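The cleanse, prepare, analyze sequence described above can be sketched in a few lines of plain Python. The records, field names, and cleaning rules here are hypothetical; real projects typically use a library such as pandas, but the steps are the same.

```python
from statistics import mean

# Hypothetical raw records: inconsistent casing, stray whitespace,
# a duplicate row (after normalization), and a missing value.
raw = [
    {"city": " Jaipur", "temp_c": "31.5"},
    {"city": "jaipur", "temp_c": "31.5"},
    {"city": "Delhi ", "temp_c": ""},
    {"city": "Delhi", "temp_c": "35.0"},
    {"city": "Mumbai", "temp_c": "29.0"},
]

def cleanse(rows):
    """Normalize text fields, drop incomplete rows, and remove duplicates."""
    seen, out = set(), []
    for r in rows:
        city = r["city"].strip().title()
        temp = r["temp_c"].strip()
        if not temp:
            continue  # drop record with a missing value
        key = (city, temp)
        if key in seen:
            continue  # drop duplicate that appears once text is normalized
        seen.add(key)
        out.append({"city": city, "temp_c": temp})
    return out

def prepare(rows):
    """Convert string fields to numeric types so they can be analyzed."""
    return [{"city": r["city"], "temp_c": float(r["temp_c"])} for r in rows]

def analyze(rows):
    """A deliberately trivial analysis: mean temperature of the clean records."""
    return mean(r["temp_c"] for r in rows)

clean = prepare(cleanse(raw))
print(clean)
print(analyze(clean))
```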
What is Big Data?
Big Data refers to volumes of data so large that they cannot be processed effectively with
traditional applications. The processing of Big Data begins with raw data
that isn't aggregated and is most often impossible to store in the memory of a single
computer.
A buzzword used to describe immense volumes of both unstructured and
structured data, Big Data inundates a business on a day-to-day basis. Big Data can be
analyzed for insights that lead to better decisions and strategic business
moves.
Gartner defines Big Data as follows: "Big data is high-volume, high-velocity
and/or high-variety information assets that demand cost-effective, innovative forms of
information processing that enable enhanced insight, decision making, and process
automation."
Big Data
What is Data Analytics?
Data Analytics is the science of examining raw data in order to draw conclusions from
that information.
Data Analytics involves applying an algorithmic or mechanical process to
derive insights, for example, running through several data sets to look for
meaningful correlations between them.
It is used in several industries to allow organizations and companies to
make better decisions as well as verify or disprove existing theories or
models. The focus of Data Analytics lies in inference, which is the process of
deriving conclusions that are based solely on what the researcher already
knows.
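Looking for correlations between data sets usually starts with a measure such as the Pearson correlation coefficient r, which ranges from -1 (perfect negative linear relationship) to +1 (perfect positive). The sketch below computes r from its definition; the two monthly series are hypothetical. A high r is only a candidate insight: correlation alone does not establish causation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance numerator and the two spread terms (the 1/n factors cancel).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical monthly figures from two separate data sets
ad_spend = [10, 12, 15, 18, 20, 25]
sales = [100, 110, 130, 150, 160, 200]
print(round(pearson(ad_spend, sales), 3))
```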
Types of Data We Have
Relational Data (Tables/Transaction/Legacy Data)
Text Data (Web)
Semi-structured Data (XML)
Graph Data
Social Network, Semantic Web (RDF), …
Streaming Data
You can afford to scan the data only once
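Because a stream can be scanned only once, statistics must be updated incrementally with constant memory. As one illustration (not the only approach), Welford's algorithm maintains a running mean and variance; the "sensor stream" below is a hypothetical stand-in for an unbounded data source.

```python
from typing import Iterator

class RunningStats:
    """Single-pass (streaming) mean and variance via Welford's algorithm.

    Each value is seen exactly once and only O(1) state is kept, so the
    stream never has to be stored or rescanned.
    """

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); 0.0 until two values arrive."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

def sensor_stream() -> Iterator[float]:
    """Hypothetical finite stand-in for an unbounded stream of readings."""
    yield from [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

stats = RunningStats()
for reading in sensor_stream():
    stats.update(reading)  # each reading is touched once, then discarded
print(stats.n, stats.mean, stats.variance)
```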
What To Do With These Data?
Aggregation and Statistics
◦ Data warehousing and OLAP
Indexing, Searching, and Querying
◦ Keyword-based search
◦ Pattern matching (XML/RDF)
Knowledge Discovery
◦ Data Mining
◦ Statistical Modeling
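Aggregation in data warehousing and OLAP amounts to grouping facts by one or more dimensions and summarizing a measure; a "roll-up" drops a dimension to get coarser totals. The sketch below mimics a SQL GROUP BY in plain Python over a hypothetical sales fact table; production warehouses would do this with SQL or an OLAP engine.

```python
from collections import defaultdict

# Hypothetical sales fact table: (region, city, product, amount)
sales = [
    ("North", "Delhi", "laptop", 900),
    ("North", "Delhi", "phone", 300),
    ("North", "Jaipur", "phone", 250),
    ("South", "Chennai", "laptop", 850),
    ("South", "Bengaluru", "phone", 400),
]

def aggregate(rows, key_fn):
    """Sum the amount column, grouped by whatever key_fn extracts (a GROUP BY)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_fn(row)] += row[3]
    return dict(totals)

# Fine-grained view: totals per (region, city)
by_city = aggregate(sales, lambda r: (r[0], r[1]))
# Roll-up: drop the city dimension to get totals per region
by_region = aggregate(sales, lambda r: r[0])
print(by_city)
print(by_region)  # {'North': 1450, 'South': 1250}
```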
Big Data and Data Science
"… the sexy job in the next 10 years will be statisticians," Hal Varian, Google Chief
Economist
The U.S. will need 140,000-190,000 predictive analysts and 1.5 million managers/analysts
by 2018.
McKinsey Global Institute, June 2011
India will need around 160,000+ data scientists by 2020, and world demand is
predicted to be around 2.7 million by 2020.
New Data Science institutes are being created or repurposed – NYU, Columbia, Washington,
UCB, ...
New degree programs, courses, boot-camps:
◦ e.g., at Berkeley: Stats, I-School, CS, Astronomy…
◦ One proposal (elsewhere) for an MS in "Big Data Science"
What is Data Science?
An area that manages, manipulates, extracts, and interprets knowledge from
tremendous amounts of data.
Data science (DS) is a multidisciplinary field of study with the goal of addressing the challenges
of big data.
Data science principles apply to all data, big and small.
Simply put: the extraction of knowledge from large volumes of data that are structured or
unstructured.
What is Data Science?
Theories and techniques from many fields and disciplines are used to
investigate and analyze large amounts of data to help decision makers in
many industries such as science, engineering, economics, politics, finance,
and education.
◦ Computer Science: pattern recognition, visualization, data warehousing, high-performance computing,
databases, AI
◦ Mathematics: mathematical modeling
◦ Statistics: statistical and stochastic modeling, probability
Why is it sexy?
Gartner’s 2014 Hype Cycle
Data Science
Real Life Examples
Companies learn your secrets, shopping patterns, and preferences
◦ For example, can we know if a woman is pregnant, even if she doesn't want us to know?
Target case study
Data Science and elections (2008, 2012)
◦ 1 million people installed the Obama Facebook app that gave access to info on "friends"
Applications of Data Science
Internet Search
Search engines use data science algorithms to deliver the best results for search queries
in a fraction of a second.
Digital Advertisements
The entire digital marketing spectrum, from display banners to digital billboards, uses
data science algorithms. This is the main reason digital ads achieve a higher
click-through rate (CTR) than traditional advertisements.
Recommender Systems
Recommender systems not only make it easy to find relevant products among billions of
available products but also improve the user experience. Many companies use these systems to
promote their products and make suggestions in accordance with the user's demands and the relevance of
information. The recommendations are based on the user's previous search results.
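The slide bases recommendations on previous searches; the sketch below swaps in a closely related, commonly taught technique, item-based collaborative filtering on purchase histories. Two items are considered similar when the same users bought them (cosine similarity between their sets of buyers), and unseen items are scored by their similarity to what the user already has. All users, items, and the choice of k are hypothetical.

```python
from math import sqrt

# Hypothetical purchase histories: user -> set of items bought
history = {
    "u1": {"phone", "case", "charger"},
    "u2": {"phone", "case"},
    "u3": {"laptop", "mouse"},
    "u4": {"phone", "charger", "earbuds"},
}

def item_users(hist):
    """Invert user -> items into item -> set of users who bought it."""
    index = {}
    for user, items in hist.items():
        for item in items:
            index.setdefault(item, set()).add(user)
    return index

def cosine(a, b):
    """Cosine similarity of two binary user vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend(user, hist, k=2):
    """Rank items the user lacks by summed similarity to items they own."""
    index = item_users(hist)
    owned = hist[user]
    scores = {}
    for candidate, buyers in index.items():
        if candidate in owned:
            continue
        scores[candidate] = sum(cosine(buyers, index[i]) for i in owned)
    # Highest score first; ties broken alphabetically for stable output
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:k] if score > 0]

print(recommend("u2", history))  # ['charger', 'earbuds']
```

Items with zero similarity (here the laptop and mouse, bought only by an unrelated user) are filtered out rather than padded into the list.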
Big Data for Retail
Whether a brick-and-mortar store or an online e-tailer, the key to staying in the
game and being competitive is understanding customers better in order
to serve them. This requires the ability to analyze all the disparate
data sources that companies deal with every day, including
weblogs, customer transaction data, social media, store-branded
credit card data, and loyalty program data.
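Analyzing disparate sources usually begins by joining them on a shared key. The sketch below combines hypothetical extracts from three separate systems (transactions, a loyalty program, and weblogs) into one profile per customer, keyed by an assumed `customer_id`. It behaves like a left join on the transaction data: customers missing from a secondary source get a default instead of being dropped.

```python
# Hypothetical extracts from three separate systems, keyed by customer_id
transactions = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 45.5},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 3, "amount": 15.0},
]
loyalty = {1: "gold", 2: "silver"}   # loyalty program tier
web_visits = {1: 14, 2: 3, 3: 9}     # weblog page views last month

def customer_view(txns, tiers, visits):
    """Combine per-customer spend with loyalty tier and web activity."""
    spend = {}
    for t in txns:
        cid = t["customer_id"]
        spend[cid] = spend.get(cid, 0.0) + t["amount"]
    return {
        cid: {
            "total_spend": total,
            "tier": tiers.get(cid, "none"),       # default when not enrolled
            "page_views": visits.get(cid, 0),     # default when no weblog data
        }
        for cid, total in spend.items()
    }

view = customer_view(transactions, loyalty, web_visits)
for cid, profile in sorted(view.items()):
    print(cid, profile)
```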
Applications of Big Data
Big Data for Financial Services
Credit card companies, retail banks, private wealth management
advisories, insurance firms, venture funds, and institutional investment
banks use big data for their financial services. The problem they all share is the
massive amount of multi-structured data living in multiple disparate systems, which
big data technologies can bring together and analyze. Big data is therefore used in
several ways, such as:
Customer analytics
Compliance analytics
Fraud analytics
Operational analytics
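Fraud analytics in practice relies on much richer models, but a minimal sketch of the underlying idea is flagging transactions that are unusually far from a customer's typical amount. The version below uses a robust "modified z-score" based on the median and the median absolute deviation (MAD), with the commonly used 0.6745 scaling and 3.5 cut-off. A plain mean/standard-deviation z-score is avoided because a single huge transaction inflates the standard deviation and can hide itself. The amounts are hypothetical.

```python
from statistics import median

def robust_outliers(amounts, threshold=3.5):
    """Indices of amounts whose modified z-score exceeds threshold.

    Uses the median and MAD instead of mean and standard deviation so that
    one extreme value cannot inflate the spread and mask itself.
    """
    med = median(amounts)
    mad = median([abs(a - med) for a in amounts])
    if mad == 0:
        return []  # no spread at all; nothing can be scored
    return [
        i for i, a in enumerate(amounts)
        if 0.6745 * abs(a - med) / mad > threshold
    ]

# Hypothetical card transactions for one customer
amounts = [42, 55, 38, 61, 47, 52, 44, 58, 50, 5000]
print(robust_outliers(amounts))  # [9]
```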
Big Data in Communications
Gaining new subscribers, retaining customers, and
expanding within current subscriber bases are top
priorities for telecommunication service providers. The
solutions to these challenges lie in the ability to combine
and analyze the masses of customer-generated data and
machine-generated data that is being created every day.
Applications of Data Analytics
Healthcare
As cost pressures tighten, the main challenge for hospitals is to treat as many patients
as efficiently as possible while improving the quality of care. Instrument
and machine data are increasingly used to track and optimize patient flow,
treatment, and equipment use in hospitals. It has been estimated that a 1%
efficiency gain could yield more than $63 billion in global healthcare savings.
Travel
Data analytics can optimize the buying experience through mobile, weblog, and social
media data analysis. Travel sites can gain insights into customers' desires and
preferences. Products can be up-sold by correlating current sales with subsequent
browsing, increasing browse-to-buy conversions via customized packages and offers.
Personalized travel recommendations can also be delivered by data analytics based on
social media data.
Gaming
Data analytics helps game companies collect data to optimize spending within as well as
across games. Game companies gain insight into users' likes, dislikes, and
relationships.
Energy Management
Many firms use data analytics for energy management, including smart-grid
management, energy optimization, energy distribution, and building
automation in utility companies. The applications here center on
controlling and monitoring network devices, dispatching crews, and managing
service outages. Utilities can integrate millions of data points on
network performance, letting engineers use analytics to
monitor the network.
Data Scientists
Data Scientist
◦ The Sexiest Job of the 21st Century
"They find stories, extract
knowledge. They are not reporters"
Data Scientists
Data scientists are the key to realizing the opportunities presented by big data. They bring
structure to it, find compelling patterns in it, and advise executives on the implications for
products, processes, and decisions.
What do Data Scientists do?
National Security
Cyber Security
Business Analytics
Engineering
Healthcare
And more…
Concentration in Data Science
Mathematics and Applied Mathematics
Applied Statistics/Data Analysis
Solid Programming Skills (R, Python, Julia, SQL)
Data Mining
Database Storage and Management
Machine Learning and discovery
Machine Learning
What is Machine Learning?
Machine learning (ML) is the study of computer algorithms
that improve automatically through experience.
It is seen as a subset of artificial intelligence.
Machine learning algorithms build a mathematical model
based on sample data, known as "training data", in order to
make predictions or decisions without being explicitly
programmed to do so.
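k-nearest neighbors (one of the models listed in the course outcomes) shows the idea of learning from "training data" in its simplest form: no classification rule is hand-written; a new point is labeled by majority vote among the k closest training examples. The features, labels, and k = 3 below are hypothetical; `math.dist` requires Python 3.8 or later.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Label query by majority vote among its k nearest training points.

    train: list of (feature_vector, label) pairs, i.e. the training data.
    """
    neighbors = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (hours studied, hours slept) -> outcome
training_data = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((1.5, 6.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((7.0, 8.0), "pass"),
    ((8.0, 6.5), "pass"),
]
print(knn_predict(training_data, (6.5, 7.5)))  # pass
print(knn_predict(training_data, (1.2, 5.0)))  # fail
```

Adding more labeled examples changes the predictions without changing the code, which is the "improve through experience" property described above.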
What is Machine Learning?
Machine learning algorithms are used in a wide variety of
applications, such as email filtering and computer vision,
where it is difficult or infeasible to develop conventional
algorithms to perform the needed tasks.
Machine learning is closely related to computational
statistics, which focuses on making predictions using
computers.
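Email filtering is a classic case where hand-written rules are impractical. A minimal sketch, assuming a toy labeled corpus, is a multinomial Naïve Bayes spam filter: it learns word frequencies per class from training emails and picks the class with the highest log-probability, using Laplace (add-one) smoothing so unseen words do not zero out a class. Whitespace tokenization and the four example emails are simplifying assumptions.

```python
from collections import Counter
from math import log

def train_nb(docs):
    """Train multinomial Naive Bayes from (text, label) pairs."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = set().union(*word_counts.values())
    priors = {label: c / len(docs) for label, c in label_counts.items()}
    return priors, word_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    priors, word_counts, vocab = model
    best_label, best_score = None, float("-inf")
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        score = log(prior)
        for word in text.lower().split():
            # Smoothed P(word | label); logs avoid floating-point underflow
            score += log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

emails = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lecture notes and assignment", "ham"),
]
model = train_nb(emails)
print(predict_nb(model, "free prize money"))       # spam
print(predict_nb(model, "monday lecture agenda"))  # ham
```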
Real-time applications
NASSCOM Formative Assessments (Mid-training)
Formative assessment of students shall be conducted for 100 marks, and the test duration shall be
between 45 and 60 minutes.
Post-training assessment and certification shall be conducted after the successful completion of
training.
Only students who are registered for and attending the Future Skills training shall be eligible for
mid-training and post-training assessment.
All assessments shall be conducted online and auto-proctored through NASSCOM SSC.
The assessment results shall be shared with the SPOC of the institute within 3 working days.
Formative assessment scores are independent and shall not be counted in the final assessment
scores for certification.
Tentative Date – 16th August 2020
NASSCOM Formative Assessment
Syllabus for Data Science & Analytics

Module                       | No. of Questions | Type of Questions | Indicative Time/Module | Marks
Introduction to Data Science | 2                | MCQ & DC          | 2 min                  | 6
Mathematical Foundations     | 18               | MCQ, DC & ScB     | 20 min                 | 44
Multiple Choice Questions (MCQ)
In this type of question, the candidate is asked to choose one or more
responses from a limited list of choices. It also includes True/False
questions (T/F), depending on the level of difficulty.
Scenario-based (ScB)
This type of question asks the candidate to describe how they might respond
to a hypothetical situation.
Direct Concept (DC)
This type of question revolves around the concepts that a particular subject
deals with. The candidate is asked a direct question pertaining
to a concept of that subject. It can be an MCQ, Fill in
the Blank, or Multiple Response question.
Next Lecture
▪ Mathematical Foundations
▪ Introduction & Syllabus
▪ Linear Algebra – Vectors & Matrices
53Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU)
Mr. Dhruv Saxena
Asst. Professor (TEQIP-NPIU)54

More Related Content

PPTX
Introduction to Data Science
PPTX
Big data visualization
PDF
Data mining (lecture 1 & 2) conecpts and techniques
PDF
Big Data Analytics for Real Time Systems
PPTX
Data mining , Knowledge Discovery Process, Classification
PPT
Big data ppt
PPT
Introduction to Data Mining
PPT
Weka presentation
Introduction to Data Science
Big data visualization
Data mining (lecture 1 & 2) conecpts and techniques
Big Data Analytics for Real Time Systems
Data mining , Knowledge Discovery Process, Classification
Big data ppt
Introduction to Data Mining
Weka presentation

What's hot (20)

PPTX
Using Big Data to Drive Customer 360
PPTX
Software Development Methodologies
PPTX
Big data
PPTX
Introduction to Data mining
PDF
Software Engineering MCQs
PDF
Implementing Effective Data Governance
PPT
Data mining :Concepts and Techniques Chapter 2, data
PPTX
Introduction to Data Engineering
PDF
Introduction To Data Science
PPT
Data mining slides
Ā 
PPSX
Autonomous medical coding with discriminative transformers
PPTX
Big data ppt
PPTX
1. Data Analytics-introduction
PPTX
Big Data in Medicine
PPTX
DataPreprocessing.pptx
PPT
Data Extraction
PPTX
Introduction to data science.pptx
PPTX
Data analytics
PPTX
The 8 Best Examples Of Real-Time Data Analytics
PPTX
Data Quality & Data Governance
Using Big Data to Drive Customer 360
Software Development Methodologies
Big data
Introduction to Data mining
Software Engineering MCQs
Implementing Effective Data Governance
Data mining :Concepts and Techniques Chapter 2, data
Introduction to Data Engineering
Introduction To Data Science
Data mining slides
Ā 
Autonomous medical coding with discriminative transformers
Big data ppt
1. Data Analytics-introduction
Big Data in Medicine
DataPreprocessing.pptx
Data Extraction
Introduction to data science.pptx
Data analytics
The 8 Best Examples Of Real-Time Data Analytics
Data Quality & Data Governance
Ad

Similar to Introduction to Data Science and Analytics (20)

PPTX
Big data road map
PPTX
Data analytics
PPTX
Introduction to Data Science - Overview and application
PPTX
Chapter 1 Introduction to Data Science (Computing)
PDF
00-01 DSnDA.pdf
PPTX
Unit 1 (DSBDA) PD.pptx
PPTX
Big data
PDF
Data+Science : A First Course
PPTX
On Big Data
PPTX
Big data Analytics Fundamentals Chapter 1
PPTX
Big data analytics Module1 contents pptx
PPTX
BDA: Big Data Analytics for Unit-1 Vtu syllabus
PPTX
Big Data Analytics_Unit1.pptx
PDF
Big Data Scotland
PPTX
000 introduction to big data analytics 2021
PPTX
Introduction to Data Analytics
PPT
SENCER_panel.ppt
PPTX
BADS-MBA-Unit 1 that what data science and Interpretation
PPTX
basic of data science and big data......
PPTX
Introduction to Data Science 5-13.pptx
Big data road map
Data analytics
Introduction to Data Science - Overview and application
Chapter 1 Introduction to Data Science (Computing)
00-01 DSnDA.pdf
Unit 1 (DSBDA) PD.pptx
Big data
Data+Science : A First Course
On Big Data
Big data Analytics Fundamentals Chapter 1
Big data analytics Module1 contents pptx
BDA: Big Data Analytics for Unit-1 Vtu syllabus
Big Data Analytics_Unit1.pptx
Big Data Scotland
000 introduction to big data analytics 2021
Introduction to Data Analytics
SENCER_panel.ppt
BADS-MBA-Unit 1 that what data science and Interpretation
basic of data science and big data......
Introduction to Data Science 5-13.pptx
Ad

More from Dhruv Saxena (8)

PPTX
Disaster Management Course Objectives
PPTX
Disaster Management - Medical and Institutional arrangement
PPTX
Disaster Preparedness
PPTX
Disaster Management Introduction & Classification
PPTX
Hazards in Textile processing Industries
PPTX
Drought - Disaster management
PPTX
Cloudburst | Disaster Management
PPTX
Small bore system: Wastewater Engineering
Disaster Management Course Objectives
Disaster Management - Medical and Institutional arrangement
Disaster Preparedness
Disaster Management Introduction & Classification
Hazards in Textile processing Industries
Drought - Disaster management
Cloudburst | Disaster Management
Small bore system: Wastewater Engineering

Recently uploaded (20)

PDF
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
PPTX
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
PPTX
iec ppt-1 pptx icmr ppt on rehabilitation.pptx
PPTX
01_intro xxxxxxxxxxfffffffffffaaaaaaaaaaafg
PPTX
Supervised vs unsupervised machine learning algorithms
PPTX
Microsoft-Fabric-Unifying-Analytics-for-the-Modern-Enterprise Solution.pptx
PPTX
Database Infoormation System (DBIS).pptx
PPTX
advance b rammar.pptxfdgdfgdfsgdfgsdgfdfgdfgsdfgdfgdfg
PPTX
Qualitative Qantitative and Mixed Methods.pptx
PDF
22.Patil - Early prediction of Alzheimer’s disease using convolutional neural...
PPTX
IB Computer Science - Internal Assessment.pptx
PPTX
MODULE 8 - DISASTER risk PREPAREDNESS.pptx
PDF
TRAFFIC-MANAGEMENT-AND-ACCIDENT-INVESTIGATION-WITH-DRIVING-PDF-FILE.pdf
PPT
Miokarditis (Inflamasi pada Otot Jantung)
PDF
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
PDF
Fluorescence-microscope_Botany_detailed content
PPTX
Introduction-to-Cloud-ComputingFinal.pptx
PPTX
climate analysis of Dhaka ,Banglades.pptx
PDF
Business Analytics and business intelligence.pdf
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
iec ppt-1 pptx icmr ppt on rehabilitation.pptx
01_intro xxxxxxxxxxfffffffffffaaaaaaaaaaafg
Supervised vs unsupervised machine learning algorithms
Microsoft-Fabric-Unifying-Analytics-for-the-Modern-Enterprise Solution.pptx
Database Infoormation System (DBIS).pptx
advance b rammar.pptxfdgdfgdfsgdfgsdgfdfgdfgsdfgdfgdfg
Qualitative Qantitative and Mixed Methods.pptx
22.Patil - Early prediction of Alzheimer’s disease using convolutional neural...
IB Computer Science - Internal Assessment.pptx
MODULE 8 - DISASTER risk PREPAREDNESS.pptx
TRAFFIC-MANAGEMENT-AND-ACCIDENT-INVESTIGATION-WITH-DRIVING-PDF-FILE.pdf
Miokarditis (Inflamasi pada Otot Jantung)
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
Fluorescence-microscope_Botany_detailed content
Introduction-to-Cloud-ComputingFinal.pptx
climate analysis of Dhaka ,Banglades.pptx
Business Analytics and business intelligence.pdf

Introduction to Data Science and Analytics

  • 1. NASSCOM Future Skills Training Course – Data Science & Analytics Dhruv Saxena Assistant Professor (TEQIP-NPIU) 1
  • 2. 2
  • 3. 3
  • 4. 4
  • 5. 5
  • 6. 6
  • 7. 7
  • 8. Introduction to Data Science Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 8
  • 10. OBJECTIVES The objective of this course is to Impart necessary knowledge of the mathematical foundations needed for data science and develop programming skills required to build data science applications. Duration – 60 Hours (40L + 20C) Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 10
  • 11. LEARNING OUTCOMES At the end of this course, the students will be able to: ā— Demonstrate understanding of the mathematical foundations needed for data science. ā— Collect, explore, clean, munge and manipulate data. ā— Implement models such as k-nearest Neighbors, NaĆÆve Bayes, linear and logistic regression, decision trees, neural networks and clustering. ā— Build data science applications using Python based toolkits. Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 11
  • 12. Data, Big Data and Challenges Data Science ā—¦ Introduction ā—¦ Why Data Science Data Scientists ā—¦ What do they do? Major/Concentration in Data Science ā—¦ What courses to take. Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 12
  • 13. Data All Around Lots of data is being collected and warehoused ā—¦Web data, e-commerce ā—¦Financial transactions, bank/credit transactions ā—¦Online trading and purchasing ā—¦Social Network 13
  • 14. How Much Data Do We have? Google processes 20 PB a day (2008) Facebook has 60 TB of daily logs eBay has 6.5 PB of user data + 50 TB/day (5/2009) 1000 genomes project: 200 TB Cost of 1 TB of disk: $35 Time to read 1 TB disk: 3 hrs (100 MB/s) Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 14
  • 15. Big Data Big Data is any data that is expensive to manage and hard to extract value from ā—¦ Volume ā—¦ The size of the data ā—¦ Velocity ā—¦ The latency of data processing relative to the growing demand for interactivity ā—¦ Variety and Complexity ā—¦ the diversity of sources, formats, quality, structures. Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 15
  • 16. Big Data vs Data Science vs Data Analytics Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 16
  • 17. What is Data Science? Dealing with unstructured and structured data, Data Science is a field that comprises everything that related to data cleansing, preparation, and analysis. Data Science is the combination of statistics, mathematics, programming, problem-solving, capturing data in ingenious ways, the ability to look at things differently, and the activity of cleansing, preparing, and aligning the data. In simple terms, it is the umbrella of techniques used when trying to extract insights and information from data. Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 17
  • 18. What is Big Data? Big Data refers to humongous volumes of data that cannot be processed effectively with the traditional applications that exist. The processing of Big Data begins with the raw data that isn’t aggregated and is most often impossible to store in the memory of a single computer. A buzzword that is used to describe immense volumes of data, both unstructured and structured, Big Data inundates a business on a day-to-day basis. Big Data is something that can be used to analyze insights that can lead to better decisions and strategic business moves. The definition of Big Data, given by Gartner, is, ā€œBig data is high-volume, and high-velocity or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.ā€ Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 18
  • 19. Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 19
  • 20. Big Data Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 20
  • 21. What is Data Analytics? Data Analytics the science of examining raw data to conclude that information. Data Analytics involves applying an algorithmic or mechanical process to derive insights and, for example, running through several data sets to look for meaningful correlations between each other. It is used in several industries to allow organizations and companies to make better decisions as well as verify and disprove existing theories or models. The focus of Data Analytics lies in inference, which is the process of deriving conclusions that are solely based on what the researcher already knows. Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 21
  • 22. Types of Data We Have Relational Data (Tables/Transaction/Legacy Data) Text Data (Web) Semi-structured Data (XML) Graph Data Social Network, Semantic Web (RDF), … Streaming Data You can afford to scan the data once Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 22
  • 23. What To Do With These Data? Aggregation and Statistics ā—¦ Data warehousing and OLAP Indexing, Searching, and Querying ā—¦ Keyword based search ā—¦ Pattern matching (XML/RDF) Knowledge discovery ā—¦ Data Mining ā—¦ Statistical Modeling Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 23
  • 24. Big Data and Data Science ā€œā€¦ the sexy job in the next 10 years will be statisticians,ā€ Hal Varian, Google Chief Economist The U.S. will need 140,000-190,000 predictive analysts and 1.5 million managers/analysts by 2018. McKinsey Global Institute’s June 2011 India will be needing around 160,000+ Data Scientists by 2020 and World demand predicted to be around 2.7million by 2020. New Data Science institutes being created or repurposed – NYU, Columbia, Washington, UCB,... New degree programs, courses, boot-camps: ā—¦ e.g., at Berkeley: Stats, I-School, CS, Astronomy… ā—¦ One proposal (elsewhere) for an MS in ā€œBig Data Scienceā€ Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 24
  • 25. What is Data Science? An area that manages, manipulates, extracts, and interprets knowledge from tremendous amount of data. Data science (DS) is a multidisciplinary field of study with goal to address the challenges in big data. Data science principles apply to all data – big and small. Simply – Extraction of knowledge from large volumes of data that are structure or unstructured. Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 25
  • 26. What is Data Science? Theories and techniques from many fields and disciplines are used to investigate and analyze a large amount of data to help decision makers in many industries such as science, engineering, economics, politics, finance, and education. ā—¦ Computer Science ā—¦ Pattern recognition, visualization, data warehousing, High performance computing, Databases, AI ā—¦ Mathematics ā—¦ Mathematical Modeling ā—¦ Statistics ā—¦ Statistical and Stochastic modeling, Probability. Mr. Dhruv Saxena, Asst. Professor (TEQIP-NPIU) 26
  • 27. Why is it sexy? Gartner’s 2014 Hype Cycle Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 27
  • 28. Data Science Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 28
  • 29. Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 29
  • 30. Real Life Examples Companies learn your secrets, shopping patterns, and preferences ā—¦ For example, can we know if a woman is pregnant, even if she doesn’t want us to know? Target case study Data Science and election (2008, 2012) ā—¦ 1 million people installed the Obama Facebook app that gave access to info on ā€œfriendsā€ Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 30
  • 31. Applications of Data Science Internet Search Search engines make use of data science algorithms to deliver the best results for search queries in a fraction of seconds. Digital Advertisements The entire digital marketing spectrum uses the data science algorithms - from display banners to digital billboards. This is the mean reason for digital ads getting higher CTR than traditional advertisements. Recommender Systems The recommender systems not only make it easy to find relevant products from billions of products available but also adds a lot to user-experience. A lot of companies use this system to promote their products and suggestions in accordance with the user’s demands and relevance of information. The recommendations are based on the user’s previous search results. Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 31
  • 32. Big Data for Retail Brick and Mortar or an online e-tailer, the answer to staying the game and being competitive is understanding the customer better to serve them. This requires the ability to analyze all the disparate data sources that companies deal with every day, including the weblogs, customer transaction data, social media, store-branded credit card data, and loyalty program data. 32
  • 33. Applications of Big Data Big Data for Financial Services Credit card companies, retail banks, private wealth management advisories, insurance firms, venture funds, and institutional investment banks use big data for their financial services. The common problem among them all is the massive amounts of multi-structured data living in multiple disparate systems, which can be solved by big data. Thus big data is used in several ways like: Customer analytics Compliance analytics Fraud analytics Operational analytics Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 33
  • 34. Big Data in Communications Gaining new subscribers, retaining customers, and expanding within current subscriber bases are top priorities for telecommunication service providers. The solutions to these challenges lie in the ability to combine and analyze the masses of customer-generated data and machine-generated data that is being created every day. 34
  • 35. Applications of Data Analytics Healthcare The main challenge for hospitals with cost pressures tightens is to treat as many patients as they can efficiently, keeping in mind the improvement of the quality of care. Instrument and machine data are being used increasingly to track as well as optimize patient flow, treatment, and equipment used in the hospitals. It is estimated that there will be a 1% efficiency gain that could yield more than $63 billion in global healthcare savings. Travel Data analytics can optimize the buying experience through mobile/ weblog and social media data analysis. Travel sights can gain insights into the customer’s desires and preferences. Products can be up-sold by correlating the current sales to the subsequent browsing increase browse-to-buy conversions via customized packages and offers. Personalized travel recommendations can also be delivered by data analytics based on social media data. Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU) 35
  • 36. Gaming Data Analytics helps in collecting data to optimize and spend within as well as across games. Game companies gain insight into the dislikes, the relationships, and the likes of the users. Energy Management Most firms are using data analytics for energy management, including smart- grid management, energy optimization, energy distribution, and building automation in utility companies. The application here is centered on the controlling and monitoring of network devices, dispatch crews, and manage service outages. Utilities are given the ability to integrate millions of data points in the network performance and lets the engineers use the analytics to monitor the network. 36
  • 37. Data Scientists: Data Scientist ◦ The Sexiest Job of the 21st Century ◦ “They find stories, extract knowledge. They are not reporters.”
  • 38. Data Scientists: Data scientists are the key to realizing the opportunities presented by big data. They bring structure to it, find compelling patterns in it, and advise executives on the implications for products, processes, and decisions.
  • 39. What do Data Scientists do? They work across domains such as national security, cyber security, business analytics, engineering, healthcare, and more.
  • 40. Concentration in Data Science: Mathematics and Applied Mathematics; Applied Statistics/Data Analysis; Solid Programming Skills (R, Python, Julia, SQL); Data Mining; Database Storage and Management; Machine Learning and Discovery.
  • 41. Machine Learning
  • 42. What is Machine Learning? Machine learning (ML) is the study of computer algorithms that improve automatically through experience. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to do so.
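As an illustration of "building a mathematical model from training data" and then using it to predict, here is a minimal sketch assuming the simplest possible model, a straight line y = a*x + b fitted by ordinary least squares. The data points and the names `fit_line` and `predict` are hypothetical, chosen only for this example.

```python
# Minimal sketch: "learning" a linear model y = a*x + b from training data
# using ordinary least squares. Data and function names are hypothetical.

def fit_line(xs, ys):
    """Return slope a and intercept b that minimize the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares estimates: a = cov(x, y) / var(x).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b

def predict(model, x):
    """Apply the learned parameters to a new input."""
    a, b = model
    return a * x + b

# Hypothetical training data: roughly y = 2x + 1 with a little noise.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [3.1, 4.9, 7.2, 8.8, 11.0]

model = fit_line(train_x, train_y)
print(model)                 # roughly (1.97, 1.09)
print(predict(model, 6.0))   # roughly 12.91
```

The prediction rule is never written by hand: the slope and intercept are estimated from the examples, which is the sense in which the program is "not explicitly programmed" to map x to y.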
  • 43. What is Machine Learning? Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks. Machine learning is closely related to computational statistics, which focuses on making predictions using computers.
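Email filtering is a classic case where a rule learned from labelled examples is easier to obtain than a hand-written one. The sketch below assumes a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, one of the models this course lists; the four training messages and the names `train` and `classify` are hypothetical and purely illustrative.

```python
import math
from collections import Counter

def train(messages):
    """messages: list of (text, label) pairs, label is 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return the label with the highest log posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Log prior: fraction of training messages with this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace-smoothed log likelihood of the word given the label.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labelled training messages.
training = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team", "ham"),
]
model = train(training)
print(classify(model, "free prize money"))        # spam
print(classify(model, "project meeting monday"))  # ham
```

Working in log probabilities avoids numerical underflow when many small word probabilities are multiplied, and smoothing keeps an unseen word from forcing a probability of zero.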
  • 50. NASSCOM Formative Assessments (Mid-training): Formative assessment of students shall be conducted for 100 marks, and the test duration shall be between 45 and 60 minutes. Post-training assessment and certification shall be conducted after the successful completion of training. Only students who are registered for and attending the Future Skills training shall be eligible for the mid-training and post-training assessments. All assessments shall be conducted online and auto-proctored through NASSCOM SSC. The assessment results shall be shared with the institute's SPOC within 3 working days. Formative assessment scores are independent and shall not be counted in the final assessment scores for certification. Tentative date: 16th August 2020.
  • 51. NASSCOM Formative Assessment Syllabus for Data Sci. & Analytics
      Module | No. of Questions | Type of Questions | Indicative Time/Module | Marks
      Introduction to Data Science | 2 | MCQ & DC | 2 min | 6
      Mathematical Foundations | 18 | MCQ, DC & ScB | 20 min | 44
  • 52. Question Types
      ◦ Multiple Choice Questions (MCQ): The candidate is asked to choose one or more responses from a limited list of choices. Depending on the level of difficulty, this also includes True/False (T/F) questions.
      ◦ Scenario-based (ScB): The candidate is asked to describe how they might respond to a hypothetical situation.
      ◦ Direct Concept (DC): The question revolves around a core concept of the subject, and the candidate is asked a direct question about that concept. This can be an MCQ, a fill-in-the-blank, or a multiple-response question.
  • 53. Next Lecture: Mathematical Foundations ◦ Introduction & Syllabus ◦ Linear Algebra – Vectors & Matrices