Introduction to Data Science
DR. AHMAD KARAWASH
APRIL 2021
Outline
What is Data Science?
Where does data come from?
Why the excitement?
Machine Learning Models
The Data Science Process
9/10/2021 2
What is Data Science?
History & definition
Data Science Definition
Figure: an adaptation of the original Data Science Venn Diagram, which is licensed under Creative Commons Attribution-NonCommercial.
What is Data Science?
Data science is the art of analyzing data and extracting useful knowledge from it.
Where does data come from?
Big Sources of Data
Figure: data sources of Big Data (image source: researchgate.net)
Why the excitement?
How Data Science Can Help Southwest Save $100 Million on Fuel
Source: "How Big Data and the Industrial Internet Can Help Southwest Save $100 Million on Fuel" (GE News)
How Data Science Helped UPS Save 39 Million Gallons of Fuel through Route Optimization
Source: big-data-infographic_v06 (ups.com)
Amazon Recommendation System
Image source: "Amazon - recommendation engine" by Youssef Rahoui (Flickr)
Netflix Movie Suggestion
Source: https://www.slideshare.net/xamat/qcon-sf-2013-machine-learning-recommender-systems-netflix-scale
US Polling: 2008 & 2012
Nate Silver used a simple idea, a principled approach to aggregating polls instead of relying on punditry, and:
• Correctly predicted the outcome in 49 of 50 states in 2008
• Correctly predicted the outcome in 50 of 50 states in 2012
Source: https://hbr.org/2012/11/how-nate-silver-won-the-2012-p
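A minimal sketch of poll aggregation, assuming the simplest principled rule: weight each poll's result by its sample size, so larger polls count for more. The `Poll` class, the `aggregate` function and the poll numbers are hypothetical illustrations; Silver's published models were considerably more elaborate, so treat this only as the core idea.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    share_a: float    # fraction of respondents backing candidate A (0..1)
    sample_size: int  # number of respondents


def aggregate(polls: list[Poll]) -> float:
    """Sample-size-weighted average of candidate A's share across polls."""
    total = sum(p.sample_size for p in polls)
    if total == 0:
        raise ValueError("need at least one poll with respondents")
    return sum(p.share_a * p.sample_size for p in polls) / total


# Hypothetical polls for a single state
state_polls = [Poll(0.52, 800), Poll(0.48, 400), Poll(0.51, 1200)]
estimate = aggregate(state_polls)
leader = "A" if estimate > 0.5 else "B"
print(f"estimated share for A: {estimate:.4f}, predicted winner: {leader}")
```

Larger polls have smaller sampling error, so weighting by sample size is a natural first step; refinements such as weighting by recency or by a pollster's track record would slot into the same weighted average.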
Machine Learning models
Machine Learning Models
• Supervised learning: predicts an outcome based on historical patterns (ex: advertisement clicks).
• Unsupervised learning: analyzes the relationships between data elements (ex: Facebook "friend suggestions").
• Reinforcement learning: learns from its mistakes and gets better over time (ex: game-playing agents such as AlphaGo).
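To make the three paradigms concrete, here is a toy sketch of each in plain Python. Everything in it is a hypothetical illustration rather than the systems named on the slide: a supervised threshold rule fitted to labelled click data, an unsupervised 2-means clustering that groups unlabelled values, and an epsilon-greedy bandit agent that improves its choices from reward feedback (a simple form of reinforcement learning).

```python
import random

# --- Supervised learning: labelled history -> predictive rule ---
# Hypothetical data: (minutes spent on a page, clicked the ad? 1/0)
history = [(1, 0), (2, 0), (3, 0), (8, 1), (9, 1), (10, 1)]


def fit_threshold(data):
    """Choose the cut-off t (predict 1 when x >= t) that makes the fewest mistakes."""
    best_t, best_err = None, float("inf")
    for t, _ in data:
        err = sum((x >= t) != bool(y) for x, y in data)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


# --- Unsupervised learning: no labels, just structure in the data ---
def two_means(values, iters=10):
    """k-means with k=2 on 1-D values; returns the two groups."""
    c1, c2 = min(values), max(values)
    for _ in range(iters):
        g1 = [v for v in values if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in values if abs(v - c1) > abs(v - c2)]
        if not g1 or not g2:
            break
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return g1, g2


# --- Reinforcement learning: improve from rewards (and mistakes) ---
def epsilon_greedy(success_probs, steps=2000, epsilon=0.1, seed=0):
    """Bandit agent: usually exploit the best-looking action, sometimes explore."""
    rng = random.Random(seed)
    counts = [0] * len(success_probs)
    values = [0.0] * len(success_probs)  # running mean reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(success_probs))
        else:
            arm = max(range(len(values)), key=values.__getitem__)
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return values


t = fit_threshold(history)
print("click predicted for 7 minutes:", 7 >= t)
print("groups:", two_means([1, 2, 2, 3, 20, 21, 23]))
print("estimated action values:", epsilon_greedy([0.2, 0.8]))
```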
The Data Science Process
The Data Science Process
Formulate the question
Generate Hypothesis
Collect the data
Clean Data
Explore/Transform Data
Build Machine Learning models
Evaluate & Deploy the Model
Formulate the question
◦ Understand the business problem.
◦ Good results depend on a thorough understanding of the problem.
Generate Hypothesis
➢ Make an educated guess about which data parameters (features) are likely to correlate significantly with the prediction target.
➢ For example, in loan approval prediction, some candidate features are:
◦ Income: applicants with a higher income should be more likely to get a loan.
◦ Education: higher education tends to bring higher income, so the loan request is more likely to be approved.
◦ Loan amount: the smaller the amount, the higher the chance of approval.
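A quick way to screen hypotheses like these is to measure how strongly each candidate feature correlates with the target. The sketch below computes the Pearson correlation coefficient by hand (with a 0/1 target this is the point-biserial correlation). The applicant records are hypothetical, and a real analysis would also test whether each correlation is statistically significant.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant column has no defined correlation")
    return cov / (sx * sy)


# Hypothetical applicants: income (k$), loan amount (k$), approved (1/0)
income = [20, 35, 50, 65, 80, 95]
amount = [30, 25, 20, 15, 12, 10]
approved = [0, 0, 1, 1, 1, 1]

for name, column in [("income", income), ("loan amount", amount)]:
    print(f"{name:12s} r = {pearson(column, approved):+.2f}")
```

A positive r for income and a negative r for loan amount would be consistent with the first and third hypotheses; correlation alone does not establish that either feature causes approval.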
Collect the data
Gathering data from relevant sources, for example:
◦ Kaggle: Your Machine Learning and Data Science Community
◦ UCI: UCI Machine Learning Repository
◦ DSE: Dataset Search (google.com)
◦ NCBI: National Center for Biotechnology Information (nih.gov)
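Datasets downloaded from sources like these often arrive as CSV files. A minimal sketch using Python's standard `csv` module; the column names and rows are hypothetical stand-ins for a loan dataset, and with a real file you would pass `open(path, newline="")` instead of the in-memory string.

```python
import csv
import io

# Hypothetical stand-in for a downloaded CSV file
RAW_CSV = """applicant_id,income,education,loan_amount,approved
1,52000,graduate,15000,1
2,31000,high_school,25000,0
3,,graduate,12000,1
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per row (values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(RAW_CSV)
print(len(rows), "rows; columns:", list(rows[0].keys()))
```

Every value comes back as a string, and the empty income in row 3 shows up as `""`; both issues are handled in the cleaning step.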
Clean Data
Some common sources of data errors:
◦ Duplicate entries
◦ Missing values
Image source: "5 Data Science Projects That Will Get You Hired in 2018" (KDnuggets)
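A minimal cleaning sketch for the two error types listed above, assuming rows are dictionaries of strings (as `csv.DictReader` produces): exact duplicate rows are dropped, and missing numeric values are filled with the column median. Median imputation is only one common choice, assumed here for illustration; dropping incomplete rows or model-based imputation are alternatives.

```python
from statistics import median


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or value.strip() == ""


def clean(rows, numeric_cols):
    """Drop exact duplicate rows, then fill missing numeric values with the column median."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole row
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))
    for col in numeric_cols:
        present = [float(r[col]) for r in unique if not is_missing(r[col])]
        fill = median(present) if present else None
        for r in unique:
            r[col] = fill if is_missing(r[col]) else float(r[col])
    return unique


raw = [
    {"id": "1", "income": "52000"},
    {"id": "1", "income": "52000"},  # duplicate entry
    {"id": "2", "income": ""},       # missing value
    {"id": "3", "income": "68000"},
]
print(clean(raw, ["income"]))
```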
Explore/Transform Data
◦ Feature identification
◦ Univariate analysis
◦ Multivariate analysis
◦ Feature engineering
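The four exploration steps can be sketched with Python's `statistics` module. The function names and the loan-to-income feature are hypothetical illustrations: `numeric_columns` stands in for feature identification, `univariate` summarizes one feature, `covariance` relates two features, and `loan_to_income` engineers a new feature from existing ones.

```python
from statistics import mean, median, stdev


def is_number(text):
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def numeric_columns(rows):
    """Feature identification: columns whose non-empty values all parse as numbers."""
    return [c for c in rows[0]
            if all(is_number(r[c]) for r in rows if r[c] not in ("", None))]


def univariate(values):
    """Univariate analysis: summary statistics of a single feature."""
    return {"mean": mean(values), "median": median(values),
            "stdev": stdev(values), "min": min(values), "max": max(values)}


def covariance(xs, ys):
    """Multivariate analysis: sample covariance between two features."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def loan_to_income(amounts, incomes):
    """Feature engineering: loan amount as a fraction of income."""
    return [a / i for a, i in zip(amounts, incomes)]


rows = [{"income": "40000", "education": "graduate"},
        {"income": "50000", "education": "high_school"},
        {"income": "60000", "education": "graduate"}]
incomes = [float(r["income"]) for r in rows]
amounts = [8000.0, 10000.0, 30000.0]  # hypothetical loan amounts
print(numeric_columns(rows))  # ['income']
print(univariate(incomes))
print(covariance(incomes, amounts))
print(loan_to_income(amounts, incomes))
```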
Build Machine Learning models
◦ Algorithm selection
◦ Model training
◦ Model prediction
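A sketch of the three steps, assuming k-nearest neighbours was the algorithm selected (an illustrative choice, not one the slides prescribe). For k-NN, "training" simply memorizes the labelled examples, and prediction takes a majority vote among the k closest ones. The `fit`/`predict` method names follow the convention popularized by scikit-learn, but this class is a self-contained stand-in, and the data are hypothetical.

```python
import math
from collections import Counter


class KNNClassifier:
    """k-nearest-neighbours classifier; training memorizes the labelled examples."""

    def __init__(self, k=3):
        self.k = k
        self.X, self.y = [], []

    def fit(self, X, y):
        if len(X) != len(y) or len(X) < self.k:
            raise ValueError("need at least k labelled examples, one label each")
        self.X = [list(map(float, x)) for x in X]
        self.y = list(y)
        return self

    def predict_one(self, x):
        # Sort training points by Euclidean distance and vote among the k nearest
        dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(self.X, self.y))
        votes = Counter(label for _, label in dists[: self.k])
        return votes.most_common(1)[0][0]

    def predict(self, X):
        return [self.predict_one(x) for x in X]


# Hypothetical applicants: [income, loan amount] in units of $10k; 1 = approved
X_train = [[2.0, 3.0], [3.5, 2.5], [3.0, 2.8], [6.5, 1.5], [8.0, 1.2], [9.5, 1.0]]
y_train = [0, 0, 0, 1, 1, 1]

model = KNNClassifier(k=3).fit(X_train, y_train)
print(model.predict([[2.5, 2.9], [9.0, 1.1]]))
```

In practice, algorithm selection would compare several candidates (or several values of k) on validation data before committing to one.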
Evaluate & Deploy the Model
◦ Validate the selected machine learning model
◦ Deploy the model and make it accessible through an API
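A sketch of both steps using only the standard library: `accuracy` validates predictions against held-out labels, and `PredictHandler` exposes a model over HTTP as a minimal JSON API. The `/predict` route, the JSON field names and `threshold_model` (a stand-in rule that approves when income exceeds twice the loan amount) are hypothetical; a production deployment would typically use a trained model behind a proper web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def threshold_model(features):
    """Hypothetical stand-in model: approve when income exceeds twice the loan amount."""
    return int(features["income"] > 2 * features["loan_amount"])


def handle_predict(body: bytes) -> bytes:
    """Turn a JSON request body into a JSON prediction response."""
    features = json.loads(body)
    return json.dumps({"approved": threshold_model(features)}).encode()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = handle_predict(self.rfile.read(length))
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected JSON with income and loan_amount")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Start the API on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()


# Validation on hypothetical held-out applicants
held_out = [
    {"income": 60000, "loan_amount": 20000, "approved": 1},
    {"income": 30000, "loan_amount": 25000, "approved": 0},
    {"income": 45000, "loan_amount": 20000, "approved": 0},
]
preds = [threshold_model(r) for r in held_out]
print("accuracy:", accuracy([r["approved"] for r in held_out], preds))
```

`serve()` blocks while running; once it is started, `curl -X POST -d '{"income": 60000, "loan_amount": 20000}' http://127.0.0.1:8000/predict` should return `{"approved": 1}`.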
Questions?
Ahmad.karawash@gmail.com