Introduction to Data Science
Prepared by
S.L.Swarna AP/AI&DS
S.Santhiya AP/AI&DS
EXCEL ENGINEERING COLLEGE
Data All Around
• Data, Big Data and Challenges
• Data Science
– Introduction
– Why Data Science
• Data Scientists
– What do they do?
• Major/Concentration in Data Science
– What courses to take.
Data All Around
• Lots of data is being collected
and warehoused
– Web data, e-commerce
– Financial transactions, bank/credit transactions
– Online trading and purchasing
– Social Network
How Much Data Do We Have?
• Google processes 20 PB a day (2008)
• Facebook has 60 TB of daily logs
• eBay has 6.5 PB of user data + 50 TB/day
(5/2009)
• 1000 genomes project: 200 TB
Types of Data We Have
• Relational Data (Tables/Transaction/Legacy
Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
– Social Network, Semantic Web (RDF), …
• Streaming Data
– You can afford to scan the data only once
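The "scan once" constraint on streaming data can be illustrated with a small sketch. It is not taken from the slides: the function name `running_stats` and the sample readings are hypothetical, chosen only to show how summary statistics can be updated incrementally so that no value needs to be stored or revisited.

```python
from typing import Iterable, Iterator, Tuple


def running_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, maximum) after a single pass over the stream."""
    count = 0
    mean = 0.0
    maximum = float("-inf")
    for value in stream:
        count += 1
        # Incremental mean update: no earlier values are kept in memory.
        mean += (value - mean) / count
        maximum = max(maximum, value)
    return count, mean, maximum


def sensor_readings() -> Iterator[float]:
    # Hypothetical stream; a generator can be consumed only once,
    # which mimics data that cannot be re-read.
    yield from [2.0, 4.0, 6.0, 8.0]


count, mean, maximum = running_stats(sensor_readings())
print(count, mean, maximum)
```

Each value is touched exactly once, so memory use stays constant no matter how long the stream runs.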
What is Data Science?
• Data Science is about data gathering, analysis and
decision-making.
• Data Science is about finding patterns in data
through analysis and making future predictions.
• By using Data Science, companies are able to
make:
– Better decisions (should we choose A or B?)
– Predictive analyses (what will happen next?)
– Pattern discoveries (finding patterns, or even
hidden information, in the data)
Where is Data Science Needed?
Examples of where Data Science is needed:
• For route planning: to discover the best shipping
routes
• To foresee delays for flights, ships, trains, etc.
(through predictive analysis)
• To create promotional offers
• To find the best-suited time to deliver goods
• To forecast next year's revenue for a company
• To analyze the health benefits of training
• To predict who will win elections
How Does a Data Scientist Work?
• A Data Scientist requires expertise in several
areas:
– Machine Learning
– Statistics
– Programming (Python or R)
– Mathematics
– Databases
• A Data Scientist must find patterns within the
data. Before they can find the patterns, they
must organize the data in a standard format.
Here is how a Data Scientist works:
• Ask the right questions - To understand the business
problem.
• Explore and collect data - From databases, web logs,
customer feedback, etc.
• Extract the data - Transform the data to a standardized
format.
• Clean the data - Remove erroneous values from the data.
• Find and replace missing values - Check for missing values
and replace them with a suitable value (e.g. an average
value).
• Normalize data - Scale the values to a practical
range (e.g. 140 cm is smaller than 1.8 m, yet the
number 140 is larger than 1.8, so scaling is
important).
• Analyze data, find patterns and make future
predictions.
• Represent the result - Present the result with
useful insights in a way the "company" can
understand.
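The missing-value and normalization steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presenters' own code: the helper names (`to_metres`, `fill_missing_with_mean`, `min_max_scale`) and the sample heights are hypothetical, and min-max scaling to the range 0 to 1 is just one common choice for a "practical range".

```python
from typing import List, Optional, Tuple


def to_metres(value: float, unit: str) -> float:
    """Convert a height to metres so 140 cm and 1.8 m become comparable."""
    if unit == "cm":
        return value / 100.0
    if unit == "m":
        return value
    raise ValueError(f"unsupported unit: {unit}")


def fill_missing_with_mean(values: List[Optional[float]]) -> List[float]:
    """Replace None entries with the average of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("no known values to average")
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max_scale(values: List[float]) -> List[float]:
    """Scale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal: nothing to spread
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw records: (value, unit), with one missing measurement.
raw: List[Optional[Tuple[float, str]]] = [(140.0, "cm"), (1.8, "m"), None, (160.0, "cm")]

heights_m = [None if r is None else to_metres(*r) for r in raw]
filled = fill_missing_with_mean(heights_m)
scaled = min_max_scale(filled)
print(heights_m, filled, scaled)
```

Converting to a common unit first matters: scaling the raw numbers 140 and 1.8 directly would rank the shorter person as the taller one.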