Introduction to Data Science
NILESH KUMAR
Outline
 Data, Big Data and Challenges
 Data Science
 Introduction
 Why Data Science
 Data Scientists
 What do they do?
 Major/Concentration in Data Science
 What courses to take.
Data All Around
 Lots of data is being collected and warehoused
 Web data, e-commerce
 Financial transactions, bank/credit transactions
 Online trading and purchasing
 Social Network
How Much Data Do We Have?
 Google processes 20 PB a day (2008)
 Facebook has 60 TB of daily logs
 eBay has 6.5 PB of user data + 50 TB/day (5/2009)
 1000 genomes project: 200 TB
 Cost of 1 TB of disk: $35
 Time to read a 1 TB disk: about 3 hrs (at 100 MB/s)
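The read-time figure follows from simple arithmetic, and the same calculation shows why a single disk cannot keep up with the volumes listed. This is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a sustained sequential throughput; the function name `hours_to_read` is hypothetical and used only for illustration.

```python
def hours_to_read(size_bytes: float, throughput_bytes_per_s: float) -> float:
    """Return the hours needed to read size_bytes sequentially."""
    return size_bytes / throughput_bytes_per_s / 3600


# Decimal units (an assumption; binary units would give slightly larger numbers).
MB = 10**6
TB = 10**12
PB = 10**15

# One 1 TB disk at 100 MB/s, as on the slide.
one_disk_hours = hours_to_read(1 * TB, 100 * MB)
print(f"1 TB at 100 MB/s: {one_disk_hours:.2f} hours")  # 2.78, rounded to 3 hrs on the slide

# The 20 PB/day figure, read through one such disk (illustrative only).
google_days = hours_to_read(20 * PB, 100 * MB) / 24
print(f"20 PB at 100 MB/s: {google_days:.0f} days")
```

Reading one day of the 20 PB workload through a single 100 MB/s disk would take years, which is the practical motivation for spreading both storage and reads across many machines.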
Big Data
Big Data is any data that is expensive to manage and hard to extract value from
 Volume
 The size of the data
 Velocity
 The latency of data processing relative to the growing demand for interactivity
 Variety and Complexity
 The diversity of sources, formats, quality, and structures
Types of Data We Have
 Relational Data (Tables/Transaction/Legacy Data)
 Text Data (Web)
 Semi-structured Data (XML)
 Graph Data
 Social Network, Semantic Web (RDF), …
 Streaming Data
 You can afford to scan the data only once
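The single-scan constraint on streaming data can be made concrete with a one-pass statistic. The sketch below uses Welford's online algorithm to compute the mean and population variance without storing the stream; the generator `readings` and its values are hypothetical stand-ins for a real data feed.

```python
from typing import Iterable, Iterator, Tuple


def streaming_mean_variance(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Compute count, mean, and population variance in a single pass."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / count if count else 0.0
    return count, mean, variance


def readings() -> Iterator[float]:
    """Hypothetical stream: a generator can be consumed only once."""
    for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield value


n, mean, variance = streaming_mean_variance(readings())
print(n, round(mean, 6), round(variance, 6))
```

Each element is seen once and then discarded, so memory use stays constant no matter how long the stream runs.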
What To Do With These Data?
 Aggregation and Statistics
 Data warehousing and OLAP
 Indexing, Searching, and Querying
 Keyword based search
 Pattern matching (XML/RDF)
 Knowledge discovery
 Data Mining
 Statistical Modeling
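Two of the tasks listed above, aggregation and keyword-based search, can be sketched in a few lines. This is an illustrative toy, not how a data warehouse or search engine is implemented: the records, documents, and function names (`total_by`, `build_inverted_index`, `search`) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical transaction records for the aggregation example.
sales = [
    {"region": "east", "amount": 120.0},
    {"region": "west", "amount": 80.0},
    {"region": "east", "amount": 30.0},
]


def total_by(records, key, value):
    """Sum the `value` field for each distinct `key` (a simple group-by)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record[value]
    return dict(totals)


# Hypothetical documents for the keyword search example.
docs = {
    1: "big data is hard to manage",
    2: "data science extracts knowledge",
    3: "graph data and streaming data",
}


def build_inverted_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


totals = total_by(sales, "region", "amount")
index = build_inverted_index(docs)
print(totals)
print(sorted(search(index, "streaming data")))
```

The inverted index trades extra storage for fast lookups: a query touches only the posting sets of its own words instead of scanning every document.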
What is Data Science?
 A field that manages and manipulates tremendous amounts of data, and extracts and interprets knowledge from it
 Data science (DS) is a multidisciplinary field of study whose goal is to address the challenges of big data
 Data science principles apply to all data – big and small
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
What is Data Science?
 Theories and techniques from many fields and disciplines are used to investigate and analyze a large amount of data to help decision makers in many industries such as science, engineering, economics, politics, finance, and education
 Computer Science
 Pattern recognition, visualization, data warehousing, high-performance computing, databases, AI
 Mathematics
 Mathematical Modeling
 Statistics
 Statistical and Stochastic modeling, Probability.
Data Science
Real Life Examples
 Companies learn your secrets, shopping patterns, and preferences
 For example, can we know if a woman is pregnant, even if she doesn’t want us to know? (Target case study)
 Data science and elections (2008, 2012)
 1 million people installed the Obama Facebook app, which gave access to information on their “friends”
Data Scientists
 Data Scientist
 The Sexiest Job of the 21st Century
 They find stories and extract knowledge. They are not reporters
Data Scientists
 Data scientists are the key to realizing the
opportunities presented by big data. They bring
structure to it, find compelling patterns in it, and
advise executives on the implications for products,
processes, and decisions
What do Data Scientists do?
 National Security
 Cyber Security
 Business Analytics
 Engineering
 Healthcare
 And more …
Concentration in Data Science
 Mathematics and Applied Mathematics
 Applied Statistics/Data Analysis
 Solid Programming Skills (R, Python, Julia, SQL)
 Data Mining
 Database Storage and Management
 Machine Learning and discovery
