Introduction to Data Science
Sean Byrnes
http://seanbyrnes.com
@sbyrnes
Who Am I? 
f 
ATTENDED 
FOUNDED 
CURRENTLY 
from Yahoo!
Introduction to Data Science 
• What is Data Science? 
• Example 1: Basic Math 
• Example 2: Regression Modeling 
• Example 3: Recommender Systems 
• Getting started in data science
What is Data Science? 
Software Engineering 
+ 
Statistical Analysis
What is Data Science? 
1. Question 
2. Data Gathering 
3. Exploration 
4. Modeling 
5. Answer 
6. Production
Example 1: Basic Math 
What is my customer churn rate? 
def. Churn rate: The percentage of subscribers to a 
service that discontinue their subscription to that service 
in a given time period. (aka attrition rate)
Example 1: Basic Math 
Churn(month) = (# customers lost) / (# customers at start)
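A minimal sketch of the churn calculation, assuming churn is the number of customers lost during the month divided by the number of customers at the start of the month. The function name `churn_rate` and the counts are hypothetical and are not taken from the slides.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return the fraction of customers lost during one period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


# Hypothetical counts for one month (illustration only).
rate = churn_rate(customers_at_start=2000, customers_lost=75)
print(f"Monthly churn: {rate:.2%}")
```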
Example 1: Basic Math 
Month Churn 
Dec '13 3.75% 
Nov '13 1.87% 
Oct '13 3.82% 
Sep '13 2.76% 
Aug '13 2.43% 
Jul '13 2.04% 
Jun '13 1.60%
Example 1: Basic Math 
For all customers acquired in a given month 
Retention(C_month) = Active(C_month) / Total(C_month)
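A small sketch of cohort retention, assuming each cohort is tracked as a list of active-customer counts, where the first entry is the cohort size at acquisition. The function name `cohort_retention` and the counts are hypothetical.

```python
def cohort_retention(active_by_month: list[int]) -> list[float]:
    """Retention per month for one acquisition cohort.

    active_by_month[0] is the cohort size at acquisition (Total);
    each later entry is how many of those customers are still active.
    """
    total = active_by_month[0]
    if total <= 0:
        raise ValueError("cohort must contain at least one customer")
    return [active / total for active in active_by_month]


# Hypothetical cohort: 1000 customers acquired, then 160, 110, 85 still active.
retention = cohort_retention([1000, 160, 110, 85])
print([f"{r:.2%}" for r in retention])
```

Computing this once per acquisition month gives one row of a cohort table.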
Example 1: Basic Math 
Retention by months since acquisition:
Cohort    0     1       2       3      4      5      6
Dec '13   100%  12.82%  8.04%   6.34%  4.91%  3.95%  3.14%
Nov '13   100%  15.66%  9.97%   6.96%  5.46%  3.88%  2.77%
Oct '13   100%  16%     10.86%  8.62%  6.22%  5.06%  3.98%
Sep '13   100%  13.28%  9.52%   7.28%  5.28%  4.48%  4%
Aug '13   100%  12.96%  9.18%   6.55%  4.73%  3.86%  3.13%
Jul '13   100%  15.84%  10.85%  8.27%  6.67%  5.60%  4.63%
Jun '13   100%  16.08%  11.36%  8.36%  7.07%  6%     5.25%
Example 2: Regression Modeling 
How many users will we have next month?
Example 2: Regression Modeling 
[Chart: x-axis monthly from 1/1/13 to 12/1/13; y-axis 0 to 160,000]
Example 2: Regression Modeling 
For data set X(n), find f(n) such that
f(n_i) ≈ X(n_i)
Example 2: Regression Modeling 
Assume X(n_i) = [x_1, x_2, …, x_k]
f(n) = c_1 x_1 + c_2 x_2 + c_3 x_3 + … + c_k x_k
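As a rough illustration, here is the simplest case of a linear model: one feature (the month index) plus an intercept, fitted by ordinary least squares. The intercept term, the function names `fit_line` and `predict`, and the user counts are assumptions made for this sketch; they do not come from the slides or from the Lyric library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and co-variation of x with y, around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Evaluate the fitted line at x."""
    return intercept + slope * x


# Hypothetical monthly user counts for months 1..6 (illustration only).
months = [1, 2, 3, 4, 5, 6]
users = [21000, 24500, 27800, 32100, 35000, 38900]

intercept, slope = fit_line(months, users)
print(f"Forecast for month 7: {predict(intercept, slope, 7):,.0f} users")
```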
Example 2: Regression Modeling 
[Chart: Linear Model fit; x-axis monthly from 1/1/13 to 12/1/13; y-axis 0 to 160,000]
Example 2: Regression Modeling 
Assume X(n_i) = [x_1, x_2, …, x_k]
f(n) = c_1 x_1 + c_2 x_2 + c_3 x_3 + … + c_k x_k
Or, maybe
f(n) = c_1 x_1 + c_2 x_1^2 + c_3 x_2 + c_4 x_2^2 + … + c_m x_k^2
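A sketch of polynomial regression in a single variable (for example, the month index), assuming a least-squares fit solved through the normal equations. This is a simplified stand-in for the multi-feature form above. The helper names `solve_linear_system`, `fit_polynomial`, and `evaluate` are hypothetical, and the data is made up.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a lower degree")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c_0, c_1, ..., c_degree]."""
    size = degree + 1
    # Normal equations: (A^T A) c = A^T y, where A[r][i] = xs[r] ** i.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(size)]
           for i in range(size)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(ata, aty)


def evaluate(coeffs, x):
    """Evaluate c_0 + c_1 x + c_2 x^2 + ... at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Hypothetical monthly user counts for months 1..6 (illustration only).
months = [1, 2, 3, 4, 5, 6]
users = [21000, 24500, 29800, 36100, 45000, 55900]

coeffs = fit_polynomial(months, users, degree=2)
print(f"Forecast for month 7: {evaluate(coeffs, 7):,.0f} users")
```

Changing `degree` switches between the linear, 2nd degree, and 4th degree fits.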
Example 2: Regression Modeling 
[Chart: 2nd Degree Polynomial Model fit; x-axis monthly from 1/1/13 to 12/1/13; y-axis 0 to 160,000]
Example 2: Regression Modeling 
[Chart: 4th Degree Polynomial Model fit; x-axis monthly from 1/1/13 to 12/1/13; y-axis 0 to 160,000]
Example 2: Regression Modeling 
https://github.com/sbyrnes/Lyric
Example 3: Recommender Systems 
What other products might this 
customer buy?
Example 3: Recommender Systems 
            Product 1  Product 2  Product 3  …  Product N
Customer 1  3.5        4.0        -             3.0
Customer 2  2.0        -          3.5           -
Customer 3  -          3.0        2.5           -
…
Customer N  4.5        -          -             4.5
Example 3: Recommender Systems 
Given customer preference matrix M, find matrices P and Q such that
P × Q ≈ M
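One common way to find such a factorization is stochastic gradient descent over the observed ratings only. The sketch below assumes that approach; it is not necessarily what likely.js does. Here `q` stores one row per product, so a prediction is the dot product of a customer row of `p` with a product row of `q` (that is, M ≈ P × Qᵀ in this layout). All names, hyperparameters, and ratings are hypothetical.

```python
import random


def factorize(ratings, n_customers, n_products, n_factors=2,
              steps=2000, learning_rate=0.01, regularization=0.02, seed=0):
    """Fit latent factors to observed ratings.

    ratings maps (customer_index, product_index) -> observed rating.
    Unobserved cells are simply absent from the dict.
    """
    rng = random.Random(seed)
    p = [[rng.uniform(0.1, 1.0) for _ in range(n_factors)]
         for _ in range(n_customers)]
    q = [[rng.uniform(0.1, 1.0) for _ in range(n_factors)]
         for _ in range(n_products)]
    for _ in range(steps):
        for (c, i), rating in ratings.items():
            error = rating - predict(p, q, c, i)
            for f in range(n_factors):
                pcf, qif = p[c][f], q[i][f]
                # Gradient step on squared error with L2 regularization.
                p[c][f] += learning_rate * (error * qif - regularization * pcf)
                q[i][f] += learning_rate * (error * pcf - regularization * qif)
    return p, q


def predict(p, q, customer, product):
    """Predicted rating: dot product of customer and product factors."""
    return sum(pf * qf for pf, qf in zip(p[customer], q[product]))


def rmse(ratings, p, q):
    """Root mean squared error over the observed ratings."""
    total = sum((r - predict(p, q, c, i)) ** 2
                for (c, i), r in ratings.items())
    return (total / len(ratings)) ** 0.5


# Hypothetical sparse ratings for 4 customers and 4 products.
observed = {
    (0, 0): 3.5, (0, 1): 4.0, (0, 3): 3.0,
    (1, 0): 2.0, (1, 2): 3.5,
    (2, 1): 3.0, (2, 2): 2.5,
    (3, 0): 4.5, (3, 3): 4.5,
}

p, q = factorize(observed, n_customers=4, n_products=4)
print(f"Training RMSE: {rmse(observed, p, q):.3f}")
print(f"Predicted rating, customer 1 on product 1: {predict(p, q, 1, 1):.2f}")
```

Once P and Q are fitted, every empty cell can be filled with a predicted rating, and the highest predictions become the recommendations.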
Example 3: Recommender Systems 
            Product 1  Product 2  Product 3  …  Product N
Customer 1  3.5        4.0        2.5           3.0
Customer 2  2.0        1.5        3.5           3.0
Customer 3  1.5        3.0        2.5           4.0
…
Customer N  4.5        3.5        4.0           4.5
Example 3: Recommender Systems 
Given customer preferences c[p_1, p_2, …, p_n]
and overall rating average r_overall
c_bias = mean(c[p_1], c[p_2], …, c[p_n]) - r_overall
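A short sketch of the bias calculation, assuming the overall average is taken over every observed rating from all customers. The function name `customer_bias` and the ratings are hypothetical.

```python
from statistics import mean


def customer_bias(customer_ratings, overall_average):
    """How far a customer's average rating sits above or below the overall average."""
    return mean(customer_ratings) - overall_average


# Hypothetical observed ratings per customer (illustration only).
ratings_by_customer = {
    "customer_1": [3.5, 4.0, 3.0],
    "customer_2": [2.0, 3.5],
}

all_ratings = [r for rs in ratings_by_customer.values() for r in rs]
r_overall = mean(all_ratings)

for name, rs in ratings_by_customer.items():
    print(f"{name}: bias {customer_bias(rs, r_overall):+.2f}")
```

A positive bias marks a customer who rates generously; subtracting it puts ratings from different customers on a comparable scale.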
Example 3: Recommender Systems 
https://github.com/sbyrnes/likely.js
Getting Started in Data Science 
• Programming 
• Statistics 
• Machine learning 
• Toolkit 
– R 
– Hadoop 
– D3
Sean Byrnes 
seanbyrnes.com 
@sbyrnes 
github.com/sbyrnes
