Data Science?!
what even...
David Coallier
@davidcoallier
Data Scientist

Engine Yard
Data Science, what even?!
And I cook..
A lot.
(n-1) items
Adapting.
Feedback.
Indifference.
Young mathematically inclined minds

We knew everything.
First Bad Assumption.
So we asked “experts”.
Wrong Ingredients
Bad Data
Tasted like sh*t
From Our Results
We had questions.
Found Expertise
Not Online.
Data Scientific
Method
Find a Question
Your Hypothesis
Current Data

What do you have?
Features & Tests
Try it.
Analyse Results
Won’t be pretty.
Conversation

Framed. By. Data.
But....
Good Discussions
Imply good data scientists
[Venn diagram, built up across several slides. Circles: Hacking Skills, Maths & Stats, Expertise. Overlap regions: Machine Learning, Danger Zone!!!, Research, Data Science.]
Business

Don’t need an MBA
In other words.
1. Hacking
2. Maths & Stats
3. Expertise
Apply the Data Scientific Method
1. Question
2. Current Data
3. Features/Tests
4. Analyse
5. Converse
Find a Question

Let’s imagine Github
Upgrade Repos
Affect users as little as possible
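import csv
# csv has no read(); open the file and hand it to csv.reader
with open('repo1.csv', newline='') as f:
    content = list(csv.reader(f))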
$$f(k; \lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!} \quad \text{for } k \ge 0$$
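Reading the formula above as the Poisson probability mass function, here is a minimal sketch of how it could be applied to the repository question. The hourly commit counts are made-up illustrative numbers, not data from the talk, and `poisson_pmf` is a hypothetical helper name. The sketch estimates λ as the mean count per hour, then asks how likely a completely quiet hour would be.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam."""
    return (lam ** k) * math.exp(-lam) / math.factorial(k)

# Hypothetical commits per hour for one repository (illustrative only).
commits_per_hour = [0, 1, 0, 2, 1, 0, 0, 3]

# The maximum-likelihood estimate of lambda is the sample mean.
lam = sum(commits_per_hour) / len(commits_per_hour)

# Chance that a given hour sees no commits at all under this model.
p_quiet = poisson_pmf(0, lam)

print(f"lambda = {lam:.3f}, P(no commits in an hour) = {p_quiet:.3f}")
```

A repository with a low λ has a high chance of a quiet hour, which is one way to rank candidates for an upgrade window. Whether commits are the right signal at all is a separate question.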
Converse

Present Findings
Iterate

Commits aren’t key.
KPIs are key

Indicators from experience
Questions

Super Important.
Just test it..
We are Human.

Emotional Connection
What next?

Second Hypothesis.
Focus on Data

Relevant to your KPIs.
Data gives you the what
Humans give you the why
Turn Information
Into

Actionable Insight
Create Discussions
Introspection Engines
Seeing, Feeling it
The brain sees.
Not regressions
Not p-values
Not slopes
Not F-statistics
Not coefficients
Question Data

Not Visualisations.
Toolbox

What do we use?
R
Modeling, Testing, Prototyping
RStudio

The IDE
lubridate
and zoo
Dealing with Dates...
yy/mm/dd
mm/dd/yy
YYYY-mm-dd HH:MM:ss TZ
yy-mm-dd
1363784094.513425
yy/mm
different timezone
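The same date problem can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not the approach used in the talk: `FORMATS` and `to_utc` are hypothetical names, the "TZ" part is assumed to be a numeric offset such as `+0100`, and values without a zone are assumed to already be in UTC. Because yy/mm/dd and mm/dd/yy cannot be told apart from the text alone, the caller has to name the convention each source uses.

```python
from datetime import datetime, timezone

# Formats from the slide, expressed as strptime patterns.
FORMATS = {
    "yy/mm/dd": "%y/%m/%d",
    "mm/dd/yy": "%m/%d/%y",
    "yy-mm-dd": "%y-%m-%d",
    "yy/mm": "%y/%m",
    "iso-with-offset": "%Y-%m-%d %H:%M:%S %z",
}

def to_utc(text, kind):
    """Parse text using the named convention and return an aware UTC datetime."""
    if kind == "epoch":
        # Seconds since 1970-01-01 UTC, possibly with a fractional part.
        return datetime.fromtimestamp(float(text), tz=timezone.utc)
    parsed = datetime.strptime(text, FORMATS[kind])
    if parsed.tzinfo is None:
        # Assumption: values without a zone are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)

print(to_utc("13/03/20", "yy/mm/dd"))
print(to_utc("03/20/13", "mm/dd/yy"))
print(to_utc("2013-03-20 13:54:54 +0100", "iso-with-offset"))
print(to_utc("1363784094.513425", "epoch"))
```

Once everything is an aware UTC datetime, values from different sources can be compared and sorted directly.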
reshape2

Reshape your Data
ggplot2

Visualise your Data
RCurl, RJSONIO
Find more Data
Hmisc

Miscellaneous useful functions
forecast

Can you guess?
garch

Generalized Autoregressive
Conditional Heteroskedasticity
quantmod

Statistical Financial Trading
getSymbols('AAPL')
barChart(AAPL)
addMACD()
xts

Extensible Time Series
igraph

Study Networks
maptools

Read & View Maps
library(maps)  # map() and the 'state' database come from the maps package
map('state', region = row.names(USArrests), col = cm.colors(16, 1)[floor(USArrests$Rape / max(USArrests$Rape) * 28)], fill = TRUE)
Python

Scientific Computing
SciPy
http://www.scipy.org
scipy.stats
scipy.stats
Descriptive Statistics
from scipy.stats import describe
s = [1, 2, 1, 3, 4, 5]
# describe() reports count, min/max, mean, variance, skewness and kurtosis
print(describe(s))
scipy.stats
Probability Distributions
Example
Poisson Distribution
$$f(k; \lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!} \quad \text{for } k \ge 0$$
from scipy.stats import poisson
# pmf of each count k under a Poisson distribution with rate 2
p = poisson.pmf([1, 2, 3, 4, 1, 2, 3], 2)
print(p.mean())
print(p.sum())
...
NumPy
http://www.numpy.org/
NumPy
Linear Algebra
⎛ 1 0 ⎞
⎜ 0 1 ⎟
⎝
⎠
import numpy as np
x = np.array([ [1, 0], [0, 1] ])
val, vec = np.linalg.eig(x)  # eig returns eigenvalues first, then eigenvectors
np.linalg.eigvals(x)
>>> np.linalg.eig(x)
(
array([ 1., 1.]),
array([
[ 1., 0.],
[ 0., 1.]
])
)
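The identity matrix is a special case: every vector is an eigenvector, so the output says little. As a hedged sketch with a matrix that is not from the slides, the following checks the defining property A v = λ v for each eigenpair that `np.linalg.eig` returns.

```python
import numpy as np

# A small symmetric matrix chosen for illustration (not from the slides).
a = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eig returns the eigenvalues first, then the eigenvectors as columns.
values, vectors = np.linalg.eig(a)

# Each column v of `vectors` should satisfy a @ v == value * v.
for value, v in zip(values, vectors.T):
    assert np.allclose(a @ v, value * v)

print(sorted(values))
```

Iterating over `vectors.T` matters here, because the eigenvectors are stored as columns rather than rows.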
Matplotlib

Python Plotting
statsmodels
Advanced Statistics Modeling
NLTK

Natural Language Tool Kit
scikit-learn

Machine Learning
from sklearn import tree
X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)
clf.predict([[2., 2.]])
>>> array([1])
PyBrain

... Machine Learning
PyMC
Bayesian Inference
Pattern

Web Mining for Python
NetworkX

Study Networks
MILK: Machine Learning
Pandas

easy-to-use data structures
from pandas import DataFrame
x = DataFrame([
{"age": 26},
{"age": 19},
{"age": 21},
{"age": 18}
])
# Keep only the rows with age above 20, then count and average them
print(x[x['age'] > 20].count())
print(x[x['age'] > 20].mean())
Python vs R?

Different Purposes
Dogfooding

Data Scientific Method
Original Question
What is Data Science?
Back to you

For questioning
