You don’t have to be a Data
Scientist to do Data Science
@carmenmardiros (not a data scientist)
“Sexiest job of the 21st
century”
Why do I, a mere analyst, care?
The appeal of Data Science (for me as an analyst)
Increase
confidence
My own and others’ in my analyses as the complexity
of data and business ecosystem increases.
Become more
productive
Speed up the analysis cycle from exploration to
hypothesis to experimentation.
Add value in
new ways
As the business and technology landscape changes.
Operationalise analysis outcomes as data products.
“It’s just not for me...”
“I don’t have a degree in statistics or programming.”
Predictive Analytics Summit 2013
No confidence to attend the sessions.
Worried I would not understand the content.
Worried I’d be spotted as a fraud.
Predictive Analytics Summit 2016 (3 months into my data science foray)
Understood much of the content and terminology.
Had the same questions in mind that others asked.
I knew more than I thought I did.
Myth
Doing data science requires a PhD/going back to school.
Can’t do data science until you can write an algorithm.
Bottom-up is the only way.
Truth
Doing data science requires enthusiasm and confidence in ourselves.
Can and should do data science once we’ve conceptually understood how and why the algorithm works.
Top-down works. Provide value, learn as you go.
Adapt. Grow. Stay relevant.
Digital Analytics is changing fast
Increasingly
scientific
approaches
Essential as we move towards prescriptive analytics
at speed.
Become familiar
with data
science toolkit
We will be key to bridging the gap between PhDs,
machines and management.
May even use it ourselves for our day-to-day work.
Future-proof
ourselves
MS Office for Machine Learning: coming soon to a
cloud near you.
3 Transformative
Data Science techniques
#1 Resampling
The Bootstrap
Number of observations: 100
Sample is representative (to the best of
our knowledge).
Observed mean: 17.54 months
The Bootstrap
Draw 100 random resamples with replacement,
each the same size as the original sample.
Calculate the mean of each one:
[17.61, 16.21, 17.13, 14.08, 19.58, …]  # 100 means
Plot all the means, their 2.5th and 97.5th
percentiles, and the original observed mean.
Bootstrap is extremely versatile:
● Fewer assumptions than parametric
methods.
● Can be used on any statistic.
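The bootstrap steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code behind the slides: the `observations` list is randomly generated stand-in data, the helper names are hypothetical, and the sketch draws 1,000 resamples rather than the 100 on the slide so that the percentiles are less coarse.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical stand-in for the 100 observed values (in months); not the slide's data.
observations = [random.gammavariate(2.0, 9.0) for _ in range(100)]
observed_mean = statistics.mean(observations)

def bootstrap_means(data, n_resamples):
    """Mean of each resample; every resample is drawn with replacement at the original size."""
    n = len(data)
    return [statistics.mean(random.choices(data, k=n)) for _ in range(n_resamples)]

def percentile(values, pct):
    """Percentile taken at the closest rank in the sorted values (no interpolation)."""
    ordered = sorted(values)
    rank = round(pct / 100 * (len(ordered) - 1))
    return ordered[rank]

boot_means = bootstrap_means(observations, n_resamples=1000)
lower = percentile(boot_means, 2.5)
upper = percentile(boot_means, 97.5)

print(f"observed mean: {observed_mean:.2f}")
print(f"95% bootstrap interval: [{lower:.2f}, {upper:.2f}]")
```

Swapping `statistics.mean` for `statistics.median` (or any other function of the resample) gives an interval for that statistic instead, which is the versatility the slide points to.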
Simulations & Sensitivity Analysis
Simple simulation:
Given the existing distribution of order values and
a range of possible conversion rates, how
much £££ would we make if we doubled the
traffic to our website?
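A simulation like this one might look as follows in Python. Everything numeric here is a hypothetical placeholder (order values, session count, conversion-rate range), and drawing the conversion rate uniformly from a range is just one simple assumption about how to express "a range of possible conversion rates".

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

# All inputs below are hypothetical placeholders, not figures from the talk.
order_values = [12.0, 18.5, 25.0, 31.0, 40.0, 55.0, 75.0, 120.0]  # past order values in GBP
current_sessions = 10_000
conversion_rate_range = (0.015, 0.030)  # assumed low and high conversion rates

def simulate_revenue(sessions, n_scenarios=500):
    """Simulate total revenue at a given traffic level under uncertain conversion rates."""
    totals = []
    for _ in range(n_scenarios):
        rate = random.uniform(*conversion_rate_range)  # one possible conversion rate
        n_orders = round(sessions * rate)              # orders implied by that rate
        # Each order value is resampled from the existing distribution.
        totals.append(sum(random.choices(order_values, k=n_orders)))
    return sorted(totals)

scenarios = simulate_revenue(sessions=2 * current_sessions)  # doubled traffic
low = scenarios[int(0.025 * len(scenarios))]
mid = statistics.median(scenarios)
high = scenarios[int(0.975 * len(scenarios))]

print(f"revenue at doubled traffic: {low:,.0f} to {high:,.0f} (median {mid:,.0f})")
```

The output is a range of revenue scenarios rather than a single number, which makes the uncertainty in the answer visible.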
Sensitivity analysis
(or how to open up black boxes):
Given a predictive model, randomly generate
new data points for each input based on its
observed distribution, create predictions using
the model, and interpret the distribution of
outcome scenarios.
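As a rough sketch of that idea in Python: `predict_revenue` below is a hypothetical stand-in for any fitted model, and the `observed` values are invented for illustration. Varying one input at a time while holding the others at their median is one simple way (an assumption here, not something the slides prescribe) to see which input moves the predictions most.

```python
import random
import statistics

random.seed(11)  # fixed seed so the sketch is reproducible

def predict_revenue(sessions, conversion_rate, avg_order_value):
    """Hypothetical stand-in for a fitted predictive model."""
    return sessions * conversion_rate * avg_order_value

# Hypothetical observed values for each model input.
observed = {
    "sessions": [9_000, 10_500, 11_000, 12_500, 14_000],
    "conversion_rate": [0.018, 0.020, 0.022, 0.025, 0.027],
    "avg_order_value": [38.0, 41.5, 44.0, 47.0, 52.0],
}

def simulate_predictions(vary, n_draws=1000):
    """Resample only the inputs named in `vary`; hold the others at their median."""
    baseline = {name: statistics.median(values) for name, values in observed.items()}
    predictions = []
    for _ in range(n_draws):
        inputs = dict(baseline)
        for name in vary:
            inputs[name] = random.choice(observed[name])  # draw from observed values
        predictions.append(predict_revenue(**inputs))
    return predictions

# Distribution of outcomes when every input varies at once.
all_inputs = simulate_predictions(vary=list(observed))

# Spread of predictions when only one input varies: a crude sensitivity measure.
spread_by_input = {
    name: statistics.pstdev(simulate_predictions(vary=[name])) for name in observed
}

print(f"predictions range from {min(all_inputs):,.0f} to {max(all_inputs):,.0f}")
for name, spread in spread_by_input.items():
    print(f"{name}: standard deviation of predictions = {spread:,.0f}")
```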
Cross Validation
Iteration   Fold 1   Fold 2   Fold 3   Fold 4   Fold 5
1           Train    Train    Train    Train    Test
2           Train    Train    Train    Test     Train
3           Train    Train    Test     Train    Train
4           Train    Test     Train    Train    Train
5           Test     Train    Train    Train    Train
Assesses how well a predictive model generalises to unseen data.
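A 5-fold cross validation along these lines can be written without any machine learning library. The sketch below is illustrative only: the data is randomly generated, the "model" is a hand-rolled straight-line fit, and the function names are hypothetical. In practice a library such as scikit-learn provides this out of the box.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical data: y depends roughly linearly on x, plus noise.
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 5.0 + random.gauss(0, 4.0) for x in xs]

def fit_line(x_train, y_train):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = statistics.mean(x_train), statistics.mean(y_train)
    sxx = sum((x - x_bar) ** 2 for x in x_train)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_train, y_train))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def k_fold_indices(n_rows, k):
    """Shuffle row indices and split them into k folds of near-equal size."""
    indices = list(range(n_rows))
    random.shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(xs, ys, k=5):
    """Return the mean absolute error on each held-out test fold."""
    errors = []
    for test_fold in k_fold_indices(len(xs), k):
        held_out = set(test_fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        # Fit on the training folds only, then score on the unseen test fold.
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.append(
            statistics.mean(abs(ys[i] - (slope * xs[i] + intercept)) for i in test_fold)
        )
    return errors

fold_errors = cross_validate(xs, ys, k=5)
print("MAE per fold:", [round(e, 2) for e in fold_errors])
print("mean MAE:", round(statistics.mean(fold_errors), 2))
```

Each row is used for testing exactly once, so the averaged error reflects performance on data the model did not see during fitting.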
Resampling
Protects you
from unsound
inference
Acknowledges and mitigates effects of variance and
noise in the data.
You already do this when you use confidence
intervals. Quantify uncertainty more often.
Paints possible
future scenarios
Leverages randomness and probability to give you
glimpses into possible future outcomes.
Embrace randomness. It's your ally on the road to
prescriptive analytics.
#2 Faceted visualisation
Segmented view, side-by-side
Outstanding tools for exploratory data analysis: Seaborn in Python and ggplot in R
#3 Feature Engineering
What?!
#3 Feature Engineering
#3 Calculated Metrics or
Content Groupings?
Back on familiar territory.
Feature Engineering Examples
Unique content
views per user
by content type
# politics content views, # business content views
# short/long-form content views
Distribution of
content seen
per user
% politics content views in total content viewed
adjusted for uncertainty of small samples
Result: fat user-level table of attributes and
behaviour for analysis and modelling.
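To make the user-level features concrete, here is a small Python sketch. The `views` log is invented, and the slides do not say how the share is "adjusted for uncertainty of small samples", so the adjustment shown (pulling each user's share towards the overall share, with a hypothetical `PRIOR_WEIGHT`) is only one possible approach.

```python
from collections import Counter, defaultdict

# Hypothetical content-view log: (user_id, content_id, content_type).
views = [
    ("u1", "a1", "politics"), ("u1", "a1", "politics"), ("u1", "a2", "business"),
    ("u1", "a3", "politics"), ("u2", "a2", "business"), ("u3", "a1", "politics"),
    ("u3", "a4", "business"), ("u3", "a5", "business"), ("u3", "a6", "business"),
]

# Unique content views: count each (user, content) pair once.
unique_views = set(views)

per_user = defaultdict(Counter)
for user_id, _content_id, content_type in unique_views:
    per_user[user_id][content_type] += 1

# Overall politics share, used as the value that small samples are pulled towards.
total_views = len(unique_views)
global_share = sum(counts["politics"] for counts in per_user.values()) / total_views

PRIOR_WEIGHT = 5  # assumption: acts like 5 extra "average" views per user

user_table = {}
for user_id, counts in per_user.items():
    n = sum(counts.values())
    user_table[user_id] = {
        "politics_views": counts["politics"],
        "business_views": counts["business"],
        "total_views": n,
        "politics_share_raw": counts["politics"] / n,
        # Users with few views end up close to the overall share.
        "politics_share_adjusted": (counts["politics"] + PRIOR_WEIGHT * global_share)
        / (n + PRIOR_WEIGHT),
    }

for user_id in sorted(user_table):
    print(user_id, user_table[user_id])
```

A user with a single business view has a raw politics share of 0%, which overstates what one view can tell us; the adjusted share stays near the overall average until more views accumulate.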
Feature Engineering Examples
Infer trading
calendar
activities
from data
(for time series
analysis)
# new marketing campaigns (first date with sessions)
# new brands launched (first date with pageviews)
# voucher codes at peak redeem-rate (date with
highest redeems)
# AB tests started (date with first events tracked)
# VIPs active on each date, etc
Result: fat date-level table of leading KPIs and
activities (model the ecosystem).
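One of these date-level features, the number of new marketing campaigns per date, could be derived as in the sketch below. The `sessions` log and campaign names are made up for illustration; the only rule taken from the slide is that a campaign's launch date is inferred as its first date with sessions.

```python
from collections import Counter
from datetime import date

# Hypothetical session log: (session_date, campaign).
sessions = [
    (date(2016, 3, 1), "spring_sale"), (date(2016, 3, 1), "newsletter"),
    (date(2016, 3, 2), "spring_sale"), (date(2016, 3, 3), "affiliates"),
    (date(2016, 3, 3), "newsletter"), (date(2016, 3, 3), "spring_sale"),
]

# Launch date of each campaign, inferred as its first date with sessions.
first_seen = {}
for session_date, campaign in sessions:
    if campaign not in first_seen or session_date < first_seen[campaign]:
        first_seen[campaign] = session_date

new_campaigns_per_date = Counter(first_seen.values())
sessions_per_date = Counter(session_date for session_date, _ in sessions)

# One row per date: a small slice of the date-level table.
date_table = [
    {
        "date": d,
        "sessions": sessions_per_date[d],
        "new_campaigns": new_campaigns_per_date.get(d, 0),
    }
    for d in sorted(sessions_per_date)
]

for row in date_table:
    print(row)
```

The other columns on the slide (new brands, AB tests started, active VIPs) follow the same pattern: find the first or peak date per entity, then count entities per date.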
Feature Engineering
New ways of
capturing
underlying
phenomena
Seasoned data scientists: Feature engineering often
yields higher rewards than pushing the latest
algorithms.
You probably already do this, most likely in Excel.
It’s painful and limiting.
Your analytical creativity needs better tools.
SQL: The single most valuable tool in our toolkit.
We become self-sufficient analysts.
Resources
Inspired?
Learn Python
https://try.jupyter.org/ -- start learning Python for data science right now (no setup!).
https://learncodethehardway.org/python/
Learn Machine Learning
http://machinelearningmastery.com/
Understand how algorithms work using spreadsheets.
Top-down approach. No programming required.
Learn SQL
https://learncodethehardway.org/sql/

