Predicting the Oscars with Data Science
http://bit.ly/tf-predict-oscars
About me
• Jasjit Singh
• Self-taught developer
• Worked in finance & tech
• Co-Founder Hotspot
• Thinkful General Manager
About us
Thinkful prepares students for web development &
data science jobs with 1-on-1 mentorship programs
What’s your background?
• I have a software background
• I have a math or stats background
• None of the above
Learning goals
• “Data Science Process”
• Python’s data science toolkit and methods
• Basic machine learning concepts
Data Science Process
• Frame the question.
• Collect the raw data.
• Process the data.
• Explore the data.
• Communicate results.
Frame the question
• Who will win the Oscar for Best Picture?
Collect the Data
• What kind of data do we need?
Collect the Data
• Financial data (Budget, box office…)
• Reviews, ratings and scores.
• Awards and nominations.
Process the data
• How’s the data “dirty” and how can we fix it?
Process the data
• User input, redundancies, missing data…
• Formatting: adapt the data to meet certain
specifications.
• Cleaning: detecting and correcting
corrupt or inaccurate records.
Explore the data
• What are the meaningful patterns in the
data?
• How meaningful is each data point for our
predictions?
Communicate the Data
Jupyter Notebooks
• One of a data scientist’s everyday tools.
• Lets us show our work: readers can follow the story,
trace the process, and run the code.
• Find the links in our classroom tool.
NumPy
• The fundamental package for scientific
computing with Python.
• Lets us store our data in special
multidimensional array objects.
• Many methods for fast operations on arrays.
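As a small illustration of the array objects and fast operations mentioned above, here is a minimal sketch. The review scores are made-up numbers, not data from the workshop notebook.

```python
import numpy as np

# Hypothetical review scores: 2 movies (rows) x 3 reviewers (columns).
scores = np.array([[7.0, 8.0, 9.0],
                   [6.0, 6.0, 9.0]])

# shape reports the size along each dimension.
print(scores.shape)

# Operations apply to the whole array at once, with no explicit loop.
row_means = scores.mean(axis=1)   # average score per movie
scaled = scores / 10              # rescale every score to the 0-1 range

print(row_means)
print(scaled)
```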
Pandas
• Fundamental high-level building block for
doing practical, real world data analysis in
Python.
• Built on top of NumPy.
Scikit-learn
• Python module for machine learning.
Working through the code
• The goal is to have you follow along. Towards
the end of the class, you can play around
with the code yourself
• Go to Jupyter notebook (http://bit.ly/tf-jupyter)
• Open “Stage One.ipynb”
Initial imports and loading data with Pandas
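The notebook code for this slide is not reproduced in the transcript, so the following is only a sketch of what the imports and loading step might look like. The column names and rows are hypothetical placeholders, and the CSV is read from an in-memory string so the example runs without the workshop's data file.

```python
import io

import numpy as np   # imported up front; used in later steps of a typical notebook
import pandas as pd

# Hypothetical stand-in for the workshop's data file; real columns may differ.
csv_text = """year,movie,rating,box_office,winner
1976,Film A,PG,120.0,1
1976,Film B,R,80.0,0
1984,Film C,R,50.0,1
1984,Film D,PG,95.0,0
"""

# pd.read_csv accepts a file path or any file-like object.
movies = pd.read_csv(io.StringIO(csv_text))

print(movies.shape)
print(movies.dtypes)
```

With a real file, the call would take a path instead of the in-memory buffer.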
Understanding your data
• .head(n) method: Returns first n rows.
• .value_counts() method: Returns the counts
of unique values in that column
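The two methods above can be tried on a tiny DataFrame. This is a minimal sketch with made-up rows; the column names are assumptions, not the workshop's actual schema.

```python
import pandas as pd

# Hypothetical data, only to show what the two methods return.
movies = pd.DataFrame({
    "movie": ["Film A", "Film B", "Film C", "Film D", "Film E"],
    "rating": ["PG", "R", "R", "PG-13", "R"],
})

# head(n) returns the first n rows as a new DataFrame.
first_rows = movies.head(3)

# value_counts() counts how often each unique value appears in a column.
rating_counts = movies["rating"].value_counts()

print(first_rows)
print(rating_counts)
```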
Understanding your data
Processing our data: Formatting
• Ratings are in a non-numeric format. We
need to assign each rating a unique integer
so that Python can handle the information.
• We’ll do that with the .ix indexer (deprecated and since removed from pandas; current versions use .loc instead)
Formatting your Data
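The formatting code itself is not in the transcript. One way to assign each rating an integer is sketched below; it uses `.loc` rather than `.ix`, since `.ix` is no longer available in current pandas. The rating labels and the integer codes are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical ratings column.
movies = pd.DataFrame({"rating": ["PG", "R", "R", "PG-13", "G"]})

# Assumed mapping from rating label to a unique integer.
rating_codes = {"G": 0, "PG": 1, "PG-13": 2, "R": 3}

# Start with a sentinel value, then overwrite the rows matching each label.
movies["rating_code"] = -1
for label, code in rating_codes.items():
    movies.loc[movies["rating"] == label, "rating_code"] = code

print(movies)
```

A shorter alternative would be `movies["rating"].map(rating_codes)`; the loop is shown because it mirrors the row-selection style the slide describes.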
Processing our Data: Cleaning
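The cleaning code is also missing from the transcript. As one plausible example of handling missing data, the sketch below fills a gap with the column median. The `box_office` column and its values are hypothetical, and median filling is just one common choice, not necessarily the one used in the workshop.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
movies = pd.DataFrame({
    "movie": ["Film A", "Film B", "Film C", "Film D"],
    "box_office": [120.0, np.nan, 80.0, 100.0],
})

# Count the gaps before cleaning.
missing_before = int(movies["box_office"].isna().sum())

# median() skips missing values by default; use it to fill the gap.
median_value = movies["box_office"].median()
movies["box_office"] = movies["box_office"].fillna(median_value)

print(missing_before)
print(movies)
```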
Classification vs Regression
• Regression — Predict values
• Classification — Predict categories.
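To make the distinction concrete, here is a small sketch that fits one model of each kind on the same made-up input. The feature (a critic score), the continuous target, and the win/lose labels are all hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical single feature: a critic score per movie.
critic_score = np.array([[4.0], [5.0], [6.0], [7.0], [8.0], [9.0]])

# Regression target: a continuous value (made-up box office figures).
box_office = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])

# Classification target: a category (1 = won, 0 = did not win).
won = np.array([0, 0, 0, 1, 1, 1])

regressor = DecisionTreeRegressor(random_state=0).fit(critic_score, box_office)
classifier = DecisionTreeClassifier(random_state=0).fit(critic_score, won)

new_movie = np.array([[8.0]])
predicted_value = regressor.predict(new_movie)[0]   # a number
predicted_class = classifier.predict(new_movie)[0]  # a category label

print(predicted_value, predicted_class)
```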
Classification
Classification
?
For our model we’ll use a decision tree
• It breaks down a dataset into smaller and
smaller subsets.
• The final result is a model with a tree
structure that has:
• Decision nodes: ask a question and have
two or more branches.
• Leaf nodes: represent a classification or
decision.
Why a decision tree?
• A decision tree is just one of many algorithms
you can use (e.g. Naive Bayes, SVM, least
squares, logistic regression, etc.)
• We’re using decision trees because they mirror
human decision making, so they’re simpler to
understand and interpret
Creating your first Decision Tree
You will use the scikit-learn and NumPy
libraries to build your first decision tree. We
will need the following to build it:
• target: A one-dimensional numpy array
containing the target from the train data.
• features: A multidimensional numpy array
containing the features/predictors from the
train data.
Creating your first Decision Tree
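The notebook cell for this slide is not in the transcript. Assuming a `target` array and a `features` array shaped as described above, fitting a tree with scikit-learn might look like the sketch below. The two feature columns and all values are made up for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features: [critic score, won another major award (1/0)].
features = np.array([
    [8, 1],
    [6, 0],
    [9, 1],
    [5, 0],
    [7, 1],
    [4, 0],
])

# One-dimensional target: 1 = won Best Picture, 0 = did not.
target = np.array([1, 0, 1, 0, 1, 0])

# random_state makes the fitted tree reproducible.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(features, target)

# Predict for a new, unseen movie.
prediction = tree.predict(np.array([[9, 1]]))
print(prediction)
```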
Importances and Score
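A fitted scikit-learn tree exposes `feature_importances_` and a `score` method. The sketch below shows both on made-up data; the feature names are assumptions. Note that scoring on the training data, as done here, tends to look better than performance on new data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features: [critic score, won another major award (1/0)].
feature_names = ["critic_score", "other_award"]
features = np.array([[8, 1], [6, 0], [9, 1], [5, 0], [7, 1], [4, 0]])
target = np.array([1, 0, 1, 0, 1, 0])

tree = DecisionTreeClassifier(random_state=0).fit(features, target)

# Importances are non-negative and sum to 1 across the features.
importances = tree.feature_importances_
for name, value in zip(feature_names, importances):
    print(name, value)

# score() returns mean accuracy on the data passed in.
train_accuracy = tree.score(features, target)
print(train_accuracy)
```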
Problem #1: Poor Predictions
• We only got 1 out of 4 movies correct
(Amadeus, 1984)
• Open Stage 2 and let’s try to improve our
predictions!
Solution: Modify the feature list
Run the prediction again
Success!?
• We now got 3 out of 4 movies correct.
• Our model predicted Inglourious Basterds would win;
The Hurt Locker won instead.
Problem #2: Overfitting
• The resulting model is too closely tied to the training set.
• It doesn’t generalize to new data, which is
the point of prediction.
Overfitting, a visualization
In overfitting, a statistical model describes random error or
noise instead of the underlying relationship.
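The effect can be reproduced with a deliberately extreme sketch: the labels below are pure random noise, so there is no underlying relationship to learn. An unrestricted decision tree still memorizes the training set, and the gap between training and test accuracy is the sign of overfitting. The data here is synthetic, not from the workshop.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic data: 5 random features, labels unrelated to the features.
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# With no depth limit, the tree keeps splitting until it fits every training row.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)

print("train accuracy:", train_acc)
print("test accuracy:", test_acc)
```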
Solution: Random Forest Classifier
• Random Forest Classifiers use many
Decision Trees to build a classifier.
• We introduce a bit of randomness.
• Each Tree can give a different answer (a
vote). The final classification is the most
common amongst the Trees.
Random Forest Classifier
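The notebook code for this slide is not in the transcript. A minimal sketch with scikit-learn's `RandomForestClassifier` is shown below, on made-up data with assumed feature columns. It also collects each tree's individual answer to illustrate the voting idea; scikit-learn itself combines the trees by averaging their predicted class probabilities, which usually agrees with the majority vote.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical features: [critic score, won another major award (1/0)].
features = np.array([
    [8, 1], [6, 0], [9, 1], [5, 0], [7, 1], [4, 0],
    [9, 0], [3, 1], [8, 1], [5, 1], [6, 1], [4, 0],
])
target = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0])

# n_estimators is the number of trees; each is trained on a random
# bootstrap sample and considers random subsets of features at splits.
forest = RandomForestClassifier(n_estimators=25, random_state=0)
forest.fit(features, target)

new_movie = np.array([[9, 1]])

# Each tree gives its own answer for the new movie.
votes = np.array([t.predict(new_movie)[0] for t in forest.estimators_])
print("votes for winning:", int(votes.sum()), "of", len(votes))

# The forest's combined prediction.
prediction = forest.predict(new_movie)
print(prediction)
```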
Creating your first Decision Tree
Importances and Score
Results
1976
Rocky
1984
Amadeus
1996
The English Patient
2009
The Hurt Locker
And our prediction for the 2016
Oscar is…
We can predict the Oscars
Except for 2017 ¯\_(ツ)_/¯
Next steps for learning
• Google (self-taught)
• Coursera
• Bootcamps
1-on-1 mentorship enables flexibility
Next steps for learning
Graduate outcomes
Job Titles after Graduation
Months until Employed
Special Introductory Offer
• Prep course for 50% off — $250 instead of $500
• Covers math, stats, Python, and data science toolkit
• Option to continue into full data science program
• If you’re interested, talk to me afterwards or email me at
jasjit@thinkful.com
