Starting Data Science with Kaggle.com
Nathaniel Shimoni
6/25/2017
Talk outline

• What is Kaggle?
• Why is Kaggle so great? The everyone-wins approach
• Kaggle tiers & top Kagglers
• Frequently used terms and the main rules
• The benefits of starting with Kaggle
• A common Kaggle data science process
What is Kaggle?

• An online platform that runs data science competitions
• Declares itself to be the home of data science
• Has over 1M registered users & over 60k active users
• One of the most vibrant communities for data scientists
• A great place to meet other “data people”
• A great place to learn and test your data & modeling skills
Why is Kaggle so great? (the everyone-wins approach)

Competitors
• Receive prizes, knowledge, exposure & a portfolio showcase

Kaggle
• Receives money from competition sponsors
• Gains influence on the community
• Gains knowledge of the platforms & algorithmic trends
• Drives rapid development & adoption of high-performing platforms

Competition sponsors
• Have data & a business task, but no data scientists
• Receive state-of-the-art models quickly, without hiring data scientists
My Kaggle profile
Kaggle tiers

• Novice – a new Kaggle user
• Contributor – participated in one or more competitions, ran a kernel, and is active in the forums
• Expert – 2 top-25% finishes
• Master – 2 top-10% finishes & 1 top-10 (places) finish
• Grandmaster – 5 top-10 finishes & 1 solo top-10 finish
Top Kagglers
Frequently used terms

• Leaderboard (LB) – public & private

The competition data, available once you have accepted the rules, is split into training data and testing data. The testing data feeds two leaderboards:
• Public LB – used for ranking submissions throughout the competition; it can serve as an additional validation frame, but can also be a source of overfitting
• Private LB – used for final scoring (the only score that truly matters)
Frequently used terms

• Leakage – the introduction of information about the target that is not a legitimate predictor (usually by a mistake in the data preparation process)
• Team merger – 2 or more participants competing together
Frequently used terms

• LB shuffle – the re-ranking that occurs at the end of the competition (upon moving from the public to the private LB)
Main rules for Kaggle competitions
• One account per user
• No private sharing outside teams (public sharing is usually allowed and endorsed)
• Limited number of entries per day & per competition
• Winning solutions must be written in open-source code
• Winners must hand in well-documented source code to be eligible for the prize
• Each participant usually selects 2 solutions for final evaluation
Why start with Kaggle?

• Project-based learning – learn by doing
• Solve real-world challenges
• Great supporting community
• Benchmark solutions & shared code samples
• Clear business objective and modeling task
• Develop a work portfolio and rank yourself against other competitors (and get recognition)
• Compete against state-of-the-art solutions
• Learn (a lot!!!) when the competition ends
Why start with Kaggle?

• Ability to team up with others:
 learn from better Kagglers
 learn how to collaborate effectively
 merge different solutions to achieve a score boost
 meet exciting new people
• Answer the questions of others – you only truly learn something when you teach it to someone else
• Ability to apply new ideas at work with little effort
• Varied areas of activity (verticals)
Why start with Kaggle?

• The ability to follow many experts, each of whom specializes in a particular area (a sample from my list):
 Ensemble learning – Mathias Müller
 Feature extraction – Darius Barušauskas
 Validation – Gert Jacobusse
 Super-fast draft modeling – ZFTurbo (real name unknown)
 Inspiration (no minimal age for data science) – Mikel Bober-Irizar
Common Kaggle data science process

Exploratory data analysis → data cleaning → data augmentation / adding external data → feature engineering → set the correct validation method → single models → diverse single models → ensemble learning → final prediction

(Adding external data is not always allowed, yet it is good practice to consider when possible.)

Approximate share of total time spent in each activity:
• EDA – 20%
• Data cleaning, augmenting & feature generation – 40%
• Modeling – 30%
• Ensemble learning – 10%
Data cleaning

• Impute missing values (mean, median, most common value, or a separate prediction task)
• Remove zero-variance features
• Remove duplicated features
• Outlier removal – use caution, as it can be harmful; at the cleaning stage we remove only clearly irrelevant values (e.g. a negative price)
• NA encoding / imputing
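The cleaning steps above can be sketched in pandas (a minimal sketch; the toy frame and its column names are illustrative, not from the talk):

```python
import pandas as pd

# Toy frame: a numeric feature with NAs, a zero-variance column,
# and an exact duplicate column
df = pd.DataFrame({
    "price":     [10.0, None, 30.0, 20.0],
    "rooms":     [2.0, 3.0, None, 3.0],
    "constant":  [1, 1, 1, 1],
    "rooms_dup": [2.0, 3.0, None, 3.0],
})

# Impute missing values with the per-column median
# (mean / most-common-value imputation works the same way)
df = df.fillna(df.median())

# Remove zero-variance features
df = df.loc[:, df.nunique(dropna=False) > 1]

# Remove duplicated features
df = df.loc[:, ~df.T.duplicated()]

# Cleaning-stage outlier removal: drop only clearly irrelevant
# values, e.g. a negative price
df = df[df["price"] >= 0]
```

After these steps only `price` and `rooms` survive: the constant and duplicated columns are gone, and the NAs are filled.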
Data augmentation & external data

• External data sources:
 OpenStreetMap
 weather measurement data
 online calendars
• APIs
• Scraping (using Scrapy / Beautiful Soup)
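A minimal sketch of folding external data into the training set, assuming a hypothetical, already-fetched weather API response (the JSON shape and field names here are invented for illustration):

```python
import json

# Hypothetical weather API response, keyed by date
api_response = '{"2017-06-25": {"temp_c": 29.5, "rain_mm": 0.0}}'
weather = json.loads(api_response)

train_rows = [{"date": "2017-06-25", "sales": 120}]

# Join the external weather features onto each training row by date
for row in train_rows:
    row.update(weather.get(row["date"], {}))
```

Each training row now carries the external weather features alongside its original ones.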
Feature engineering

• Rescaling / standardization of existing features
• Data transformations: TF-IDF, log1p, min-max scaling, binning of numeric features
• Turn categorical features into numeric ones (label encoding / one-hot encoding)
• Create count features
• Parse textual features to get more generalizable features
• The hashing trick
• Extract date/time features, e.g. DayOfWeek, month, year, dayOfMonth, isHoliday, etc.
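Several of these transformations in one short pandas/NumPy sketch (the frame and column names are illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative frame: a timestamp, a categorical feature, and a count
df = pd.DataFrame({
    "ts":   pd.to_datetime(["2017-06-25", "2017-12-31"]),
    "city": ["tlv", "nyc"],
    "cnt":  [3, 1000],
})

# Extract date/time features (DayOfWeek, month, dayOfMonth, ...)
df["day_of_week"] = df["ts"].dt.dayofweek   # Monday=0 ... Sunday=6
df["month"] = df["ts"].dt.month
df["day_of_month"] = df["ts"].dt.day

# log1p transform to tame a heavy-tailed count feature
df["cnt_log1p"] = np.log1p(df["cnt"])

# Categorical -> numeric via one-hot encoding
df = pd.get_dummies(df, columns=["city"])
```

Label encoding (`df["city"].astype("category").cat.codes`) is the compact alternative to one-hot when a model can handle ordinal-looking integers.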
Feature selection

• Remove near-zero-variance features
• Use feature importance and eliminate the least important features
• Recursive Feature Elimination (RFE)
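Recursive Feature Elimination in scikit-learn, on a toy problem (the estimator choice and feature counts are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Toy problem: 10 features, only 3 of them informative
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

# RFE repeatedly refits the model and drops the weakest feature
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
selector.fit(X, y)

X_selected = X[:, selector.support_]  # keep only the surviving features
```

The same loop works with any estimator that exposes coefficients or feature importances (e.g. a gradient-boosted tree model).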
Hyperparameter optimization

• Grid search CV (exhaustive, rarely better than the alternatives)
• Random search CV
• Hyperopt
• Bayesian optimization

* Hyperparameter tuning will usually improve results, but not as much as the other activities
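A random-search sketch with scikit-learn's `RandomizedSearchCV` (the model and the parameter values below are illustrative, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# Random search samples a handful of configurations instead of
# exhaustively trying them all, as grid search would
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": [20, 50, 100],
                         "max_depth": [2, 4, 8]},
    n_iter=5, cv=3, random_state=0,
)
search.fit(X, y)
best = search.best_params_
```

Swapping in scipy distributions for the lists (or a Hyperopt / Bayesian-optimization search space) changes the sampling strategy but not the overall shape of the loop.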
Validation

• Train/test split
• Shuffle split
• K-fold (the most commonly used)
• Time-based separation
• Group k-fold
• Leave one group out
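A sketch of group k-fold, the scheme to reach for when rows belong to groups and plain k-fold would leak group information across the split (the toy data is illustrative):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# 6 samples in 3 groups (e.g. one group per customer)
X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])
groups = np.array([0, 0, 1, 1, 2, 2])

# Group k-fold keeps each whole group on one side of the split,
# so no group ever appears in both train and validation
for train_idx, val_idx in GroupKFold(n_splits=3).split(X, y, groups):
    assert set(groups[train_idx]).isdisjoint(set(groups[val_idx]))
```

`KFold`, `ShuffleSplit`, `TimeSeriesSplit`, and `LeaveOneGroupOut` all share this same `.split(...)` iterator interface, so they are drop-in replacements for one another.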
Ensemble learning

• Simple/weighted average of previous best models
• Bagging of the same type of model (e.g. different random seeds, different hyperparameters)
• Majority vote
• Using out-of-fold predictions as meta features, a.k.a. stacking
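Simple averaging, weighted averaging, and majority voting in plain NumPy (the probabilities and weights are made up for illustration):

```python
import numpy as np

# Predicted probabilities from three hypothetical single models
p1 = np.array([0.9, 0.2, 0.6])
p2 = np.array([0.8, 0.4, 0.4])
p3 = np.array([0.7, 0.1, 0.5])

# Simple average of previous best models
simple_avg = (p1 + p2 + p3) / 3

# Weighted average – trust the historically better model more
weights = np.array([0.5, 0.3, 0.2])
weighted_avg = weights[0] * p1 + weights[1] * p2 + weights[2] * p3

# Majority vote on the hard labels
votes = np.stack([p1, p2, p3]) > 0.5
majority_vote = votes.sum(axis=0) >= 2
```

Averaging works best when the single models are diverse, which is why the process slide emphasizes building diverse single models before ensembling.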
Out-of-fold predictions – a.k.a. meta features

Divide the training data into folds (4 in this example). For each fold, train on the other 3 folds, then predict both the held-out fourth fold and the testing data:
• the held-out predictions (oof 1 … oof 4) together cover the whole training set – these are the out-of-fold predictions
• the 4 per-fold test predictions are averaged into a single set of test predictions
Out Of Fold predictions – a.k.a meta features
fold1
fold2
fold3
fold4
oof 1
oof 2
oof 3
oof 4
Out of fold
predictions
Averaged
test
predictions
Test
predictions
fold1
Test
predictions
fold2
Test
predictions
fold3
Test
predictions
fold4
Divided training data - train on 3 folds
predict the forth fold and the testing data
6/25/2017
Starting Data Science with Kaggle.com
Nathaniel Shimoni
25
Out Of Fold predictions – a.k.a meta features
oof 1
oof 2
oof 3
oof 4
Model 1
e.g. knn
Averaged test
predictions
Out of fold
predictions
oof 1
oof 2
oof 3
oof 4
Model 2
e.g. NN
oof 1
oof 2
oof 3
oof 4
Model 3
e.g. gbm
Train
labels
Model 1
e.g. knn
Model 2
e.g. NN
Model 3
e.g. gbm
After training several models using this method (3 different models in this sample)
We can now train a new model using our newly formed meta features
* Note that we can either train our meta model using only these new features or use
the new features along with our original train data for training
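The stacking scheme above can be sketched end to end with scikit-learn (a minimal sketch: the base models, the fold count, and the reuse of the first 20 training rows as a stand-in test set are all illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_test = X[:20]  # stand-in "testing data" for the sketch

base_models = [KNeighborsClassifier(), RandomForestClassifier(random_state=0)]
kf = KFold(n_splits=4, shuffle=True, random_state=0)

oof = np.zeros((len(X), len(base_models)))       # out-of-fold meta features
test_meta = np.zeros((len(X_test), len(base_models)))

for m, model in enumerate(base_models):
    fold_test_preds = []
    for train_idx, val_idx in kf.split(X):
        model.fit(X[train_idx], y[train_idx])
        # predict the held-out fold ...
        oof[val_idx, m] = model.predict_proba(X[val_idx])[:, 1]
        # ... and the testing data
        fold_test_preds.append(model.predict_proba(X_test)[:, 1])
    # averaged test predictions
    test_meta[:, m] = np.mean(fold_test_preds, axis=0)

# Meta model trained on the out-of-fold predictions
# (optionally concatenate the original features as well)
meta = LogisticRegression().fit(oof, y)
final_prediction = meta.predict(test_meta)
```

Each column of `oof` is one base model's out-of-fold predictions over the whole training set, exactly as in the diagram above.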
Disadvantages of Kaggle

• A large focus on modeling relative to the rest of the steps in the process
• Little weight given to runtime and scalability
• Little reasoning behind the choice of a specific evaluation metric
• Competing for the last few percentage points isn’t always valuable
• The “click and submit” phenomenon
Additional reading resources

• MOOCs:
 Machine Learning – Stanford, Coursera
 Data Science track – Johns Hopkins, Coursera
 Udacity deep learning course
• Documentation:
 scikit-learn documentation
 Keras documentation
 R caret package documentation
Links to sources

This presentation draws heavily from the following sources:
• Mark Peng’s presentation “Tips for participating Kaggle challenges”
• Darius Barušauskas’s presentation “Tips and tricks to win Kaggle data science competitions”
• Kaggle discussion forums and blog
Questions?