MaM Machine Learning
Study group
Speaker Session - 17/04/2019
https://www.meetup.com/MaM-Machine-Learning-Study-Group/
Thanks to our host and sponsor
Imperial College Data Science Institute
Recworks Meet a Mentor community
https://recworks.co.uk/ https://meetamentor.co.uk
Agenda
Introduction to data analysis and data cleaning
By Mark Bell
Study group: the story so far
By Dave Snowdon and Jeremie Charlet
Study group
The story so far
Jeremie Charlet & Dave Snowdon
Agenda
How we work
What we’ve learnt
House prices project
IMDB Reviews project
Next steps
How we work
The power of the mob
Slack for group coordination
During a study session
● One person “drives”
● Everyone else contributes suggestions
● New “driver” each session
● Everyone literally on the same page
● Allows people to get up to speed without being put on the spot
Kaggle House prices
Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,HouseStyle,OverallQual,OverallCond,YearBuilt,YearRemodAdd,RoofStyle,RoofMatl,Exterior1st,Exterior2nd,MasVnrType,MasVnrArea,ExterQual,ExterCond,Foundation,BsmtQual,BsmtCond,BsmtExposure,BsmtFinType1,BsmtFinSF1,BsmtFinType2,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,Heating,HeatingQC,CentralAir,Electrical,1stFlrSF,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,KitchenQual,TotRmsAbvGrd,Functional,Fireplaces,FireplaceQu,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond,PavedDrive,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
1,60,RL,65,8450,Pave,NA,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,2Story,7,5,2003,2003,Gable,CompShg,VinylSd,VinylSd,BrkFace,196,Gd,TA,PConc,Gd,TA,No,GLQ,706,Unf,0,150,856,GasA,Ex,Y,SBrkr,856,854,0,1710,1,0,2,1,3,1,Gd,8,Typ,0,NA,Attchd,2003,RFn,2,548,TA,TA,Y,0,61,0,0,0,0,NA,NA,NA,0,2,2008,WD,Normal,208500
2,20,RL,80,9600,Pave,NA,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,1Story,6,8,1976,1976,Gable,CompShg,MetalSd,MetalSd,None,0,TA,TA,CBlock,Gd,TA,Gd,ALQ,978,Unf,0,284,1262,GasA,Ex,Y,SBrkr,1262,0,0,1262,0,1,2,0,3,1,TA,6,Typ,1,TA,Attchd,1976,RFn,2,460,TA,TA,Y,298,0,0,0,0,0,NA,NA,NA,0,5,2007,WD,Normal,181500
Kaggle house prices
● Given a data set with information about a house and its area, predict the house price
● Lots of fumbling around in the dark
● Gradually got to grips with scikit-learn and pandas
● Learnt to follow a methodology from machinelearningmastery.com
● Tried with and without outliers
● Tried various models: linear regression, random forest, simple NN
● Experimented with mapping categorical values
● Tried using recursive feature elimination (RFE) and random forest to find the most important fields
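One way to map categorical values to numbers is an ordinal lookup for the quality columns. The sketch below is an illustration only, not the group's actual code: the helper `encode_quality` is hypothetical, and the Ex/Gd/TA/Fa/Po ordering (with NA meaning "feature absent") is an assumption taken from the Kaggle data description.

```python
# Assumed ordinal scale for quality columns such as ExterQual and KitchenQual
# (Ex = excellent ... Po = poor); "NA" is treated as "feature absent".
QUALITY_SCALE = {"NA": 0, "Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}


def encode_quality(row, columns):
    """Return a copy of row with the given quality columns mapped to integers."""
    encoded = dict(row)
    for column in columns:
        encoded[column] = QUALITY_SCALE[row[column]]
    return encoded


# A few fields from the first sample row of the training data.
house = {
    "Id": "1",
    "ExterQual": "Gd",
    "KitchenQual": "Gd",
    "HeatingQC": "Ex",
    "FireplaceQu": "NA",
}
quality_columns = ["ExterQual", "KitchenQual", "HeatingQC", "FireplaceQu"]
print(encode_quality(house, quality_columns))
```

An ordinal mapping keeps the "better than" relationship between values, which a plain one-hot encoding would discard; whether that helps depends on the model.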
Methodology
5-Step Systematic Process
1. Define the Problem
2. Prepare Data
3. Spot Check Algorithms
4. Improve Results
5. Present Results
https://machinelearningmastery.com/process-for-working-through-machine-learning-problems/
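Step 3, spot-checking algorithms, amounts to scoring several candidate models with the same cross-validation harness. The sketch below uses only the standard library and is a stand-in, not the group's code: the data is synthetic, and the two "models" (a mean baseline and a one-feature least-squares line) are placeholders for the linear regression, random forest and neural network mentioned above. All function names are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and deal them into k roughly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Baseline: always predict the mean training target."""
    average = mean(ys)
    return lambda x: average


def fit_line(xs, ys):
    """Least-squares line through a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / variance
    return lambda x: y_bar + slope * (x - x_bar)


def cross_val_mae(fit, xs, ys, k=5):
    """Mean absolute error averaged over k held-out folds."""
    errors = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mean(abs(model(xs[i]) - ys[i]) for i in fold))
    return mean(errors)


# Synthetic data (not the Kaggle set): price grows with living area, plus noise.
rng = random.Random(42)
areas = [rng.uniform(800, 2500) for _ in range(100)]
prices = [100 * area + rng.gauss(0, 10000) for area in areas]

candidates = [("mean baseline", fit_mean), ("linear", fit_line)]
scores = {name: cross_val_mae(fit, areas, prices) for name, fit in candidates}
for name, score in scores.items():
    print(f"{name}: MAE = {score:,.0f}")
```

Every candidate sees the same folds, so the scores are comparable; a trivial baseline shows whether a real model is learning anything at all.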
99% accuracy!
Pride comes before a fall
It’s harder than it looks!
● Keep your data & labels separate!
○ “How to Prevent Catastrophic Failure in Production ML Systems”, Martin Goodson, QCon London 2019
● Need a methodology - or can get overwhelmed by all the possibilities
● If it looks too good to be true...
callingbullshit.org
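One way to keep data and labels separate is to split the label off before any modelling and then check that no remaining column simply repeats it. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the `PriceCopy` column is invented here to show what a leak looks like.

```python
def split_features_label(rows, label):
    """Split dict rows into feature dicts and a label list."""
    features = [{k: v for k, v in row.items() if k != label} for row in rows]
    labels = [row[label] for row in rows]
    return features, labels


def leaking_columns(features, labels):
    """Return feature columns whose values equal the label on every row."""
    return [
        column
        for column in features[0]
        if all(row[column] == y for row, y in zip(features, labels))
    ]


rows = [
    {"LotArea": 8450, "SalePrice": 208500, "PriceCopy": 208500},
    {"LotArea": 9600, "SalePrice": 181500, "PriceCopy": 181500},
]
X, y = split_features_label(rows, "SalePrice")
print(leaking_columns(X, y))  # ['PriceCopy']
```

A check like this only catches exact copies of the label; a suspiciously high score is still worth investigating by hand.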
IMDB Movie Review
sentiment analysis
Learnings - IMDB Reviews project
Problem: Given (long) movie review as plain text
Decide: is review positive or negative?
'This is one of the silliest movies I have ever had the misfortune to watch! I should have expected it, after seeing the first two, but I keep
getting suckered into these types of movies with the idea of "Maybe they did it right this time". Nope - not even close. Where do I
begin? How about with the special effects... To give you an idea of what passes for SFX in this movie, at one point a soldier is shooting
at a "Raptor" as it runs down a hallway. Even with less than a second of screen time, the viewer can easily see that it is just a man with
a tail apparently taped to him running around. Bad bad bad bad. How about the acting? If that's what you can call it. There is one
character who, I suppose, is supposed to be from the south. However, after living in the south for six years now, I have never heard this
way of talking. Perhaps he has some sort of weird disability - the inability to talk normally. I find it fascinating that the character does
nothing that requires him to have that accent - therefore there was no reason for the actor to try to do one. How about the plot? It’s
pretty basic - Raptors escape, people with guns must hunt them down. I’m starting to wonder why the dinosaurs in these movies
always seem to run into the nearest system of tunnels... wouldn’t they stay outside to hunt prey? ...
Learnings - IMDB Reviews project
processing text data
vectorization techniques
The dog is on the table
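Processing text data usually starts with cleaning; the speaker notes mention dealing with stop words and punctuation. The sketch below applies that to the example sentence. The stop-word list is a tiny illustrative assumption and `clean_tokens` is a hypothetical helper; libraries ship much longer lists.

```python
import string

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "is", "on", "a", "an", "of", "to", "i", "it", "this"}


def clean_tokens(text):
    """Lowercase the text, strip punctuation, and drop stop words."""
    no_punctuation = str.maketrans("", "", string.punctuation)
    stripped = text.lower().translate(no_punctuation)
    return [word for word in stripped.split() if word not in STOP_WORDS]


print(clean_tokens("The dog is on the table!"))  # ['dog', 'table']
```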
[ 0, 1, 1, 0, 0, 0, 2, … ]
[Cat dog table monkey movie man the … ]
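The count vector above can be reproduced with a few lines of Python. This is a minimal sketch of a bag-of-words count, using only the vocabulary words visible on the slide (the real vocabulary is longer); `bag_of_words` is a hypothetical helper, not a library function.

```python
from collections import Counter

# Vocabulary as shown on the slide, lowercased; the "…" part is omitted.
VOCABULARY = ["cat", "dog", "table", "monkey", "movie", "man", "the"]


def bag_of_words(sentence, vocabulary):
    """Count how often each vocabulary word occurs in the sentence."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]


print(bag_of_words("The dog is on the table", VOCABULARY))  # [0, 1, 1, 0, 0, 0, 2]
```

Words outside the vocabulary ("is", "on") are dropped, and word order is lost: only the counts survive.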
[12, 908, 35, 45, 12, 13]
12 The
...
908 Dog
Word embedding vectors
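The index sequence and the embedding lookup can be sketched as follows. Only the entries 12 ("the") and 908 ("dog") come from the slide; the other index entries are inferred from the position of each number in the example sequence, and the 3-dimensional embedding values are made up for illustration (real embeddings are learned and much larger).

```python
# 12 and 908 are from the slide; 35, 45 and 13 are inferred assumptions.
WORD_INDEX = {"the": 12, "dog": 908, "is": 35, "on": 45, "table": 13}

# Toy embedding table with made-up values, one dense vector per word index.
EMBEDDINGS = {
    12: [0.1, 0.0, 0.2],
    908: [0.7, 0.3, 0.1],
    35: [0.0, 0.1, 0.0],
    45: [0.2, 0.0, 0.1],
    13: [0.5, 0.4, 0.3],
}


def to_sequence(sentence, word_index):
    """Replace each word with its integer index, keeping word order."""
    return [word_index[word] for word in sentence.lower().split()]


def embed(sequence, embeddings):
    """Look up the dense vector for each index in the sequence."""
    return [embeddings[index] for index in sequence]


sequence = to_sequence("The dog is on the table", WORD_INDEX)
print(sequence)  # [12, 908, 35, 45, 12, 13]
vectors = embed(sequence, EMBEDDINGS)
print(len(vectors), "vectors of length", len(vectors[0]))
```

Unlike a count vector, the sequence keeps word order, which is what architectures such as CNNs and LSTMs rely on.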
Learnings - IMDB Reviews project
Read research papers to create models
Learnt and practised with both scikit-learn for machine learning (ML) and Keras for deep learning (DL)
Discovered different neural network architectures: CNN, LSTM
Learnings - IMDB Reviews project
Compared multiple ML and DL architectures
Next steps
Start new project
Mob v2
● Pomodoros
● Mob + pair work
● With homework (reading articles or research papers)
Follow our methodology from start to finish
Join us: https://www.meetup.com/MaM-Machine-Learning-Study-Group/
Thank you! Any questions?

Editor's Notes

  • #18, #19, #21: Dealing with stop words and punctuation, then applying a vectorization technique to transform a sentence into a numeric (vectorized) representation.
  • #22: We started giving ourselves homework: read a research paper before coming, then spend a session reading through the researcher’s code and rewriting it. We chose Keras, as recommended by our data scientist mentors. If needed, read a few articles on CNNs to understand how they work.
  • #25: Talk about the reasons for having mob v2