Machine learning by example
Michał Matłoka @mmatloka
“Can machines think?”
- Alan Turing (1950)
Turing test
http://www.smbc-comics.com/?id=2999
What is ML?
• Make machines “think” like humans
• Learn from data and make predictions
Learning types
• Supervised learning
◦ Classification
◦ Regression
• Unsupervised learning
◦ Clustering
◦ Dimensionality reduction
• Semi-supervised learning
• Reinforcement learning
◦ E.g. AlphaGo
SUPERVISED - CLASSIFICATION
SUPERVISED - REGRESSION
UNSUPERVISED - CLUSTERING
UNSUPERVISED - DIMENSIONALITY REDUCTION
SEMI-SUPERVISED
REINFORCEMENT LEARNING
ALPHAGO
https://cdn.arstechnica.net/wp-content/uploads/sites/3/2017/05/GettyImages-688097364-800x549.jpg
Use cases
• Voice recognition
• Fraud analysis
• Face detection
• Ads click-through rate prediction
• Spam detection
• Shop recommendations
• Photos description
• Self-driving cars
• Healthcare
• ...
Learning process
1. Data gathering
2. Data cleaning & feature engineering
3. Dataset -> training & test set
4. Learning -> Model
5. Evaluation -> Accuracy
6. New observation -> Prediction
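The six steps above can be sketched end to end in plain Scala. This is only an illustration under stated assumptions: the six-item dataset, the track names and the word-vote "model" are all made up here and are not the data or the algorithm used in the talk.

```scala
// Step 1: data gathering. Hypothetical toy data, not the talk's dataset.
final case class Example(text: String, label: String)

val dataset: Vector[Example] = Vector(
  Example("spark streaming big data pipeline", "bigdata"),
  Example("kafka cluster for big data", "bigdata"),
  Example("hadoop data lake at scale", "bigdata"),
  Example("react components and css layout", "frontend"),
  Example("css animations in the browser", "frontend"),
  Example("browser rendering with react", "frontend")
)

// Step 2: cleaning and feature engineering (lower-cased words)
def features(text: String): Seq[String] =
  text.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq

// Step 3: split into training and test sets
// (every testEvery-th item goes to the test set; shuffle randomly in practice)
def split(data: Vector[Example], testEvery: Int): (Vector[Example], Vector[Example]) = {
  val (test, train) = data.zipWithIndex.partition { case (_, i) => i % testEvery == testEvery - 1 }
  (train.map(_._1), test.map(_._1))
}

// Step 4: "learning" here just counts how often each word appears with each label
type Model = Map[String, Map[String, Int]]

def train(data: Seq[Example]): Model =
  data.flatMap(e => features(e.text).map(w => (w, e.label)))
    .groupBy(_._1)
    .map { case (w, pairs) => w -> pairs.groupBy(_._2).map { case (l, ps) => l -> ps.size } }

// Step 6: predict the label that collects the most word votes
def predict(model: Model, text: String, default: String): String = {
  val votes = features(text)
    .flatMap(w => model.getOrElse(w, Map.empty[String, Int]).toSeq)
    .groupBy(_._1)
    .map { case (l, vs) => l -> vs.map(_._2).sum }
  if (votes.isEmpty) default else votes.maxBy(_._2)._1
}

// Step 5: evaluation as the fraction of correctly classified test examples
def accuracy(model: Model, test: Seq[Example], default: String): Double =
  test.count(e => predict(model, e.text, default) == e.label).toDouble / test.size

val (trainSet, testSet) = split(dataset, testEvery = 3)
val model = train(trainSet)
val acc = accuracy(model, testSet, default = "bigdata")
```

With so little data the accuracy figure says nothing about real performance; the point is only the order of the steps and the separation of training and test data.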
EXAMPLE
Classify conference talk abstracts into tracks
• RDD (Resilient Distributed Dataset): map, filter, count, etc.; DataFrame
• Spark SQL
• MLlib
• GraphX
• Spark Streaming
• API: Scala, Java, Python, R*
Code.
https://github.com/mmatloka/machine-learning-by-example
What can be improved?
• Bigger data set
• Smarter tokenizer
• Stemming & lemmatization
• IDF - Inverse Document Frequency
• K-fold Cross-Validation
• Parameter tuning (grid search)
What can be improved? (Tokenizer)
“I will present Big Data topics”
-> “I”, “will”, “present”, “Big”, “Data”, “topics”
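A minimal sketch of the tokenization step in plain Scala. The first function reproduces the split shown on the slide; the second is only one guess at what a "smarter" tokenizer could do (lower-casing, splitting on punctuation, dropping stop words). The function names and the stop-word list are hypothetical.

```scala
// Naive tokenizer: split on whitespace only
def simpleTokenize(text: String): Seq[String] =
  text.split("\\s+").filter(_.nonEmpty).toSeq

// Hypothetical, deliberately tiny stop-word list
val stopWords: Set[String] = Set("i", "will", "the", "a", "an", "of")

// "Smarter" variant (an assumption): lower-case, split on anything that is
// not a letter or digit, and drop stop words
def smarterTokenize(text: String): Seq[String] =
  text.toLowerCase
    .split("[^\\p{L}\\p{N}]+")
    .filter(t => t.nonEmpty && !stopWords(t))
    .toSeq
```

For the slide's sentence, `simpleTokenize` keeps all six words, while `smarterTokenize` keeps only "present", "big", "data" and "topics".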
What can be improved? (n-grams)
“I will present Big Data topics”
-> “I will”, “will present”, “present Big”, “Big Data”, “Data topics”
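The bigrams on this slide can be produced with a sliding window over the tokens. A small sketch in plain Scala; the function name `ngrams` is chosen here for illustration.

```scala
// Build word n-grams by sliding a window of size n over the token sequence.
// Returns an empty result when there are fewer than n tokens.
def ngrams(tokens: Seq[String], n: Int): Seq[String] =
  if (n <= 0 || tokens.size < n) Vector.empty
  else tokens.sliding(n).map(_.mkString(" ")).toVector

val bigrams = ngrams(Seq("I", "will", "present", "Big", "Data", "topics"), 2)
```

Bigrams keep pairs such as "Big Data" together, which single-word tokens would treat as two unrelated features.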
STEMMING & LEMMATIZATION
What can be improved? (Porter Stemmer)
“communities”, “community”
-> “commun”
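To show the idea of stemming without reproducing the full Porter algorithm, here is a toy suffix stripper. It is explicitly not the Porter stemmer: it applies only a few hand-picked rules, chosen so that the two words from the slide end up with the same stem.

```scala
// Toy suffix stripper. NOT the Porter algorithm: just a few hand-picked rules
// that are enough to map the slide's words onto a shared stem.
def toyStem(word: String): String = {
  val w = word.toLowerCase
  if (w.endsWith("ities") && w.length > 7) w.dropRight(5)     // communities -> commun
  else if (w.endsWith("ity") && w.length > 5) w.dropRight(3)  // community -> commun
  else if (w.endsWith("s") && !w.endsWith("ss") && w.length > 3) w.dropRight(1) // topics -> topic
  else w
}
```

As on the slide, the resulting stem does not have to be a real word; it only has to be the same for related forms.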
What can be improved? (Lemmatization)
“communities”, “community”
-> “community”
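Lemmatization maps each word to its dictionary form. A minimal sketch using a lookup table; the three-entry dictionary is hypothetical, and a real lemmatizer would rely on a full vocabulary and part-of-speech information.

```scala
// Hypothetical mini-dictionary of inflected form -> lemma
val lemmaDictionary: Map[String, String] = Map(
  "communities" -> "community",
  "topics" -> "topic",
  "presented" -> "present"
)

// Look the word up; unknown words are returned unchanged (lower-cased)
def lemmatize(word: String): String = {
  val w = word.toLowerCase
  lemmaDictionary.getOrElse(w, w)
}
```

Unlike a stem, the result is always a real word, at the cost of needing a dictionary.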
What can be improved? (TF & IDF)
TF - Term Frequency (HashingTF!) - the number of times a term occurs in a given document
IDF - Inverse Document Frequency - down-weights terms that occur in many documents of the document set
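A small sketch of both quantities in plain Scala. It uses the smoothed formula log((N + 1) / (df + 1)), which to my understanding matches the one documented for Spark MLlib's IDF; other definitions exist. The three-document corpus is made up, and the hashing step of HashingTF is left out.

```scala
// Term frequency: raw count of each term in one document
def termFrequency(doc: Seq[String]): Map[String, Int] =
  doc.groupBy(identity).map { case (term, occurrences) => term -> occurrences.size }

// Inverse document frequency, smoothed: log((N + 1) / (df + 1)), where N is the
// number of documents and df the number of documents containing the term
def inverseDocumentFrequency(docs: Seq[Seq[String]]): Map[String, Double] = {
  val n = docs.size
  docs.flatMap(_.distinct)
    .groupBy(identity)
    .map { case (term, hits) => term -> math.log((n + 1.0) / (hits.size + 1.0)) }
}

// TF-IDF weight of every term in one document
def tfIdf(doc: Seq[String], idf: Map[String, Double]): Map[String, Double] =
  termFrequency(doc).map { case (term, tf) => term -> tf * idf.getOrElse(term, 0.0) }

// Hypothetical corpus: "data" occurs in every document, "big" in only one
val corpus: Seq[Seq[String]] = Seq(
  Seq("big", "data", "big"),
  Seq("data", "streams"),
  Seq("data", "science")
)
val idf = inverseDocumentFrequency(corpus)
val weights = tfIdf(corpus.head, idf)
```

Here "data" appears in every document, so its IDF is zero and it contributes nothing, while "big" is specific to the first document and gets a positive weight.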
What can be improved? (K-fold Cross-Validation)
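The idea behind k-fold cross-validation can be sketched without Spark: split the data into k folds, use each fold once for validation and the rest for training, then average the scores. The helper names below are hypothetical, and `score` stands in for "train on the first argument, evaluate on the second".

```scala
// Split `data` into k folds; each fold is the validation set once,
// and the remaining k - 1 folds form the training set.
def kFold[A](data: Vector[A], k: Int): Vector[(Vector[A], Vector[A])] = {
  require(k >= 2 && k <= data.size, "k must be between 2 and the number of items")
  val indexed = data.zipWithIndex
  (0 until k).toVector.map { fold =>
    val (validation, training) = indexed.partition { case (_, i) => i % k == fold }
    (training.map(_._1), validation.map(_._1))
  }
}

// Average a score over all folds
def crossValidate[A](data: Vector[A], k: Int)(score: (Vector[A], Vector[A]) => Double): Double = {
  val scores = kFold(data, k).map { case (training, validation) => score(training, validation) }
  scores.sum / scores.size
}

val folds = kFold((1 to 10).toVector, 5)
```

Every item is used for validation exactly once, so the averaged score depends less on one lucky or unlucky split than a single train/test split does.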
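What can be improved? (Grid search)
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// hashingTF, pipeline, evaluator and trainData are assumed to be defined elsewhere
// Candidate feature-vector sizes for HashingTF
val paramGrid = new ParamGridBuilder()
  .addGrid(hashingTF.numFeatures, Array(256, 512, 1024, 2048, 4096))
  .build()

// 5-fold cross-validation over every parameter combination in the grid
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(5)

// Evaluate each combination on each fold and keep the best model
val cvModel = cv.fit(trainData)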
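Stripped of the Spark API, grid search is an exhaustive loop: score every candidate value and keep the best one. A minimal sketch in plain Scala; `gridSearch` and `fakeScore` are hypothetical names, and the score function is invented so that the example is deterministic.

```scala
// Try every candidate value and keep the one with the highest score.
// `evaluate` is a placeholder for "train with this value and return a validation score".
def gridSearch[P](candidates: Seq[P])(evaluate: P => Double): (P, Double) = {
  require(candidates.nonEmpty, "need at least one candidate")
  candidates.map(p => (p, evaluate(p))).maxBy(_._2)
}

// Hypothetical score function, made up for illustration: it peaks at 1024 features
def fakeScore(numFeatures: Int): Double =
  1.0 - math.abs(numFeatures - 1024) / 4096.0

val (bestNumFeatures, bestScore) = gridSearch(Seq(256, 512, 1024, 2048, 4096))(fakeScore)
```

In a real setup the score would come from cross-validation, so the cost grows with the number of candidates multiplied by the number of folds.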
Articles & references
• https://www.csee.umbc.edu/courses/471/papers/turing.pdf
• http://spark.apache.org/
• https://databricks.com/try-databricks
• https://research.googleblog.com/2016/09/introducing-open-images-dataset.html
• https://arstechnica.co.uk/information-technology/2017/05/deepmind-alphago-go-ke-jie-china/
• https://www.kaggle.com
• https://github.com/dylanmei/docker-zeppelin
• https://github.com/databricks/spark-corenlp
• https://www.coursera.org/learn/scala-spark-big-data
Thank you, Q&A?
@mmatloka
mmatloka
softwaremill.com/blog
