DATA
DRIVEN
INNOVATION
Rome 2017 | Open Summit
MACHINE LEARNING
REAL LIFE APPLICATIONS
BY EXAMPLES
SPEAKER
MARIO CARTIA
MARIO@BIG-DATA.NINJA
DDI
ROME | 2017
MARIO CARTIA
Can machines think?
Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59, 433-460.
1968
2001: A Space Odyssey
“I'm sorry, Dave. I'm afraid I can't do that.”
1982
Supercar (Knight Rider)
1983
WarGames
1996
Kasparov vs. Deep Blue
Does Deep Blue use artificial intelligence?
The short answer is "no." Earlier computer designs that tried
to mimic human thinking weren't very good at it. No formula
exists for intuition. So Deep Blue's designers have gone
"back to the future." Deep Blue relies more on computational
power and a simpler search and evaluation function.
The long answer is "no." "Artificial Intelligence" is more
successful in science fiction than it is here on earth, and you
don't have to be Isaac Asimov to know why it's hard to
design a machine to mimic a process we don't understand
very well to begin with.
Source: https://www.research.ibm.com/deepblue/meet/html/d.3.3a.shtml
Decision Tree (IF... THEN)
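To make the IF... THEN idea concrete, here is a hand-written decision tree in Python. The features (`sender_known`, `contains_link`, `exclamation_marks`), the threshold, and the labels are hypothetical, chosen only for illustration. Every branch is a rule a programmer wrote explicitly; nothing is learned from data.

```python
def classify_email(contains_link: bool, sender_known: bool, exclamation_marks: int) -> str:
    """A hand-coded decision tree: each branch is an explicit IF ... THEN rule."""
    # IF the sender is known THEN trust the message.
    if sender_known:
        return "legitimate"
    # Unknown sender: inspect the content.
    if contains_link:
        # IF there is a link AND many exclamation marks THEN flag it as spam.
        if exclamation_marks > 2:
            return "spam"
        return "suspicious"
    return "legitimate"


print(classify_email(contains_link=True, sender_known=False, exclamation_marks=5))  # spam
```

Changing the behaviour of this tree means editing the code by hand, which is the limitation the machine learning approach addresses.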
“Machine learning is the
subfield of computer science
that gives computers the
ability to learn without being
explicitly programmed.”
Arthur Samuel, 1959
Spam Email Filtering
Email Category Tabs
“If you can't explain it simply, you don't understand it well enough.”
SUPERVISED LEARNING
Supervised learning is where you
have input variables (x) and an
output variable (y) and you use an
algorithm to learn the mapping
function from the input to the
output
y=f(x)
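A minimal sketch of y = f(x) in Python, assuming the simplest possible model family, a straight line f(x) = a*x + b fitted by ordinary least squares, and a tiny hypothetical training set. The function is never told the rule behind the data; it recovers it from the labelled (x, y) pairs.

```python
def fit_line(xs, ys):
    """Learn f(x) = a*x + b from labelled (x, y) pairs by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept passes through the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return lambda x: a * x + b


# Hypothetical training data generated by y = 2x + 1 (unknown to the learner).
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
f = fit_line(xs, ys)
print(f(10))  # 21.0
```

The learned `f` can then predict outputs for inputs it never saw during training, such as x = 10.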
SUPERVISED LEARNING
Classification is a general process
related to categorization, the process
in which ideas and objects are
recognized, differentiated, and
understood
A classification system is an approach
to accomplishing classification
CLASSIFICATION
In Machine Learning, Naive
Bayes Classifiers are a family of
simple probabilistic classifiers
based on applying Bayes'
theorem with strong (naive)
independence assumptions
between the features
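Written out for a class c and features x_1, ..., x_n (notation chosen here for illustration), the classifier applies Bayes' theorem, then the naive assumption that the features are conditionally independent given the class, and finally picks the class with the largest posterior. The denominator P(x_1, ..., x_n) is the same for every class, so it can be dropped when comparing classes.

```latex
P(c \mid x_1, \dots, x_n) = \frac{P(c)\, P(x_1, \dots, x_n \mid c)}{P(x_1, \dots, x_n)}
\qquad \text{(Bayes' theorem)}

P(x_1, \dots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c)
\qquad \text{(naive independence assumption)}

\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```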
NAIVE BAYES CLASSIFIERS
Naive Bayes has been studied
extensively since the 1950s and
remains a popular (baseline) method for
text categorization, the problem of
judging documents as belonging to one
category or the other (such as spam or
legitimate, sports or politics, etc.) with
word frequencies as the features
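A minimal multinomial Naive Bayes sketch for spam filtering in plain Python, using word counts as the features. The four training messages are hypothetical toy data, tokenization is a naive lowercase split, and add-one (Laplace) smoothing is an assumed but common choice; a real filter would need far more data and better preprocessing.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (text, label). Returns per-class document counts,
    per-class word counts and the vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        class_docs[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log P(c) + sum of log P(word | c)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # Work in log space to avoid underflow when multiplying many probabilities.
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero out the product.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(training)
print(predict(model, "win cheap money"))  # spam
print(predict(model, "monday meeting"))   # ham
```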
TEXT CATEGORIZATION
SUPERVISED LEARNING
CLASSIFICATION
NAIVE BAYES CLASSIFIER
?
Recommendation system
UNSUPERVISED LEARNING
Unsupervised learning algorithms
are machine learning algorithms
that work without a desired output
label
Essentially, the algorithm attempts
to estimate the underlying structure
of the population of input data
UNSUPERVISED LEARNING
Collaborative Filtering is a method of making
automatic predictions (filtering) about the
interests of a user by collecting preferences or
taste information from many users
(collaborating)
In the more general sense, Collaborative
Filtering is the process of filtering for
information or patterns using techniques
involving collaboration among multiple agents,
viewpoints, data sources, etc.
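A minimal user-based collaborative filtering sketch in Python. The ratings and the user and item names are hypothetical, and the choices of cosine similarity (treating unrated items as 0) and a similarity-weighted average are common textbook assumptions, not the only option. An item-based variant would compare item columns instead of user rows.

```python
import math

# Hypothetical user -> {item: rating} data (1-5 stars).
ratings = {
    "alice": {"matrix": 5, "alien": 4, "terminator": 5, "titanic": 1},
    "bob":   {"matrix": 4, "alien": 5, "terminator": 4, "notebook": 1},
    "carol": {"matrix": 1, "titanic": 5, "notebook": 4, "terminator": 1},
    "dave":  {"matrix": 5, "alien": 4},
}


def cosine(u, v):
    """Cosine similarity between two users, treating unrated items as 0."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of the ratings other users gave to item.
    All ratings are positive, so similarities are non-negative here."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += sim
    return num / den if den else None


def recommend(user):
    """Rank the items the user has not rated yet by predicted rating."""
    seen = set(ratings[user])
    candidates = {item for r in ratings.values() for item in r} - seen
    scored = [(item, predict(user, item)) for item in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(recommend("dave"))  # 'terminator' is ranked first
```

Dave's tastes match Alice's and Bob's far more than Carol's, so their high ratings for "terminator" dominate the prediction.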
COLLABORATIVE FILTERING
Applications of Collaborative Filtering typically
involve very large data sets
As the numbers of users and items grow,
traditional CF algorithms will suffer serious
scalability problems
Large web companies use clusters of
machines to scale recommendations for their
millions of users
RECOMMENDATION SYSTEM
UNSUPERVISED LEARNING
COLLABORATIVE FILTERING
USER-BASED / ITEM-BASED / OTHER
?
Targeted Advertising
UNSUPERVISED LEARNING
Cluster analysis or Clustering is the
task of grouping a set of objects in
such a way that objects in the same
group (called a cluster) are more
similar (in some sense or another)
to each other than to those in other
groups (clusters)
CLUSTERING
K-means clustering is a method of vector
quantization, originally from signal
processing, that is popular for cluster
analysis in data mining
K-means clustering aims to partition n
observations into k clusters in which each
observation belongs to the cluster with
the nearest mean
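A minimal K-means sketch (Lloyd's algorithm) in plain Python. The six 2-D points are hypothetical stand-ins for customer feature vectors in a targeted advertising scenario, and seeding the centroids with the first k points is a simplification; practical implementations usually use random restarts or k-means++ initialization.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    # Simplified deterministic initialization: the first k points become centroids.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: the means no longer move
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers described by two numeric features.
customers = [(1, 1), (10, 10), (1.5, 2), (11, 9), (0.5, 1.5), (9, 11)]
centroids, clusters = kmeans(customers, k=2)
print(centroids)  # [[1.0, 1.5], [10.0, 10.0]]
```

Each resulting cluster could then receive its own advertising campaign.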
TARGETED ADVERTISING
UNSUPERVISED LEARNING
CLUSTERING
K-MEANS CLUSTERING
?
TYPICAL ML WORKFLOW
✓ Data and problem definition
✓ Data collection
✓ Data preprocessing
✓ Data analysis and modeling with unsupervised and supervised learning
✓ Process evaluation
EVALUATION METRICS
The root-mean-square deviation
(RMSD) or root-mean-square error
(RMSE) is a frequently used
measure of the differences between
values (sample and population
values) predicted by a model or an
estimator and the values actually
observed
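For n predictions p_i and observed values o_i, RMSE = sqrt((1/n) * sum of (p_i - o_i)^2): errors are squared (so large misses weigh more), averaged, and the square root brings the result back to the units of the data. A minimal Python sketch, for example to score predicted ratings against the ratings users actually gave (the sample values are hypothetical):

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between predictions and observed values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


print(rmse([3.0, 4.0, 5.0], [2.0, 4.0, 7.0]))  # sqrt(5/3), about 1.29
```

Lower is better; an RMSE of 0 means every prediction matched the observation exactly.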
BEYOND ML
Deep learning is a branch of
machine learning based on a set of
algorithms that attempt to model
high-level abstractions in data
Deep learning is part of a broader
family of machine learning methods
based on learning representations
BEYOND ML
One of the promises of Deep Learning
is replacing handcrafted features with
efficient algorithms for unsupervised
or semi-supervised feature learning
and hierarchical feature extraction
Some of the representations are
inspired by advances in neuroscience
BEYOND ML
Various Deep Learning architectures
such as deep neural networks have
been applied to fields like computer
vision, automatic speech recognition,
natural language processing, audio
recognition and bioinformatics where
they have been shown to produce
state-of-the-art results on various tasks
ML & BIG DATA
“We don’t have better algorithms.
We just have more data.”
Peter Norvig
Google’s Research Director
ML & BIG DATA
Apache Hadoop is an open-source
software framework used for distributed
storage and processing of big data sets
using clusters built from commodity
hardware
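Hadoop's original processing model is MapReduce: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase combines each group. The sketch below simulates those three phases in a single Python process to illustrate the programming model only; it does not use Hadoop's own (Java) API or Hadoop Streaming, the function names are hypothetical, and on a real cluster the input splits, mappers, and reducers would run on different machines.

```python
from collections import defaultdict


def map_phase(split):
    """Mapper: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(mapped_pairs):
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all the values for one key."""
    return key, sum(values)


# Two hypothetical input splits (on HDFS these would be blocks of a large file).
splits = ["big data big clusters", "data on commodity hardware"]
mapped = (pair for split in splits for pair in map_phase(split))
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["big"], counts["data"])  # 2 2
```

Because each mapper sees only its own split and each reducer only its own keys, both phases can be spread across many commodity machines.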
ML & BIG DATA
Apache Spark is a fast and general-purpose
cluster computing system
It provides high-level APIs in Scala, Java, Python
and R, and an optimized engine that supports
general execution graphs
It also supports a rich set of higher-level tools
including Spark SQL for SQL and structured data
processing, MLlib for machine learning, GraphX
for graph processing, and Spark Streaming
WHY USE SCALA?
Spark Survey 2016
WHY USE SCALA?
Scala is one of the most exciting languages to be created in the 21st
century. It is a multi-paradigm language that fully supports functional,
object-oriented, imperative and concurrent programming. It also has a
strong type system, and from our point of view, strong type is a
convenient form of self-documenting code.
Scala works on the JVM and has access to the riches of the Java
ecosystem, but it is less verbose than Java. As we employ it for ND4J,
its syntax is strikingly similar to Python, a language that many data
scientists are comfortable with. Like Python, Scala makes
programmers happy, but like Java, it is quite fast.
Finally, Apache Spark is written in Scala, and any library that purports
to work on distributed run times should at the very least be able to
interface with Spark
Source: https://deeplearning4j.org/scala
THANK YOU!
MARIO@BIG-DATA.NINJA
