Machine Learning: Tom Maiaroto, @shift8creative
What is Machine Learning?
Algorithms & Approaches: decision trees, random forests, artificial neural networks, k-NN (nearest neighbour), naive Bayesian classifier
So could machines one day rule the earth? Maybe. (OK, probably not.)
What can Machine Learning do for Apps? Spam filtering, auto-tagging, all sorts of categorization, sentiment analysis.
Languages Commonly Used: Java (Java-ML, WEKA, Apache Mahout, many more...), Python (NLTK, scikit-learn, PyML, a good deal more...), C++ (libDAI, Armadillo, Orange, tons more...), and then some others...
Languages Commonly Used: http://www.mloss.org
MongoDB Too! Map/Reduce, stored JavaScript, geo-spatial indexing, replication.
Geo-spatial Indexing: Did someone say nearest neighbour? Design geeks, imagine the visualizations...
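MongoDB's geospatial queries (for example `$near` against a `2dsphere` index) return documents ordered from nearest to farthest, which is exactly the ranking a k-NN classifier needs. As a minimal in-memory sketch of that idea, assuming labeled (latitude, longitude) points and great-circle (haversine) distance, the snippet below labels a query point by majority vote of its k nearest neighbours. The helper names, sample cities and labels are hypothetical illustrations, not from the talk.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def knn_classify(query, labeled_points, k=3):
    """Label `query` by majority vote among its k nearest (point, label) pairs."""
    nearest = sorted(labeled_points, key=lambda item: haversine_km(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training points: a few cities labeled by coast.
POINTS = [
    ((40.71, -74.01), "east"),   # New York
    ((42.36, -71.06), "east"),   # Boston
    ((34.05, -118.24), "west"),  # Los Angeles
    ((37.77, -122.42), "west"),  # San Francisco
]
```

In a real deployment the distance ordering would come from the database query rather than from `sorted()`; `k` is a tuning choice, and an odd value avoids some ties in two-class problems.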
Replication: Store massive amounts of data, distributed performance benefits, dedicated databases for calculations. All the obvious benefits.
Map/Reduce: It's the brain. It's not just for aggregation. It's faster than you might think. It runs in the database.
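MongoDB map/reduce runs a user-supplied `map` function that calls `emit(key, value)`, a `reduce(key, values)` that collapses all values emitted for one key, and an optional `finalize(key, reducedValue)`. The Python sketch below only imitates that data flow in memory so it can run without a server; `map_reduce` is a hypothetical helper (with `emit` passed in explicitly rather than being a global), not a driver API. One rule it encodes matters in the real database too: `reduce` must return the same shape it receives, because MongoDB may call it again on its own partial output, and MongoDB does not call `reduce` at all for a key with a single emitted value.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn, finalize_fn=None):
    """Hypothetical in-memory imitation of MongoDB-style map/reduce."""
    groups = defaultdict(list)

    def emit(key, value):
        groups[key].append(value)

    for doc in docs:
        map_fn(doc, emit)  # map may emit zero or more (key, value) pairs

    results = {}
    for key, values in groups.items():
        # Like MongoDB, skip reduce when a key has only one emitted value.
        reduced = values[0] if len(values) == 1 else reduce_fn(key, values)
        results[key] = finalize_fn(key, reduced) if finalize_fn else reduced
    return results


def map_words(doc, emit):
    # Emit (word, 1) for every whitespace-separated word.
    for word in doc["text"].lower().split():
        emit(word, 1)


def reduce_sum(key, values):
    # Returns the same type it receives (an int), so re-reducing is safe.
    return sum(values)


docs = [{"text": "the cat sat"}, {"text": "The dog sat"}]
counts = map_reduce(docs, map_words, reduce_sum)
```

Plain aggregation like this word count is the familiar use; the classifier walk-through applies the same map, reduce and finalize steps to scoring.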
Map/Reduce: In the computer...
Example Time! It's simple... Just take this...
Example Time! Just kidding... Let's break down a naive Bayes classifier.
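For reference, the standard multinomial naive Bayes decision rule picks the category $c$ that maximizes the log prior plus the summed log likelihoods of the words $w_1, \dots, w_n$ in the text being classified. The add-one (Laplace) smoothing in the likelihood is an assumption on our part; the slides do not say how unseen words were handled. Here $N_c$ is the number of training texts in $c$, $N$ the total number of training texts, and $|V|$ the vocabulary size.

$$
\begin{aligned}
\hat{c} &= \arg\max_{c} \Big[ \log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c) \Big] \\
P(c) &= \frac{N_c}{N}, \qquad
P(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{\sum_{w'} \mathrm{count}(w', c) + |V|}
\end{aligned}
$$

Working in logs turns the product of many small probabilities into a sum, which avoids floating-point underflow on longer texts.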
Classification / Naive Bayes: Training the system. Simple... $inc
Classification / Naive Bayes: Just keep count of words per category.
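Training can be as simple as one `$inc` update per labeled example. The sketch below assumes a hypothetical schema with one document per category, holding a running `total` and a nested `words` map of counts, and builds the update document for one training text. The regex tokenizer is also an assumption; restricting tokens to `[a-z0-9']` means no word can contain `.` or `$`, which would otherwise clash with MongoDB's dotted field paths.

```python
import re


def tokenize(text):
    """Hypothetical tokenizer: lower-case runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def training_update(text):
    """Build an $inc update that adds one training text to a category document.

    Assumed (hypothetical) schema: {"_id": <category>, "total": int,
    "words": {word: int}}. "words.<word>" is a dotted path into the nested map.
    """
    inc = {"total": 0}
    for word in tokenize(text):
        path = "words." + word
        inc[path] = inc.get(path, 0) + 1
        inc["total"] += 1
    return {"$inc": inc}
```

With PyMongo this would be applied as `collection.update_one({"_id": "cs"}, training_update(text), upsert=True)`, where `"cs"` is the category label; `upsert=True` creates the category document on first use, and `$inc` creates any counter field that does not exist yet.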
Classification / Naive Bayes: Reduce
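The reduce code on these slides did not survive extraction, so this is only a sketch of what a re-reducible reduce for word counts could look like. It assumes each emitted value has the hypothetical shape `{"total": int, "words": {word: count}}` (for example, if counts for one category were split across several documents), and it is written in Python rather than the JavaScript MongoDB would run. It merges the partial counts and returns the same shape, so applying it again to its own output gives the same answer.

```python
def reduce_counts(category, values):
    """Merge partial {"total": int, "words": {word: int}} records for one category.

    Builds a fresh result (inputs are not mutated) with the same shape as each
    input, so it is safe to re-reduce, as MongoDB may do with partial results.
    """
    merged = {"total": 0, "words": {}}
    for value in values:
        merged["total"] += value["total"]
        for word, count in value["words"].items():
            merged["words"][word] = merged["words"].get(word, 0) + count
    return merged
```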
Classification / Naive Bayes: Finalize
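A finalize step could turn a category's merged counts into a naive Bayes score for the text being classified. This Python sketch assumes add-one (Laplace) smoothing, a known vocabulary size and a class prior; `finalize_score` and its extra parameters are hypothetical. MongoDB's `finalize` only receives `(key, reducedValue)`, so values such as the words to classify would have to arrive through the command's `scope` option rather than as arguments.

```python
import math


def finalize_score(category, reduced, query_words, vocab_size, prior):
    """Attach a naive Bayes log-score for the query to a category's counts.

    Assumes add-one (Laplace) smoothing:
        log(prior) + sum over query words of log((count(w) + 1) / (total + vocab_size))
    `reduced` has the hypothetical shape {"total": int, "words": {word: int}}.
    """
    denominator = reduced["total"] + vocab_size
    score = math.log(prior)
    for word in query_words:
        score += math.log((reduced["words"].get(word, 0) + 1) / denominator)
    # Return a new record so the input counts are left untouched.
    return {**reduced, "score": score}
```

Smoothing keeps a single unseen word from sending a category's score to negative infinity, which matters most when, as on the slides, the system has had little training.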
Classification / Naive Bayes: Call the command
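The command itself is a single document. This sketch assembles one in Python: the field names (`mapReduce`, `map`, `reduce`, `finalize`, `scope`, `out`) are those of MongoDB's `mapReduce` command, while the helper, the collection name and the JavaScript bodies are hypothetical illustrations (the finalize body assumes a uniform prior, so it drops the prior term). `out: {"inline": 1}` returns results in the reply instead of writing them to a collection. Map/reduce has been deprecated since MongoDB 5.0 in favor of the aggregation pipeline.

```python
def build_map_reduce_command(collection, map_js, reduce_js, finalize_js, scope=None):
    """Assemble a MongoDB mapReduce command document (hypothetical helper).

    The command name must be the first key; dicts keep insertion order
    on Python 3.7+.
    """
    command = {
        "mapReduce": collection,  # source collection name
        "map": map_js,            # JavaScript source strings
        "reduce": reduce_js,
        "finalize": finalize_js,
        "out": {"inline": 1},     # return results in the reply
    }
    if scope:
        command["scope"] = scope  # globals visible inside map/reduce/finalize
    return command


# Hypothetical JavaScript bodies, for illustration only.
MAP_JS = "function () { emit(this._id, {total: this.total, words: this.words}); }"
REDUCE_JS = (
    "function (key, values) { var out = {total: 0, words: {}};"
    " values.forEach(function (v) { out.total += v.total;"
    " for (var w in v.words) { out.words[w] = (out.words[w] || 0) + v.words[w]; } });"
    " return out; }"
)
FINALIZE_JS = (
    "function (key, value) { var score = 0;"
    " queryWords.forEach(function (w) {"
    " score += Math.log(((value.words[w] || 0) + 1) / (value.total + vocabSize)); });"
    " value.score = score; return value; }"
)

cmd = build_map_reduce_command(
    "categories", MAP_JS, REDUCE_JS, FINALIZE_JS,
    scope={"queryWords": ["atom", "energy"], "vocabSize": 4},
)
```

With PyMongo, `db.command(cmd)` would send this document to the server.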
Classification / Naive Bayes: Results. You can see the total words, and also the word counts per category.
Classification / Naive Bayes: Results. ...and of course the scores per category (cae = arts and entertainment, cs = science, ...).
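Summed log-probabilities get very negative for longer texts, so raw scores are hard to read. One common way to present them, sketched here as an assumption rather than what the slides showed, is to subtract the maximum score before exponentiating (the log-sum-exp trick) and normalize to probabilities; the winning category is simply the highest score. The `cae`/`cs` labels in the tests match the slides; the scores themselves are illustrative inputs.

```python
import math


def normalize_scores(log_scores):
    """Turn per-category log scores into probabilities that sum to 1.

    Subtracting the maximum first keeps math.exp from underflowing to 0
    when every score is very negative.
    """
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def best_category(log_scores):
    """Return the category with the highest log score."""
    return max(log_scores, key=log_scores.get)
```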
Classification / Naive Bayes: Accurate even with little training.
MongoDB on a small VM: took 1.7 seconds.
Compared to, say, PHP: 33 seconds, and it timed out.
The more training data, the wider the gap over PHP.
Classification / Naive Bayes: This wasn't even a full map/reduce.

MongoDB & Machine Learning