Introduction to 
Machine Learning 
September 2014 Meetup 
Rahul Jain 
@rahuldausa 
Join us for Solr, Lucene, Elasticsearch, Machine Learning, IR:
http://www.meetup.com/Hyderabad-Apache-Solr-Lucene-Group/
http://www.meetup.com/DataAnalyticsGroup/
Join us for Hadoop, Spark, Cascading, Scala, NoSQL, Crawlers and all cutting-edge technologies:
http://www.meetup.com/Hyderabad-Programming-Geeks-Group/
Agenda 
• Introduction 
• Basics 
• Classification 
• Clustering 
• Regression 
• Use-Cases 
Quick Questionnaire 
How many people have heard about Machine Learning?
How many people know about Machine Learning?
How many people are using Machine Learning?
About 
• a subfield of Artificial Intelligence (AI)
• the name comes from what it deals with: the
“construction and study of systems that can learn from data”
• can be seen as the building blocks that make computers learn to
behave more intelligently
• It is a broad field rather than a single algorithm: there are various
techniques, each with various implementations.
• http://en.wikipedia.org/wiki/Machine_learning
In other words… 
“A computer program is said to learn from 
experience (E) with some class of tasks (T) and a 
performance measure (P) if its performance at tasks 
in T as measured by P improves with E”
– Tom M. Mitchell, Machine Learning (1997)
Terminology 
• Features 
– The distinct traits that can be used to describe each item in a
quantitative manner.
• Samples 
– A sample is an item to process (e.g. classify). It can be a document, a 
picture, a sound, a video, a row in database or CSV file, or whatever 
you can describe with a fixed set of quantitative traits. 
• Feature vector 
– An n-dimensional vector of numerical features that represents some
object.
• Feature extraction 
– Preparation of the feature vector
– Often transforms the data from a high-dimensional space to a space of
fewer dimensions.
• Training set
– Set of data used to discover potentially predictive relationships (a
separate test/evaluation set is used to check how well they generalize).
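To make the terminology concrete, here is a minimal sketch in plain Python (no ML library assumed) of feature extraction for text: each sample becomes a fixed-length feature vector of word counts, with one feature per distinct word (a bag-of-words representation). The helper names `tokenize`, `build_vocabulary` and `to_feature_vector` and the sample sentences are hypothetical, chosen only for illustration.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(samples):
    """Assign each distinct word across all samples its own feature index."""
    words = sorted({word for sample in samples for word in tokenize(sample)})
    return {word: index for index, word in enumerate(words)}

def to_feature_vector(sample, vocabulary):
    """Map one sample to an n-dimensional vector of word counts (n = vocabulary size)."""
    vector = [0] * len(vocabulary)
    for word in tokenize(sample):
        if word in vocabulary:  # words not seen when building the vocabulary are ignored
            vector[vocabulary[word]] += 1
    return vector

# Hypothetical samples: each sentence is one sample to process.
samples = ["Cricket match today", "Stock market news today", "Cricket news"]
vocabulary = build_vocabulary(samples)
vectors = [to_feature_vector(sample, vocabulary) for sample in samples]
print(vocabulary)  # {'cricket': 0, 'market': 1, 'match': 2, 'news': 3, 'stock': 4, 'today': 5}
print(vectors)     # [[1, 0, 1, 0, 0, 1], [0, 1, 0, 1, 1, 1], [1, 0, 0, 1, 0, 0]]
```

Every sample, whatever its length, maps to a vector of the same size (here 6). Real systems usually go further and reduce this dimensionality, for example by dropping very rare or very common words.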
Let’s dig deep into it… 
What do you mean by
“Apple”?
Learning (Training) 
Features: 
1. Color: Reddish/Red
2. Type: Fruit
3. Shape 
etc… 
Features: 
1. Sky Blue 
2. Logo 
3. Shape 
etc… 
Features: 
1. Yellow 
2. Fruit 
3. Shape 
etc…
Workflow
Categories 
• Supervised Learning 
• Unsupervised Learning 
• Semi-Supervised Learning 
• Reinforcement Learning
Supervised Learning 
• the correct classes of the training data are 
known 
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
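A minimal sketch of supervised learning in plain Python: because the correct class of every training sample is known, even a 1-nearest-neighbour rule can classify a new sample by copying the label of the closest training sample. The two features (loosely based on the “Apple” slide), the data and `predict_1nn` are hypothetical, and 1-nearest-neighbour is just one simple supervised method picked for illustration.

```python
import math

# Hypothetical labelled training data: (feature vector, known class).
# Features loosely follow the "Apple" slide: [redness 0-1, is_fruit 0/1].
training_data = [
    ([0.9, 1.0], "apple (fruit)"),
    ([0.8, 1.0], "apple (fruit)"),
    ([0.1, 0.0], "Apple (company logo)"),
    ([0.2, 0.0], "Apple (company logo)"),
]

def predict_1nn(features, data):
    """Predict the class of the training sample closest in Euclidean distance."""
    _, label = min(data, key=lambda item: math.dist(features, item[0]))
    return label

print(predict_1nn([0.85, 1.0], training_data))  # apple (fruit)
print(predict_1nn([0.15, 0.0], training_data))  # Apple (company logo)
```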
Unsupervised Learning 
• the correct classes of the training data are not 
known 
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
Semi-Supervised Learning 
• A mix of supervised and unsupervised learning: typically a small amount
of labeled training data combined with a larger amount of unlabeled data
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
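One common semi-supervised technique is self-training; the slide does not name a specific method, so this is a swapped-in illustration. The sketch assumes 1-D numeric samples and a nearest-centroid rule: the labeled samples seed one centroid per class, every unlabeled sample receives a pseudo-label from its nearest centroid, and the centroids are then recomputed from labeled and pseudo-labeled samples together. The function names and data are hypothetical.

```python
from statistics import mean

def nearest_label(x, centroids):
    """Return the class whose centroid is closest to the 1-D value x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def self_train(labeled, unlabeled, rounds=5):
    """Self-training sketch (rounds must be at least 1).

    labeled is a list of (value, label) pairs; unlabeled is a list of values.
    """
    pseudo_labeled = []
    for _ in range(rounds):
        # Fit: one centroid per class from labeled plus current pseudo-labeled samples.
        groups = {}
        for x, label in labeled + pseudo_labeled:
            groups.setdefault(label, []).append(x)
        centroids = {label: mean(xs) for label, xs in groups.items()}
        # Predict: give every unlabeled sample the label of its nearest centroid.
        pseudo_labeled = [(x, nearest_label(x, centroids)) for x in unlabeled]
    return pseudo_labeled, centroids

# Hypothetical data: only two samples carry a known class.
labeled = [(1.0, "small"), (9.0, "large")]
unlabeled = [2.0, 3.0, 4.0, 6.0, 7.0, 8.0]
pseudo_labeled, centroids = self_train(labeled, unlabeled)
print(centroids)  # {'small': 2.5, 'large': 7.5}
```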
Reinforcement Learning 
• allows the machine or software agent to learn its 
behavior based on feedback from the environment. 
• This behavior can be learnt once and for all, or it can keep
adapting as time goes by.
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
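A minimal sketch of learning from environment feedback, using an epsilon-greedy multi-armed bandit (a simple reinforcement-learning setting chosen for illustration, not one named on the slide). The agent's decision rule never reads the hidden payout probabilities; it only observes rewards, mostly picks the action with the best estimated reward, and explores a random action with probability `epsilon`. The payout probabilities and parameter values are hypothetical.

```python
import random

def epsilon_greedy_bandit(payout_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays off best purely from reward feedback.

    payout_probs simulates the environment; the agent only sees the 0/1
    reward returned after each action.
    """
    rng = random.Random(seed)
    n_actions = len(payout_probs)
    counts = [0] * n_actions        # how often each action was tried
    estimates = [0.0] * n_actions   # estimated average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                           # explore
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        # Environment feedback: reward 1 with the action's hidden probability, else 0.
        reward = 1.0 if rng.random() < payout_probs[action] else 0.0
        counts[action] += 1
        # Incremental sample-average update of this action's estimate.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical environment with three actions; the third pays off most often.
estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print(counts)  # the third action should end up chosen far more often than the others
```

The `1 / counts[action]` step size averages all past rewards, so learning gradually settles. Using a constant step size (e.g. 0.1) instead weights recent feedback more heavily, which lets the agent keep adapting if the environment changes over time.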
Machine Learning Techniques
Techniques 
• classification: predict class from observations 
• clustering: group observations into 
“meaningful” groups 
• regression (prediction): predict value from 
observations
Classification 
• classify a document into a predefined category
• documents can be text or images
• a popular classifier is the Naive Bayes classifier
• Steps:
– Step 1: Train the program (build a model) using a training set in
which each document is labeled with a category, e.g. sports, cricket, news
– The classifier computes, for each word, the probability of seeing that
word in each of the considered categories; together these determine how
likely a document is to belong to each category
– Step 2: Test the model against a separate test data set
• http://en.wikipedia.org/wiki/Naive_Bayes_classifier
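The two steps above can be sketched as a small multinomial Naive Bayes text classifier in plain Python. The slide does not say which Naive Bayes variant it has in mind; the word-count model, add-one (Laplace) smoothing, the class and method names, and the tiny training set are all assumptions for illustration. Scores are sums of logarithms so that multiplying many small probabilities does not underflow.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesClassifier:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        # Step 1: build the model from documents whose categories are known.
        self.doc_counts = Counter(labels)        # documents per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        for document, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(document))
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(documents)
        return self

    def _log_score(self, words, label):
        """log P(category) + sum of log P(word | category); higher means more likely."""
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        score = math.log(self.doc_counts[label] / self.total_docs)
        for word in words:
            if word in self.vocabulary:  # words never seen in training are skipped
                # Add-one smoothing stops a word unseen in this category from zeroing the score.
                score += math.log((counts[word] + 1) / (total_words + len(self.vocabulary)))
        return score

    def predict(self, document):
        # Step 2: score a new document against every category and return the best one.
        words = tokenize(document)
        return max(self.doc_counts, key=lambda label: self._log_score(words, label))

# Hypothetical training set with two categories.
documents = [
    "India wins the cricket match",
    "cricket team scores a century",
    "election results announced today",
    "government announces new policy",
]
labels = ["sports", "sports", "news", "news"]
classifier = NaiveBayesClassifier().fit(documents, labels)
print(classifier.predict("cricket match tonight"))  # sports
print(classifier.predict("new election policy"))    # news
```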
Clustering 
• clustering is the task of grouping a set of objects in
such a way that objects in the same group (called
a cluster) are more similar to each other than to
objects in other clusters
• the groups are not predefined
• e.g. these keywords
– “men’s shoe”
– “women’s shoe”
– “women’s t-shirt”
– “men’s t-shirt”
– can be clustered into 2 categories, “shoe” and “t-shirt”, or
“men” and “women”
• Popular ones are K-means clustering and Hierarchical
clustering
K-means Clustering 
• partition n observations into k clusters in which each observation belongs 
to the cluster with the nearest mean, serving as a prototype of the cluster. 
• http://en.wikipedia.org/wiki/K-means_clustering
http://pypr.sourceforge.net/kmeans.html
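A minimal sketch of the usual iterative procedure for k-means (Lloyd's algorithm) in plain Python, assuming observations are tuples of numbers and that the starting means are k randomly chosen observations; the initialization, parameter names and toy points are assumptions for illustration. The result can depend on the starting means, so in practice k-means is often run several times with different starts.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assigning points to the nearest mean and recomputing means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k randomly chosen observations
    for _ in range(iterations):
        # Assignment step: each observation joins the cluster with the nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each mean to the average of its cluster (keep it if empty).
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the means (and hence the assignments) no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D observations forming two well-separated groups.
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, clusters = k_means(points, k=2)
print(sorted(centroids))  # [(1.5, 1.5), (8.5, 8.5)]
```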
Hierarchical clustering 
• method of cluster analysis which seeks to build 
a hierarchy of clusters. 
• There can be two strategies 
– Agglomerative: 
• This is a "bottom up" approach: each observation starts in its own 
cluster, and pairs of clusters are merged as one moves up the 
hierarchy. 
• Time complexity of the naive algorithm is O(n^3) for n observations
– Divisive: 
• This is a "top down" approach: all observations start in one cluster, 
and splits are performed recursively as one moves down the 
hierarchy. 
• Time complexity with an exhaustive search is O(2^n)
• http://en.wikipedia.org/wiki/Hierarchical_clustering
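A minimal sketch of the naive agglomerative ("bottom up") strategy in plain Python, assuming observations are tuples of numbers and using single linkage (distance between the closest pair of points) by default; the linkage choice, function name and data are assumptions for illustration. It also shows where the cubic cost of the naive algorithm comes from.

```python
import math

def agglomerative(points, target_clusters=1, linkage=min):
    """Naive bottom-up (agglomerative) hierarchical clustering.

    Each observation starts in its own cluster and the two closest clusters are
    merged repeatedly. linkage=min gives single linkage (closest pair of points);
    linkage=max would give complete linkage. Returns the remaining clusters and
    the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > target_clusters:
        best = None
        # Compare every pair of clusters; repeating this O(n^2) scan for up to
        # n - 1 merges is what makes the naive algorithm O(n^3).
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]  # new list, so the history is untouched
        del clusters[j]                          # j > i, so index i is unaffected
    return clusters, merges

# Hypothetical 2-D observations.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters, merges = agglomerative(points, target_clusters=2)
print(clusters)  # [[(0, 0), (0, 1), (5, 5), (5, 6)], [(20, 20)]]
```

Running it with `target_clusters=1` records the full hierarchy in `merges`. The divisive ("top down") direction is harder: a cluster of n observations can be split into two non-empty parts in 2^(n-1) - 1 ways, which is why an exhaustive divisive search is exponential.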
Regression 
• is a measure of the relation between 
the mean value of one variable (e.g. 
output) and corresponding values of 
other variables (e.g. time and cost). 
• regression analysis is a statistical 
process for estimating the 
relationships among variables. 
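• Regression means predicting a continuous
output value using training data.
• A popular one is linear regression. Logistic
regression (binary regression), despite its name,
predicts the probability of a class and is mainly
used for binary classification.
• http://en.wikipedia.org/wiki/Logistic_regression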
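Since regression predicts a continuous value, the simplest worked example is ordinary least-squares linear regression with one input variable, where the best-fit slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x. The sketch below uses hypothetical house-size and price figures chosen to lie exactly on a line, so the fitted coefficients are easy to check by hand; the function name is also hypothetical.

```python
def fit_linear_regression(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one input variable.

    slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    intercept = mean_y - slope * mean_x
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # cross-deviations
    s_xx = sum((x - mean_x) ** 2 for x in xs)                        # squared deviations
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: house size (square metres) and price (thousands).
sizes = [50, 70, 90, 110]
prices = [150, 190, 230, 270]
slope, intercept = fit_linear_regression(sizes, prices)
print(slope, intercept)         # 2.0 50.0
print(slope * 100 + intercept)  # 250.0 (predicted price for a 100 m² house)
```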
Classification vs Regression 
• Classification means assigning the output to a class.
– e.g. using training data to predict the type of a tumor,
i.e. harmful or not harmful
– if the output is a discrete/categorical variable, it is a
classification problem
• Regression means predicting an output value using training data.
– e.g. using training data to predict a house price
– if the output is a real number (continuous), it is a
regression problem.
Let’s see the usage in real life
Use-Cases 
• Spam Email Detection 
• Machine Translation (Language Translation) 
• Image Search (Similarity) 
• Clustering (KMeans) : Amazon 
Recommendations 
• Classification : Google News 
continued…
Use-Cases (contd.) 
• Text Summarization - Google News 
• Rating a Review/Comment: Yelp 
• Fraud detection : Credit card Providers 
• Decision Making : e.g. Bank/Insurance sector 
• Sentiment Analysis 
• Speech Understanding – iPhone with Siri 
• Face Detection – Facebook’s Photo tagging
Classification in Action 
isn’t it easy?
It’s not (snapshot of a Spam folder, with two messages annotated “Not a Spam”)
NER (Named Entity Recognition) 
http://nlp.stanford.edu:8080/ner/process
Similar/Duplicate Images 
Remember Features?
(Feature Extraction)
Features can be:
• Width
• Height
• Contrast
• Brightness
• Position
• Hue
• Colors
Credit: https://www.google.co.in/
Check this:
LIRE (Lucene Image REtrieval)
library - https://code.google.com/p/lire/
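A minimal sketch of finding the most similar image from extracted features, assuming tiny grayscale images given as lists of pixel rows (values 0 to 255). It computes four of the features listed above (width, height, brightness as mean intensity, contrast as standard deviation of intensity) and returns the candidate with the smallest Euclidean distance. This is an illustrative assumption, not how LIRE works internally; the images and function names are hypothetical.

```python
import math
from statistics import mean, pstdev

def image_features(pixels):
    """Feature vector for a grayscale image given as a list of rows of 0-255 values.

    Features (a subset of the list above): width, height,
    brightness (mean intensity) and contrast (standard deviation of intensity).
    """
    values = [value for row in pixels for value in row]
    return [len(pixels[0]), len(pixels), mean(values), pstdev(values)]

def most_similar(query, candidates):
    """Return the name of the candidate whose feature vector is nearest to the query's."""
    query_features = image_features(query)
    return min(
        candidates,
        key=lambda name: math.dist(query_features, image_features(candidates[name])),
    )

# Hypothetical 2x2 images.
original = [[10, 200], [200, 10]]
candidates = {
    "brighter_copy": [[20, 210], [210, 20]],  # same pattern, slightly brighter
    "flat_grey": [[105, 105], [105, 105]],    # same brightness, no contrast
}
print(most_similar(original, candidates))  # brighter_copy
```

Because the features are on different scales, real systems usually normalize them (or use richer descriptors such as color histograms) before comparing distances.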
Recommendations 
http://www.webdesignerdepot.com/2009/10/an-analysis-of-the-amazon-shopping-experience/
Popular Frameworks/Tools 
• Weka 
• Carrot2 
• GATE
• OpenNLP
• LingPipe
• Stanford NLP
• MALLET – Topic Modelling
• Gensim – Topic Modelling (Python)
• Apache Mahout
• MLlib – Apache Spark
• scikit-learn - Python 
• LIBSVM : Support Vector Machines 
• and many more…
Advanced concepts (related to IR) 
• Topic Modelling 
• Latent Dirichlet allocation (LDA) 
• Latent semantic analysis (LSA/LSI) - Semantic 
Search 
• Singular Value Decomposition (SVD) 
• Summarization (without Training)
Solr/Lucene Meetup 
• Case study of Rujhaan.com
(a social news app)
• Saturday, Sep 27, 2014 10:00 AM
• IIIT Hyderabad
• URL: http://www.meetup.com/Hyderabad-Apache-Solr-Lucene-Group/events/203434032/
OR
• Search on Google …
Topics of Talk
• Crawler (Crawler4j)
• MongoDB
• Solr
• Nginx, Apache Tomcat
• Redis
• Machine Learning
1. Classification: classification of news and tweets (LingPipe)
2. Clustering: similar items (Carrot2; near future: Hadoop
and Apache Spark)
3. Summarization: extracting the main text and automatically
summarizing the article
4. Topic extraction from text
Questions?
Thanks! 
@rahuldausa on Twitter and SlideShare
http://www.linkedin.com/in/rahuldausa
Interested in Search/Information Retrieval?
Join us @ http://www.meetup.com/Hyderabad-Apache-Solr-Lucene-Group/
