Machine Learning & Support Vector Machines
                 Lecture 9
             Sean A. Golliher
Let a, b be two events.

  p(a | b) p(b) = p(a ∩ b) = p(b | a) p(a)

  p(a | b) = p(b | a) p(a) / p(b)

  p(a | b) p(b) = p(b | a) p(a)
Let D be a document in the collection.
Let R represent relevance of a document w.r.t. given (fixed)
query and let NR represent non-relevance.

Need to find p(R|D) - probability that a retrieved document D
is relevant.
  p(R | D) = p(D | R) p(R) / p(D)
  p(NR | D) = p(D | NR) p(NR) / p(D)

  p(R), p(NR) - prior probability of retrieving a (non-)relevant document
  p(D | R), p(D | NR) - probability that if a relevant (non-relevant) document is
  retrieved, it is D.
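A tiny numeric sketch of how Bayes' rule turns these quantities into p(R|D). The priors and likelihoods below are made-up illustration values, not from the lecture:

# Hypothetical sketch: Bayes' rule applied to document relevance.
# All numbers are illustrative assumptions.
p_R, p_NR = 0.01, 0.99            # prior probability of (non-)relevance
p_D_given_R = 0.002               # probability of drawing document D from the relevant set
p_D_given_NR = 0.00001            # probability of drawing D from the non-relevant set

p_D = p_D_given_R * p_R + p_D_given_NR * p_NR   # total probability of D
p_R_given_D = p_D_given_R * p_R / p_D           # Bayes' rule
print(round(p_R_given_D, 2))                    # ~0.67: D is more likely relevant than not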
   Suppose we have a vector representing the presence and
    absence of terms (1,0,0,1,1). Terms 1, 4, & 5 are present.
   What is the probability of this document occurring in the
    relevant set?
   pi is the probability that term i occurs in the relevant set.
     (1 - pi) would be the probability that the term is not included
     in the relevant set.
   This gives us: p1 x (1-p2) x (1-p3) x p4 x p5
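A minimal Python sketch of this product, assuming hypothetical values for the term probabilities pi (the lecture gives no numbers):

# Binary independence model: probability of the presence/absence vector
# (1,0,0,1,1) given the relevant set. The p values are illustrative only.
p = [0.8, 0.3, 0.2, 0.6, 0.5]     # p_i = P(term i present | relevant)
doc = [1, 0, 0, 1, 1]             # terms 1, 4 and 5 are present

prob = 1.0
for p_i, present in zip(p, doc):
    prob *= p_i if present else (1.0 - p_i)   # p_i for present terms, (1 - p_i) for absent ones
print(prob)   # 0.8 * 0.7 * 0.8 * 0.6 * 0.5 = 0.1344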
   BM25: popular and effective ranking algorithm
    based on the binary independence model
      adds document and query term weights



     k1, k2 and K are parameters whose values are set
        empirically
                                   dl is doc length
     Typical TREC value for k1 is 1.2, k2 varies from 0
      to 1000, b = 0.75
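The formula these parameters belong to is not reproduced in this text (it was an image on the slide); for reference, the standard BM25 ranking function that uses exactly these symbols is

\[
\mathrm{BM25}(Q,D) = \sum_{i \in Q}
\log\!\frac{(r_i + 0.5)/(R - r_i + 0.5)}{(n_i - r_i + 0.5)/(N - n_i - R + r_i + 0.5)}
\cdot \frac{(k_1 + 1)\,f_i}{K + f_i}
\cdot \frac{(k_2 + 1)\,qf_i}{k_2 + qf_i},
\qquad
K = k_1\bigl((1 - b) + b \cdot dl/avdl\bigr)
\]

where fi is the frequency of term i in the document, qfi its frequency in the query, ni the number of documents containing term i, N the collection size, and ri, R the relevance counts (zero when no relevance information is available).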
   Query with two terms, “president lincoln” (qfi = 1, where qfi is
    the frequency of term i in the query)
   No relevance information (r and R are zero)
   N = 500,000 documents
   “president” occurs in 40,000 documents (n1 = 40, 000)
   “lincoln” occurs in 300 documents (n2 = 300)
   “president” occurs 15 times in doc (f1 = 15)
   “lincoln” occurs 25 times (f2 = 25)
   document length is 90% of the average length (dl/avdl
    = .9)
   k1 = 1.2, b = 0.75, and k2 = 100
   K = 1.2 · (0.25 + 0.75 · 0.9) = 1.11
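A Python sketch of this worked example under the standard BM25 form given earlier (a reconstruction for illustration; the slide's own calculation is an image):

# BM25 score for the two-term query "president lincoln" with the slide's
# numbers. No relevance information, so r_i = R = 0.
import math

N = 500_000                               # documents in the collection
k1, b, k2 = 1.2, 0.75, 100
K = k1 * ((1 - b) + b * 0.9)              # 1.2 * (0.25 + 0.75 * 0.9) = 1.11

def bm25_term(n_i, f_i, qf_i=1):
    idf = math.log((N - n_i + 0.5) / (n_i + 0.5))    # r_i = R = 0
    tf_doc = (k1 + 1) * f_i / (K + f_i)
    tf_query = (k2 + 1) * qf_i / (k2 + qf_i)
    return idf * tf_doc * tf_query

score = bm25_term(40_000, 15) + bm25_term(300, 25)
print(round(score, 2))   # roughly 5.0 ("president") + 15.6 ("lincoln") ≈ 20.6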
   Unigram language model (simplest form)
     probability distribution over the words in a
      language
     generation of text consists of pulling words out of
      a “bucket” according to the probability distribution
      and replacing them
   N-gram language model
     some applications use bigram and trigram
      language models where probabilities depend on
      previous words
     Based on previous n-1 words
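A small sketch of the “bucket” picture: estimate a unigram distribution from a toy text and sample words from it with replacement (illustrative only):

import collections
import random

text = "to be or not to be that is the question".split()

# Unigram language model: P(w) = count(w) / total number of words
counts = collections.Counter(text)
total = sum(counts.values())
unigram = {w: c / total for w, c in counts.items()}

# "Pulling words out of a bucket" according to P(w), with replacement
words, probs = zip(*unigram.items())
print(random.choices(words, weights=probs, k=5))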
   A topic in a document or query can be
    represented as a language model
     i.e., words that tend to occur often when
     discussing a topic will have high probabilities in
     the corresponding language model
   Rank documents by the probability that the query
    could be generated by the document language
    model (i.e. same topic) P(Q|D)
   Assuming uniform, unigram model
   Obvious estimate for unigram probabilities is
      P(qi | D) = fqi,D / |D|
      fqi,D is the number of times word qi occurs in the document;
       |D| is the number of words in the document
     If query words are missing from document, score
      will be zero
     Missing 1 out of 4 query words same as missing 3
      out of 4. Not good for long queries!
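A sketch of query-likelihood scoring with these maximum-likelihood estimates, showing how a single missing query word zeroes the score (the document and queries are made up):

# Query likelihood with unsmoothed maximum-likelihood estimates:
# P(Q|D) = product over query words of f_{qi,D} / |D|
import collections

def query_likelihood(query, doc_words):
    counts = collections.Counter(doc_words)
    score = 1.0
    for q in query:
        score *= counts[q] / len(doc_words)   # zero if q never occurs in the document
    return score

doc = "president lincoln spoke to congress about the war".split()
print(query_likelihood(["president", "lincoln"], doc))                 # > 0
print(query_likelihood(["president", "lincoln", "gettysburg"], doc))   # 0.0: one missing word kills the score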
   Document texts are a sample from the
    language model
     Missing words should not have zero probability of
     occurring (calculating probability query could be
     generated from document)
   Smoothing is a technique for estimating
    probabilities for missing (or unseen) words
     lower (or discount) the probability estimates for
      words that are seen in the document text
     assign that “left-over” probability to the estimates
      for the words that are not seen in the text
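One standard way to do this is linear interpolation with a collection (background) language model, i.e. Jelinek-Mercer smoothing; the slides do not name the method, so this is only an illustrative sketch:

# Jelinek-Mercer smoothing: mix the document model with a background
# collection model so unseen words keep a non-zero probability.
#   P(q|D) = (1 - lam) * f_{q,D} / |D|  +  lam * c_q / |C|
import collections

def smoothed_prob(word, doc_words, collection_words, lam=0.5):
    p_doc = collections.Counter(doc_words)[word] / len(doc_words)
    p_coll = collections.Counter(collection_words)[word] / len(collection_words)
    return (1 - lam) * p_doc + lam * p_coll

doc = "president lincoln spoke to congress".split()
collection = "the president addressed congress the union army fought at gettysburg".split()
print(smoothed_prob("gettysburg", doc, collection))   # non-zero even though the word is absent from doc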
   Informational
     Finding information about some topic which may be on one or
       more web pages
     Topical search
   Navigational
     finding a particular web page that the user has either seen before
       or is assumed to exist
   Transactional
     finding a site where a task such as shopping or downloading
       music can be performed

    Broder (2002) http://www.sigir.org/forum/F2002/broder.pdf
 For effective navigational and transactional
  search, need to combine features that reflect
  user relevance
 Commercial web search engines combine
  evidence from hundreds of features to
  generate a ranking score for a web page
     page content, page metadata, anchor text, links
      (e.g., PageRank), and user behavior (click logs)
     page metadata – e.g., “age”, how often it is
      updated, the URL of the page, the domain name
      of its site, and the amount of text content
   SEO: understanding the relative importance
    of features used in search and how they can
    be optimized to obtain better search rankings
    for a web page
     e.g., improve the text used in the title tag, improve
      the text in heading tags, make sure that the
      domain name and URL contain important
      keywords, and try to improve the anchor text and
      link structure
     Some of these techniques are regarded as not
      appropriate by search engine companies
   Galago: a toolkit, written in Java, for experimenting with text.

   http://www.galagosearch.org/quick-start.html
   Considerable interaction between these
    fields
      Arthur Samuel, 1959 – checkers game: the world’s
       first self-learning program, run on an IBM 701.
   Web query logs have generated new wave of
    research
     e.g., “Learning to Rank”
   Supervised Learning
     Regression analysis
   Classification Problems
     Support Vector Machines (SVM)
   Unsupervised Learning
     http://www.youtube.com/watch?v=GWWIn29ZV4Q
 Reinforcement Learning
 Learning Theory
     How much training data do we need?
      How accurately can we predict an event (e.g., with 99%
       accuracy)?
 Papers: Boser et al., 1992
 Standard SVM [Cortes and Vapnik, 1995]
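The SVM slides themselves are images, but the speaker notes below describe finding a separating hyperplane w^T x + b = 0 and classifying by which side a point falls on. A minimal sketch of that decision rule using scikit-learn's linear SVC on toy data (illustrative, not the lecture's example):

# Linear SVM: learn a separating hyperplane w^T x + b = 0 and classify
# new points by the sign of w^T x + b. The data below is made up.
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [1.5, 3.0], [3.0, 2.5],          # one class ("relevant")
              [-1.0, -1.5], [-2.0, -0.5], [-1.5, -2.0]])   # other class ("non-relevant")
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1.0)   # C penalizes training errors (slack), as in note #27
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
x_new = np.array([1.0, 1.0])
print(int(np.sign(w @ x_new + b)))  # +1: predicted on the "relevant" side

Maximizing the margin between the two classes is what the fitted SVC optimizes; C trades margin width against training errors (Cortes and Vapnik, 1995).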

Editor's Notes

  • #6: di = 1: the product is over the terms that have value 1. For example, in the index, if the phrase appeared in the document it would have a one. si = the denominator, P(D|NR).
  • #7: http://www.miislita.com/information-retrieval-tutorial/okapi-bm25-tutorial.pdf … BM25 stands for Best Match; developed in the 1980s. K normalizes by document length. b regulates the impact of the length normalization; b = 0.75 was found to be effective.
  • #8: Summation over all terms in the query. Scoring a single document in the collection to see how it matches a query.
  • #10: Language models are used in speech recognition, machine learning, etc.
  • #12: di = 1: the product is over the terms that have value 1. For example, in the index, if the phrase appeared in the document it would have a one. qi is a query word and there are n words in the query.
  • #13: For example, if we have a language model representing a document about computer games, the document should have a non-zero probability for the word RPG (role-playing game) even if the word does not appear in the document. The question is how much weight to give a document if it has ALL the query words. Is it really MORE relevant just because every word appeared in the document?
  • #15: Taxonomy – Identifying and classifying things into groups or classes.
  • #18: di = 1: the product is over the terms that have value 1. For example, in the index, if the phrase appeared in the document it would have a one. qi is a query word and there are n words in the query.
  • #23: In this case we can use density and frequency…
  • #24: Trying to maximize the width of the tube (the margin). If a point is on the right it is relevant; if it is on the left it is not. Then we define a decision function. How do we find the optimum? If we use the dotted line as our model we just check whether data is on the right or left hand side. Find a separating hyperplane. We are going to train this function until we get a good predictive model. Finding the general hyperplane w^T x + b = 0. Once we find w and b we can make predictions: a sample xi should be classified as relevant if w^T xi + b > 0. Will combine the two inequalities next.
  • #25: Distance between two parallel lines is given by.
  • #27: The subtraction of epsilon guarantees a separation in the data. C is a term for training errors.