INFORMATION RETRIEVAL
TECHNIQUES
BY
DR. ADNAN ABID
Lecture # 31
Dimensionality reduction
ACKNOWLEDGEMENTS
The presentation of this lecture has been taken from the following
sources
1. “Introduction to information retrieval” by Prabhakar Raghavan,
Christopher D. Manning, and Hinrich Schütze
2. “Managing gigabytes” by Ian H. Witten, Alistair Moffat, Timothy C.
Bell
3. “Modern information retrieval” by Ricardo Baeza-Yates
4. “Web Information Retrieval” by Stefano Ceri, Alessandro
Bozzon, Marco Brambilla
Outline
• Dimensionality reduction
• Random projection onto k<<m axes
• Computing the random projection
• Latent semantic indexing (LSI)
• The matrix
• Dimension reduction
Dimensionality reduction
• What if we could take our vectors and “pack” them into fewer
dimensions (say 10,000 → 100) while preserving distances?
• (Well, almost.)
• Speeds up cosine computations.
• Two methods:
• Random projection.
• “Latent semantic indexing”.
Random projection onto k<<m axes.
• Choose a random direction x1 in the vector space.
• For i = 2 to k,
• Choose a random direction xi that is orthogonal to x1, x2, …, xi−1.
• Project each doc vector into the subspace x1, x2, … xk.
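A minimal sketch of this construction, assuming NumPy; the function name, seed, and sizes are illustrative choices, and the QR factorization plays the role of the Gram–Schmidt loop described above.

```python
import numpy as np

def random_orthogonal_directions(m, k, seed=0):
    """Return a k x m matrix whose rows are random, mutually orthogonal
    unit vectors x1, ..., xk in the m-dimensional term space."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((m, k))
    # QR factorization orthonormalizes the k random directions; this is
    # numerically equivalent to the Gram-Schmidt loop in the slide.
    Q, _ = np.linalg.qr(G)      # Q has orthonormal columns (m x k)
    return Q.T                  # rows are the directions x1 ... xk

R = random_orthogonal_directions(m=1000, k=20)
print(R.shape)                           # (20, 1000)
print(np.allclose(R @ R.T, np.eye(20)))  # True: mutually orthogonal unit vectors
```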
E.g., from 3 to 2 dimensions
[Figure: documents D1 and D2 shown in the 3-dimensional term space (t1, t2, t3) and again after projection onto the 2-dimensional subspace spanned by x1 and x2.]
x1 is a random direction in (t1, t2, t3) space.
x2 is chosen randomly but orthogonal to x1.
Guarantee
• With high probability, relative distances are (approximately)
preserved by projection.
• Pointer to precise theorem in Resources.
Computing the random projection
• Projecting n vectors from m dimensions down to k
dimensions:
• Start with the m × n matrix of terms × docs, A.
• Find a random k × m orthogonal projection matrix R.
• Compute the matrix product W = R × A.
• jth column of W is the vector corresponding to doc j, but
now in k << m dimensions.
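A hedged sketch of the whole projection with NumPy, using a dense toy term-document matrix (real term-document matrices are sparse); the sizes m, n, k are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 10_000, 200, 100               # terms, docs, target dimensions (toy sizes)

A = rng.random((m, n))                   # m x n term-document matrix (dense toy data)
R = np.linalg.qr(rng.standard_normal((m, k)))[0].T   # k x m orthogonal projection

W = R @ A                                # k x n; column j is doc j in k dimensions
print(W.shape)                           # (100, 200)

# Cosines are roughly preserved, which is what matters for ranking:
def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(round(cosine(A[:, 0], A[:, 1]), 3))   # cosine in the original space
print(round(cosine(W[:, 0], W[:, 1]), 3))   # roughly the same value in k dimensions
```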
Cost of computation
• This takes a total of kmn multiplications.
• Expensive; see Resources for ways to do essentially the same thing more quickly.
• Exercise: by projecting from 10000 dimensions down to 100, are we
really going to make each cosine computation faster?
• The size of the vectors would decrease.
• This will result in smaller postings.
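A quick back-of-the-envelope check of these costs, using the slide's m = 10,000 and k = 100 and an assumed collection of n = 1,000,000 documents; the per-cosine counts treat the vectors as dense.

```python
m, k, n = 10_000, 100, 1_000_000    # terms, target dims, docs (n is an assumed value)

projection_cost = k * m * n          # one-off cost of computing W = R x A
per_cosine_before = m                # multiplications per cosine on dense m-vectors
per_cosine_after = k                 # multiplications per cosine on k-vectors

print(f"projection: {projection_cost:.1e} multiplications")              # 1.0e+12
print(f"per-cosine speed-up: {per_cosine_before // per_cosine_after}x")  # 100x
```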
Latent semantic indexing (LSI)
• Another technique for dimension reduction
• Random projection was data-independent
• LSI on the other hand is data-dependent
• Eliminate redundant axes
• Pull together “related” axes
• car and automobile
Notions from linear algebra
• Matrix, vector
• Matrix transpose and product
• Rank
• Eigenvalues and eigenvectors.
The matrix
• The first matrix shown on the slide has rank 2: the first two rows are linearly independent, so the rank is at least 2, but all three rows are linearly dependent (the first is equal to the sum of the second and third), so the rank must be less than 3.
• The second matrix shown on the slide, call it A, has rank 1: it has nonzero columns, so the rank is positive, but any pair of columns is linearly dependent.
The matrix
Similarly, the transpose of A has rank 1. Indeed, since the column vectors of A are the row vectors of the transpose of A, the statement that the column rank of a matrix equals its row rank is equivalent to the statement that the rank of a matrix is equal to the rank of its transpose, i.e., rk(A) = rk(Aᵗ).
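The example matrices themselves were lost when the slides were exported; the wording matches the standard textbook illustration, so the sketch below assumes those matrices and checks the stated ranks with NumPy.

```python
import numpy as np

# Assumed matrices matching the description above (the originals were images
# that did not survive export): row 1 of M1 equals row 2 + row 3, and every
# column of M2 (the slide's A) is a multiple of its first column.
M1 = np.array([[ 1,  0,  1],
               [-2, -3,  1],
               [ 3,  3,  0]])
M2 = np.array([[ 1,  1,  0,  2],
               [-1, -1,  0, -2]])

print(np.linalg.matrix_rank(M1))     # 2
print(np.linalg.matrix_rank(M2))     # 1
print(np.linalg.matrix_rank(M2.T))   # 1, i.e. rk(A) = rk(A^t)
```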
Singular-Value Decomposition
• Recall the m × n matrix of terms × docs, A.
• A has rank r ≤ min(m, n).
• Define the term-term correlation matrix T = AAᵗ.
• Aᵗ denotes the matrix transpose of A.
• T is a square, symmetric m × m matrix.
• Doc-doc correlation matrix D = AᵗA.
• D is a square, symmetric n × n matrix.
Why?
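A small numerical check of these definitions with NumPy on a toy matrix; it also answers the “Why?” prompt: T and D are symmetric because (AAᵗ)ᵗ = AAᵗ and (AᵗA)ᵗ = AᵗA.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4                          # toy numbers of terms and docs
A = rng.random((m, n))               # term-document matrix

T = A @ A.T                          # term-term correlation matrix, m x m
D = A.T @ A                          # doc-doc correlation matrix, n x n

print(T.shape, D.shape)                            # (6, 6) (4, 4)
print(np.allclose(T, T.T), np.allclose(D, D.T))    # True True: both symmetric
```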
Eigenvectors
• Denote by P the m × r matrix of eigenvectors of T.
• Denote by R the n × r matrix of eigenvectors of D.
• It turns out A can be expressed (decomposed) as A = P Q Rᵗ.
• Q is a diagonal matrix whose entries are the singular values of A (the square roots of the eigenvalues of AAᵗ), in sorted order.
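In practice P, Q, and R are obtained from a single SVD routine rather than two separate eigendecompositions; a minimal sketch with NumPy on a toy matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))                      # m x n term-document matrix (toy)

P, q, Rt = np.linalg.svd(A, full_matrices=False)
Q = np.diag(q)                              # singular values, sorted descending

# q**2 are the eigenvalues of A A^t (and of A^t A); the columns of P are
# eigenvectors of A A^t, and the rows of Rt are eigenvectors of A^t A.
print(np.allclose(A, P @ Q @ Rt))           # True: A = P Q R^t
```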
Eigen Vectors
• [Figure: the transformation matrix preserves the direction of vectors parallel to [1, 1] (in blue) and [−1, 1] (in violet). Points that lie on a line through the origin parallel to an eigenvector stay on that line after the transformation. The vectors in red are not eigenvectors, so their direction is altered by the transformation.]
Eigen Vectors and Eigen Values
• Av = λv
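A minimal check of this equation with NumPy. The 2×2 matrix below appears to be the one behind the figure on the previous slide (its eigenvectors lie along [1, 1] and [−1, 1]), but that is an assumption.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                    # assumed example matrix

vals, vecs = np.linalg.eig(A)                 # eigenvalues 3 and 1
for lam, v in zip(vals, vecs.T):              # eigenvectors are the columns of vecs
    print(lam, np.allclose(A @ v, lam * v))   # A v = lambda v holds for each pair
```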
Visualization
[Figure: A = P Q Rᵗ, where A is m × n, P is m × r, Q is r × r, and Rᵗ is r × n.]
Dimension reduction
• For some s << r, zero out all but the s biggest eigenvalues in Q.
• Denote by Qs this new version of Q.
• Typically s in the hundreds while r could be in the (tens of) thousands.
• Let As = P Qs Rᵗ
• Turns out As is a pretty good approximation to A.
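A sketch of this truncation with NumPy, using a dense toy matrix and an illustrative s:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((500, 200))                 # m x n term-document matrix (toy)
s = 50                                     # keep only the s largest singular values

P, q, Rt = np.linalg.svd(A, full_matrices=False)
q_s = q.copy()
q_s[s:] = 0.0                              # zero out all but the s biggest entries of Q
A_s = P @ np.diag(q_s) @ Rt                # = P Qs R^t, a rank-s approximation of A

print(np.linalg.matrix_rank(A_s))          # 50
print(round(np.linalg.norm(A - A_s) / np.linalg.norm(A), 3))  # relative error of As
```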
Visualization
[Figure: As = P Qs Rᵗ, where Qs is Q with all but its s largest diagonal entries set to 0.]
The columns of As represent the docs, but in s << m dimensions.
Guarantee
• Relative distances are (approximately) preserved by projection:
• Of all m × n rank-s matrices, As is the best approximation to A.
• Pointer to precise theorem in Resources.
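A small illustration of this optimality claim (the Eckart–Young theorem), assuming NumPy; the competitor B below is just one arbitrary rank-s matrix, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((100, 60))
s = 10

P, q, Rt = np.linalg.svd(A, full_matrices=False)
A_s = P[:, :s] @ np.diag(q[:s]) @ Rt[:s, :]      # truncated SVD, rank s

B = rng.random((100, s)) @ rng.random((s, 60))   # some other rank-s matrix

err_As = np.linalg.norm(A - A_s)
err_B = np.linalg.norm(A - B)
print(err_As < err_B)    # True: the truncated SVD is at least as close as B
```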
Doc-doc similarities
• Asᵗ As is a matrix of doc-doc similarities:
• the (j, k) entry is a measure of the similarity of doc j to doc k.
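A sketch of this similarity matrix with NumPy, keeping the columns-as-documents convention from the earlier slides:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((500, 50))                   # m terms x n docs (toy)
s = 10

P, q, Rt = np.linalg.svd(A, full_matrices=False)
A_s = P[:, :s] @ np.diag(q[:s]) @ Rt[:s, :]

S = A_s.T @ A_s                             # n x n matrix of doc-doc similarities
print(S.shape)                              # (50, 50)
print(round(S[3, 7], 3))                    # similarity of doc 3 to doc 7
```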
Semi-precise intuition
• We accomplish more than dimension reduction here:
• Docs with lots of overlapping terms stay together
• Terms from these docs also get pulled together.
• Thus car and automobile get pulled together because both co-occur
in docs with tires, radiator, cylinder, etc.
Query processing
• View a query as a (short) doc:
• call it column 0 of As.
• Now the entries in row 0 of Asᵗ As give the similarities of the query with each doc.
• Entry (0,j) is the score of doc j on the query.
• Exercise: fill in the details of scoring/ranking.
• LSI is expensive in terms of computation…
• Randomly choosing a subset of documents for dimensional reduction
can give a significant boost in performance.
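A hedged sketch of the scoring step with NumPy. Rather than literally appending the query as column 0 of As, it computes the same inner products directly as the query vector times As; the query and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 500, 50, 10
A = rng.random((m, n))                          # term x doc matrix (toy)

P, q, Rt = np.linalg.svd(A, full_matrices=False)
A_s = P[:, :s] @ np.diag(q[:s]) @ Rt[:s, :]     # rank-s approximation

query = rng.random(m)                           # the query as a (short) doc vector

scores = query @ A_s                            # inner product of the query with each doc
ranking = np.argsort(-scores)                   # docs sorted by score, best first
print(ranking[:5])                              # ids of the top 5 documents
```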
Resources
• Random projection theorem:
http://citeseer.nj.nec.com/dasgupta99elementary.html
• Faster random projection:
http://citeseer.nj.nec.com/frieze98fast.html
• Latent semantic indexing:
http://citeseer.nj.nec.com/deerwester90indexing.html
• Books: MG 4.6, MIR 2.7.2.

