INFORMATION RETRIEVAL
TECHNIQUES
BY
DR. ADNAN ABID
Lecture # 31
Dimensionality reduction
ACKNOWLEDGEMENTS
The presentation of this lecture has been taken from the following
sources
1. “Introduction to information retrieval” by Prabhakar Raghavan,
Christopher D. Manning, and Hinrich Schütze
2. “Managing gigabytes” by Ian H. Witten, Alistair Moffat, Timothy C.
Bell
3. “Modern information retrieval” by Ricardo Baeza-Yates
4. “Web Information Retrieval” by Stefano Ceri, Alessandro
Bozzon, Marco Brambilla
Outline
• Dimensionality reduction
• Random projection onto k<<m axes
• Computing the random projection
• Latent semantic indexing (LSI)
• The matrix
• Dimension reduction
Dimensionality reduction
• What if we could take our vectors and “pack” them into fewer
dimensions (say 10,000 → 100) while preserving distances?
• (Well, almost.)
• Speeds up cosine computations.
• Two methods:
• Random projection.
• “Latent semantic indexing”.
Random projection onto k<<m axes.
• Choose a random direction x1 in the vector space.
• For i = 2 to k,
• Choose a random direction xi that is orthogonal to x1, x2, …, xi−1.
• Project each doc vector into the subspace x1, x2, … xk.
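A minimal sketch of this construction, assuming NumPy; the function name, seed, and sizes are illustrative choices, and the QR factorization plays the role of the Gram–Schmidt loop described above.

```python
import numpy as np

def random_orthogonal_directions(m, k, seed=0):
    """Return a k x m matrix whose rows are random, mutually orthogonal
    unit vectors x1, ..., xk in the m-dimensional term space."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((m, k))
    # QR factorization orthonormalizes the k random directions; this is
    # numerically equivalent to the Gram-Schmidt loop in the slide.
    Q, _ = np.linalg.qr(G)      # Q has orthonormal columns (m x k)
    return Q.T                  # rows are the directions x1 ... xk

R = random_orthogonal_directions(m=1000, k=20)
print(R.shape)                           # (20, 1000)
print(np.allclose(R @ R.T, np.eye(20)))  # True: mutually orthogonal unit vectors
```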
E.g., from 3 to 2 dimensions
[Figure: documents D1 and D2 shown in the 3-dimensional term space (t1, t2, t3) and again after projection onto the 2-dimensional subspace spanned by x1 and x2.]
x1 is a random direction in (t1, t2, t3) space.
x2 is chosen randomly but orthogonal to x1.
Guarantee
• With high probability, relative distances are (approximately)
preserved by projection.
• Pointer to precise theorem in Resources.
Computing the random projection
• Projecting n vectors from m dimensions down to k
dimensions:
• Start with the m × n matrix of terms × docs, A.
• Find a random k × m orthogonal projection matrix R.
• Compute the matrix product W = R × A.
• jth column of W is the vector corresponding to doc j, but
now in k << m dimensions.
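A hedged sketch of the whole projection with NumPy, using a dense toy term-document matrix (real term-document matrices are sparse); the sizes m, n, k are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 10_000, 200, 100               # terms, docs, target dimensions (toy sizes)

A = rng.random((m, n))                   # m x n term-document matrix (dense toy data)
R = np.linalg.qr(rng.standard_normal((m, k)))[0].T   # k x m orthogonal projection

W = R @ A                                # k x n; column j is doc j in k dimensions
print(W.shape)                           # (100, 200)

# Cosines are roughly preserved, which is what matters for ranking:
def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(round(cosine(A[:, 0], A[:, 1]), 3))   # cosine in the original space
print(round(cosine(W[:, 0], W[:, 1]), 3))   # roughly the same value in k dimensions
```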
Cost of computation
• This takes a total of kmn multiplications.
• Expensive; see Resources for ways to do essentially the same thing more quickly.
• Exercise: by projecting from 10000 dimensions down to 100, are we
really going to make each cosine computation faster?
• The size of the vectors would decrease.
• This will result in smaller postings.
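A quick back-of-the-envelope check of these costs, using the slide's m = 10,000 and k = 100 and an assumed collection of n = 1,000,000 documents; the per-cosine counts treat the vectors as dense.

```python
m, k, n = 10_000, 100, 1_000_000    # terms, target dims, docs (n is an assumed value)

projection_cost = k * m * n          # one-off cost of computing W = R x A
per_cosine_before = m                # multiplications per cosine on dense m-vectors
per_cosine_after = k                 # multiplications per cosine on k-vectors

print(f"projection: {projection_cost:.1e} multiplications")              # 1.0e+12
print(f"per-cosine speed-up: {per_cosine_before // per_cosine_after}x")  # 100x
```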
Latent semantic indexing (LSI)
• Another technique for dimension reduction
• Random projection was data-independent
• LSI on the other hand is data-dependent
• Eliminate redundant axes
• Pull together “related” axes
• car and automobile
Notions from linear algebra
• Matrix, vector
• Matrix transpose and product
• Rank
• Eigenvalues and eigenvectors.
The matrix
• The first matrix shown on the slide has rank 2: the first two rows are linearly independent, so the rank is at least 2, but all three rows are linearly dependent (the first is equal to the sum of the second and third), so the rank must be less than 3.
• The second matrix shown on the slide, call it A, has rank 1: it has nonzero columns, so the rank is positive, but any pair of columns is linearly dependent.
The matrix
Similarly, the transpose of A has rank 1. Indeed, since the column vectors of A are the row vectors of the transpose of A, the statement that the column rank of a matrix equals its row rank is equivalent to the statement that the rank of a matrix is equal to the rank of its transpose, i.e., rk(A) = rk(Aᵗ).
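The example matrices themselves were lost when the slides were exported; the wording matches the standard textbook illustration, so the sketch below assumes those matrices and checks the stated ranks with NumPy.

```python
import numpy as np

# Assumed matrices matching the description above (the originals were images
# that did not survive export): row 1 of M1 equals row 2 + row 3, and every
# column of M2 (the slide's A) is a multiple of its first column.
M1 = np.array([[ 1,  0,  1],
               [-2, -3,  1],
               [ 3,  3,  0]])
M2 = np.array([[ 1,  1,  0,  2],
               [-1, -1,  0, -2]])

print(np.linalg.matrix_rank(M1))     # 2
print(np.linalg.matrix_rank(M2))     # 1
print(np.linalg.matrix_rank(M2.T))   # 1, i.e. rk(A) = rk(A^t)
```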
Singular-Value Decomposition
• Recall the m × n matrix of terms × docs, A.
• A has rank r ≤ min(m, n).
• Define the term-term correlation matrix T = AAᵗ.
• Aᵗ denotes the matrix transpose of A.
• T is a square, symmetric m × m matrix.
• Doc-doc correlation matrix D = AᵗA.
• D is a square, symmetric n × n matrix.
Why?
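A small numerical check of these definitions with NumPy on a toy matrix; it also answers the “Why?” prompt: T and D are symmetric because (AAᵗ)ᵗ = AAᵗ and (AᵗA)ᵗ = AᵗA.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4                          # toy numbers of terms and docs
A = rng.random((m, n))               # term-document matrix

T = A @ A.T                          # term-term correlation matrix, m x m
D = A.T @ A                          # doc-doc correlation matrix, n x n

print(T.shape, D.shape)                            # (6, 6) (4, 4)
print(np.allclose(T, T.T), np.allclose(D, D.T))    # True True: both symmetric
```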
Eigenvectors
• Denote by P the m × r matrix of eigenvectors of T.
• Denote by R the n × r matrix of eigenvectors of D.
• It turns out A can be expressed (decomposed) as A = P Q Rᵗ.
• Q is a diagonal matrix whose entries are the singular values of A (the square roots of the eigenvalues of AAᵗ), in sorted order.
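In practice P, Q, and R are obtained from a single SVD routine rather than two separate eigendecompositions; a minimal sketch with NumPy on a toy matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))                      # m x n term-document matrix (toy)

P, q, Rt = np.linalg.svd(A, full_matrices=False)
Q = np.diag(q)                              # singular values, sorted descending

# q**2 are the eigenvalues of A A^t (and of A^t A); the columns of P are
# eigenvectors of A A^t, and the rows of Rt are eigenvectors of A^t A.
print(np.allclose(A, P @ Q @ Rt))           # True: A = P Q R^t
```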
Eigen Vectors
• [Figure: the transformation matrix preserves the direction of vectors parallel to [1, 1] (in blue) and [−1, 1] (in violet). Points that lie on a line through the origin parallel to an eigenvector stay on that line after the transformation. The vectors in red are not eigenvectors, so their direction is altered by the transformation.]
Eigen Vectors and Eigen Values
• Av = λv
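A minimal check of this equation with NumPy. The 2×2 matrix below appears to be the one behind the figure on the previous slide (its eigenvectors lie along [1, 1] and [−1, 1]), but that is an assumption.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                    # assumed example matrix

vals, vecs = np.linalg.eig(A)                 # eigenvalues 3 and 1
for lam, v in zip(vals, vecs.T):              # eigenvectors are the columns of vecs
    print(lam, np.allclose(A @ v, lam * v))   # A v = lambda v holds for each pair
```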
Visualization
[Figure: A = P Q Rᵗ, where A is m × n, P is m × r, Q is r × r, and Rᵗ is r × n.]
Dimension reduction
• For some s << r, zero out all but the s biggest eigenvalues in Q.
• Denote by Qs this new version of Q.
• Typically s in the hundreds while r could be in the (tens of) thousands.
• Let As = P Qs Rᵗ
• Turns out As is a pretty good approximation to A.
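A sketch of this truncation with NumPy, using a dense toy matrix and an illustrative s:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((500, 200))                 # m x n term-document matrix (toy)
s = 50                                     # keep only the s largest singular values

P, q, Rt = np.linalg.svd(A, full_matrices=False)
q_s = q.copy()
q_s[s:] = 0.0                              # zero out all but the s biggest entries of Q
A_s = P @ np.diag(q_s) @ Rt                # = P Qs R^t, a rank-s approximation of A

print(np.linalg.matrix_rank(A_s))          # 50
print(round(np.linalg.norm(A - A_s) / np.linalg.norm(A), 3))  # relative error of As
```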
Visualization
[Figure: As = P Qs Rᵗ, where Qs is Q with all but its s largest diagonal entries set to 0.]
The columns of As represent the docs, but in s << m dimensions.
Guarantee
• Relative distances are (approximately) preserved by projection:
• Of all m × n rank-s matrices, As is the best approximation to A.
• Pointer to precise theorem in Resources.
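A small illustration of this optimality claim (the Eckart–Young theorem), assuming NumPy; the competitor B below is just one arbitrary rank-s matrix, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((100, 60))
s = 10

P, q, Rt = np.linalg.svd(A, full_matrices=False)
A_s = P[:, :s] @ np.diag(q[:s]) @ Rt[:s, :]      # truncated SVD, rank s

B = rng.random((100, s)) @ rng.random((s, 60))   # some other rank-s matrix

err_As = np.linalg.norm(A - A_s)
err_B = np.linalg.norm(A - B)
print(err_As < err_B)    # True: the truncated SVD is at least as close as B
```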
Doc-doc similarities
• Asᵗ As is a matrix of doc-doc similarities:
• the (j, k) entry is a measure of the similarity of doc j to doc k.
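A sketch of this similarity matrix with NumPy, keeping the columns-as-documents convention from the earlier slides:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((500, 50))                   # m terms x n docs (toy)
s = 10

P, q, Rt = np.linalg.svd(A, full_matrices=False)
A_s = P[:, :s] @ np.diag(q[:s]) @ Rt[:s, :]

S = A_s.T @ A_s                             # n x n matrix of doc-doc similarities
print(S.shape)                              # (50, 50)
print(round(S[3, 7], 3))                    # similarity of doc 3 to doc 7
```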
Semi-precise intuition
• We accomplish more than dimension reduction here:
• Docs with lots of overlapping terms stay together
• Terms from these docs also get pulled together.
• Thus car and automobile get pulled together because both co-occur
in docs with tires, radiator, cylinder, etc.
Query processing
• View a query as a (short) doc:
• call it column 0 of As.
• Now the entries in row 0 of Asᵗ As give the similarities of the query with each doc.
• Entry (0,j) is the score of doc j on the query.
• Exercise: fill in the details of scoring/ranking.
• LSI is expensive in terms of computation…
• Randomly choosing a subset of documents for dimensional reduction
can give a significant boost in performance.
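A hedged sketch of the scoring step with NumPy. Rather than literally appending the query as column 0 of As, it computes the same inner products directly as the query vector times As; the query and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 500, 50, 10
A = rng.random((m, n))                          # term x doc matrix (toy)

P, q, Rt = np.linalg.svd(A, full_matrices=False)
A_s = P[:, :s] @ np.diag(q[:s]) @ Rt[:s, :]     # rank-s approximation

query = rng.random(m)                           # the query as a (short) doc vector

scores = query @ A_s                            # inner product of the query with each doc
ranking = np.argsort(-scores)                   # docs sorted by score, best first
print(ranking[:5])                              # ids of the top 5 documents
```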
Resources
• Random projection theorem:
http://citeseer.nj.nec.com/dasgupta99elementary.html
• Faster random projection:
http://citeseer.nj.nec.com/frieze98fast.html
• Latent semantic indexing:
http://citeseer.nj.nec.com/deerwester90indexing.html
• Books: MG 4.6, MIR 2.7.2.

