1
The Maths behind Web search engines
(PageRank)
2010
Dante Vsevolod Zubov
2
Web search in a nutshell
• Storing pages
• Using depth-first search to discover new pages (crawling)
• Ranking results
• Human-generated ranking
  − Yahoo! Directory
• Automated ranking
  − Text
  − Metadata (title, keywords, etc.)
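As a minimal sketch of the crawling step, the Python below runs an iterative depth-first search over a small in-memory link graph. The graph `WEB` and the function `crawl_dfs` are hypothetical; a real crawler would fetch each page over the network and parse its links, which is omitted here.

```python
# Hypothetical in-memory "Web": each URL maps to the URLs it links to.
WEB = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["d.html"],
    "c.html": ["a.html", "d.html"],
    "d.html": [],
}


def crawl_dfs(seed, links):
    """Return pages in the order a depth-first crawl first visits them."""
    order = []        # discovery order
    visited = set()   # pages already crawled
    stack = [seed]    # explicit stack gives depth-first order
    while stack:
        page = stack.pop()
        if page in visited:
            continue
        visited.add(page)
        order.append(page)
        # Push outlinks in reverse so the first listed link is explored first.
        for target in reversed(links.get(page, [])):
            if target not in visited:
                stack.append(target)
    return order


print(crawl_dfs("a.html", WEB))
```

Marking a page as visited when it is popped (rather than when it is pushed) keeps the visiting order identical to a recursive depth-first search.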
3
Motivation for the PageRank
• Prioritization of results is crucial
• PageRank uses the link structure of the Web to rank pages by their “importance”
• Named after Larry Page (co-founder of Google)
4
Simple PageRank
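• $P_i$ is a page
• $r$: pages → [0, 1] is the PageRank function
• $B_{P_i}$ is the set of pages pointing to $P_i$ (backlinks)
• $|P_j|$ is the number of outlinks from $P_j$
• Calculate by iterating (starting, e.g., from $r_0(P_i) = 1/N$ for each of the $N$ pages):

$$r_{k+1}(P_i) = \sum_{P_j \in B_{P_i}} \frac{r_k(P_j)}{|P_j|}$$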
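The simple iteration can be sketched in Python as follows, assuming the Web is given as a hypothetical dictionary mapping each page to its list of outlinks and that every page has at least one outlink. The uniform starting vector and the fixed iteration count are assumptions for illustration.

```python
def simple_pagerank(links, iterations=50):
    """Simple PageRank: r_{k+1}(P_i) = sum over backlinks P_j of r_k(P_j) / |P_j|.

    `links` maps each page to the pages it links to; every page is assumed
    to have at least one outlink.
    """
    pages = list(links)
    r = {p: 1.0 / len(pages) for p in pages}  # assumed start: r_0 = 1/N
    for _ in range(iterations):
        new_r = {p: 0.0 for p in pages}
        for pj, outlinks in links.items():
            for pi in outlinks:
                # P_j is a backlink of P_i and passes on r_k(P_j) / |P_j|.
                new_r[pi] += r[pj] / len(outlinks)
        r = new_r
    return r


# Hypothetical toy Web: A -> B, C;  B -> C;  C -> A.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = simple_pagerank(LINKS)
print(ranks)
```

On this graph the values settle at r(A) = r(C) = 0.4 and r(B) = 0.2, which one can check by hand: r(B) = r(A)/2, r(C) = r(A)/2 + r(B) = r(A), and the three values sum to 1.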
5
Simple PageRank
• Will r converge in all cases?
• What about pages with no outlinks?
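Both questions can be illustrated with a short Python sketch on two hypothetical toy graphs: on a two-page cycle the simple iteration oscillates instead of converging, and a page without outlinks absorbs the rank that flows into it, so the total rank shrinks at every step.

```python
def step(links, r):
    """One step of the simple PageRank iteration."""
    new_r = {p: 0.0 for p in links}
    for pj, outlinks in links.items():
        for pi in outlinks:
            new_r[pi] += r[pj] / len(outlinks)
    return new_r


# 1) Periodic graph A <-> B: starting with all rank on A, r never settles.
cycle = {"A": ["B"], "B": ["A"]}
r = {"A": 1.0, "B": 0.0}
history = []
for _ in range(4):
    r = step(cycle, r)
    history.append(r["A"])
print(history)  # [0.0, 1.0, 0.0, 1.0]

# 2) Sink page C has no outlinks, so rank flowing into C is never passed on.
with_sink = {"A": ["B", "C"], "B": ["A"], "C": []}
r = {p: 1 / 3 for p in with_sink}
for _ in range(30):
    r = step(with_sink, r)
print(sum(r.values()))  # total rank leaks away toward 0
```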
6
Random Surfer model
• On any page, the random surfer follows one of the outlinks, chosen at random, with probability d (the damping factor, usually taken as 0.85);
• with probability (1-d), the random surfer gets bored and jumps to a page selected at random from the entire Web;
• if the page has no outlinks (a sink), the random surfer is “teleported” to a random page on the Web.
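The model can be simulated directly. The sketch below (hypothetical function `random_surfer`, toy graph `LINKS` in which page `D` is a sink) estimates each page's rank as the fraction of steps the surfer spends there. It is a Monte Carlo illustration of the model, not a claim about how PageRank is computed in practice.

```python
import random


def random_surfer(links, d=0.85, steps=200_000, seed=0):
    """Estimate PageRank by simulating the random surfer (Monte Carlo sketch)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    pages = list(links)
    visits = {p: 0 for p in pages}
    page = rng.choice(pages)  # start on a random page
    for _ in range(steps):
        visits[page] += 1
        outlinks = links[page]
        if outlinks and rng.random() < d:
            page = rng.choice(outlinks)  # follow a random outlink (probability d)
        else:
            page = rng.choice(pages)  # bored (probability 1-d) or on a sink: teleport
    # The fraction of time spent on each page approximates its PageRank.
    return {p: v / steps for p, v in visits.items()}


# Hypothetical toy Web: D is a sink (no outlinks).
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": []}
estimate = random_surfer(LINKS)
print(estimate)
```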
7
Adjusted PageRank
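$$r(P_i) = \frac{1-d}{N} + d \sum_{P_j \in B_{P_i}} \frac{r(P_j)}{|P_j|}$$

• $N$ is the total number of pages
• $B_{P_i}$ now includes all of the sink pages: a page with no outlinks is treated as linking to every page, so $|P_j| = N$ for a sink $P_j$
• $r$ is a discrete probability distribution: $\sum_{i=1}^{N} r(P_i) = 1$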
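A minimal sketch of the adjusted iteration, assuming the same hypothetical dictionary representation: the rank held by sink pages is spread evenly over all N pages, which is equivalent to including every sink in each $B_{P_i}$ with $|P_j| = N$.

```python
def adjusted_pagerank(links, d=0.85, iterations=100):
    """Iterate r(P_i) = (1-d)/N + d * sum_{P_j in B_{P_i}} r(P_j) / |P_j|.

    Sink pages are treated as linking to every page (|P_j| = N).
    """
    pages = list(links)
    n = len(pages)
    r = {p: 1.0 / n for p in pages}  # assumed start: uniform distribution
    for _ in range(iterations):
        # Each sink passes r(P_j)/N to every page, including itself.
        sink_share = sum(r[p] for p in pages if not links[p]) / n
        new_r = {p: (1 - d) / n + d * sink_share for p in pages}
        for pj, outlinks in links.items():
            for pi in outlinks:
                new_r[pi] += d * r[pj] / len(outlinks)
        r = new_r
    return r


# Hypothetical toy Web: D is a sink (no outlinks).
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": []}
ranks = adjusted_pagerank(LINKS)
print(ranks)
```

For this graph the sink D, which has no backlinks, ends up with exactly 1/21 ≈ 0.048, and the ranks sum to 1, as the slide states.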
8
PageRank example
9
Matrix representation
• The link structure of the Web can be represented by an adjacency matrix
• Define the PageRank vector $R = \left(r(P_1), r(P_2), \ldots, r(P_N)\right)^T$
10
Matrix representation
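• The matrix M on the right is the column-normalized version of the adjacency matrix we saw earlier; its entry in row i and column j is
  − $l(P_i, P_j) = 0$ if $P_j$ does not link to $P_i$
  − $l(P_i, P_j) = 1/|P_j|$ if $P_j$ links to $P_i$ (sink pages being treated as linking to every page, as in the adjusted PageRank)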
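A sketch of building M from a link dictionary, with pages indexed in a fixed order. The function `link_matrix` is hypothetical, and filling sink columns with 1/N is an assumption carried over from the adjusted PageRank; without it those columns would be all zeros and M would not be column-stochastic.

```python
def link_matrix(pages, links):
    """Column-normalized link matrix: M[i][j] = l(P_i, P_j).

    Assumes each outlink is listed once; sink columns are filled with 1/N
    (assumption matching the adjusted PageRank).
    """
    n = len(pages)
    index = {p: k for k, p in enumerate(pages)}
    M = [[0.0] * n for _ in range(n)]
    for pj in pages:
        j = index[pj]
        outlinks = links[pj]
        if not outlinks:
            # Sink: treat P_j as linking to every page.
            for i in range(n):
                M[i][j] = 1.0 / n
        for pi in outlinks:
            # P_j links to P_i: l(P_i, P_j) = 1/|P_j|.
            M[index[pi]][j] = 1.0 / len(outlinks)
    return M


PAGES = ["A", "B", "C", "D"]
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": []}
M = link_matrix(PAGES, LINKS)
for row in M:
    print(row)
```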
11
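Existence and uniqueness of R
• Let E be the N × N matrix with all its elements equal to 1; since the entries of R sum to 1, $ER = \mathbf{1}$ (the all-ones vector). The matrix form of the adjusted PageRank, $R = \frac{1-d}{N}\mathbf{1} + dMR$, can therefore be written as

$$R = \left(\frac{1-d}{N}E + dM\right)R$$

• Call the matrix in the middle M'; then R is an eigenvector of M' with eigenvalue 1 (a 1-eigenvector)
• M' is a column-stochastic matrix (each column sums to 1), and for d < 1 all of its entries are strictly positive
• By the Perron–Frobenius theorem, R does in fact exist and is unique (once normalized so that its entries sum to 1)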
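The claims on this slide can be checked numerically on a small example. The sketch hard-codes M for a hypothetical four-page Web (A → B, C; B → C; C → A; D has no outlinks), builds M' = (1-d)/N·E + dM, checks that it is column-stochastic with positive entries, and finds the 1-eigenvector by repeatedly applying M' (power iteration).

```python
# Hypothetical 4-page Web, pages ordered A, B, C, D:
# A -> B, C;  B -> C;  C -> A;  D has no outlinks (its column is 1/N).
M = [
    [0.0, 0.0, 1.0, 0.25],
    [0.5, 0.0, 0.0, 0.25],
    [0.5, 1.0, 0.0, 0.25],
    [0.0, 0.0, 0.0, 0.25],
]
d = 0.85
n = len(M)

# M' = (1-d)/N * E + d * M, where E is the all-ones matrix.
M_prime = [[(1 - d) / n + d * M[i][j] for j in range(n)] for i in range(n)]

# Column-stochastic with strictly positive entries.
col_sums = [sum(M_prime[i][j] for i in range(n)) for j in range(n)]
assert all(abs(s - 1) < 1e-12 for s in col_sums)
assert all(x > 0 for row in M_prime for x in row)


def matvec(A, x):
    """Matrix-vector product A x for lists of lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Power iteration: repeatedly apply M' to a probability vector.
R = [1.0 / n] * n
for _ in range(200):
    R = matvec(M_prime, R)

# R is (numerically) a 1-eigenvector: M' R = R.
residual = max(abs(a - b) for a, b in zip(matvec(M_prime, R), R))
print(R, residual)
```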
12
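Computation of R
• Algebraic method: solve the linear system $(I - dM)R = \frac{1-d}{N}\mathbf{1}$, where $I$ is the $N \times N$ identity matrix and $\mathbf{1}$ the all-ones vector, i.e.

$$R = (I - dM)^{-1}\,\frac{1-d}{N}\mathbf{1}$$

• Iterative method: start from $R(0) = \frac{1}{N}\mathbf{1}$ and compute

$$R(t+1) = dM\,R(t) + \frac{1-d}{N}\mathbf{1}$$

• Repeat the iteration until $\lVert R(t+1) - R(t) \rVert < \varepsilon$ for some small tolerance $\varepsilon$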
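Both methods can be sketched in plain Python for the same hypothetical four-page M. The algebraic method uses a small Gaussian-elimination helper (`solve`, written here for illustration) instead of forming the inverse explicitly; the iterative method stops when the L1 distance between successive vectors drops below `eps` (the choice of norm and tolerance is an assumption).

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (illustrative helper)."""
    n = len(A)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(col + 1, n):
            factor = aug[k][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[k][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        tail = sum(aug[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (aug[k][n] - tail) / aug[k][k]
    return x


def pagerank_algebraic(M, d=0.85):
    """R = (I - dM)^{-1} (1-d)/N 1, computed by solving the linear system."""
    n = len(M)
    A = [[(1.0 if i == j else 0.0) - d * M[i][j] for j in range(n)] for i in range(n)]
    return solve(A, [(1 - d) / n] * n)


def pagerank_iterative(M, d=0.85, eps=1e-10, max_iter=1000):
    """R(t+1) = d M R(t) + (1-d)/N 1, stopping when ||R(t+1) - R(t)||_1 < eps."""
    n = len(M)
    R = [1.0 / n] * n
    for _ in range(max_iter):
        new_R = [(1 - d) / n + d * sum(M[i][j] * R[j] for j in range(n)) for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new_R, R)) < eps:
            return new_R
        R = new_R
    return R  # tolerance not reached within max_iter


# Same hypothetical 4-page Web (A, B, C, D; D is a sink with column 1/N).
M = [
    [0.0, 0.0, 1.0, 0.25],
    [0.5, 0.0, 0.0, 0.25],
    [0.5, 1.0, 0.0, 0.25],
    [0.0, 0.0, 0.0, 0.25],
]
R_alg = pagerank_algebraic(M)
R_it = pagerank_iterative(M)
print(R_alg)
print(R_it)
```

Both functions return the same vector up to rounding. For a large Web the iterative method is the practical choice, since it only needs repeated matrix-vector products with the sparse matrix M.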
13
Questions?