“PAGE RANKING”
ALGORITHM
INTRODUCTION
• Finding useful information on the World Wide Web is something many of us take for
granted. According to the Internet research firm Netcraft, there are nearly 150,000,000
active Web sites on the Internet today.
• Google's algorithm does the work for you by searching out Web pages that contain
the keywords you used to search, then assigning a rank to each page based on several
factors, including how many times the keywords appear on the page. Higher ranked
pages appear further up in Google's search engine results page (SERP), meaning that
the best links relating to your search query are theoretically the first ones Google lists.
• Automated programs called spiders or crawlers travel the Web, moving from link to link
and building up an index page that includes certain keywords. Google references this
index when a user enters a search query. The search engine lists the pages that contain
the same keywords that were in the user's search terms.
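The crawl-then-look-up idea above can be sketched in a few lines. This is only an illustration, not Google's actual index: the page URLs and texts are made up, and `build_index` and `search` are hypothetical helper names.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Made-up pages standing in for what a crawler would have collected.
pages = {
    "a.example/earth": "planet earth documentary",
    "b.example/mars": "planet mars rover",
}
index = build_index(pages)
print(search(index, "planet earth"))
```

The index only says which pages match; deciding the order in which to list them is the job of the ranking step described next.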
• Also like other search engines, Google has a large index of keywords and where those words can be found.
What sets Google apart is how it ranks search results, which in turn determines the order Google displays results
on its search engine results page (SERP). Google uses a trademarked algorithm called PageRank, which assigns
each Web page a relevancy score.
• Keyword placement plays a part in how Google finds sites. Google looks for keywords throughout each Web
page, but some sections are more important than others. Including the keyword in the Web page's title is a
good idea, for example. Google also searches for keywords in headings.
How does Google decide which pages to select and which to leave out?
It does this by asking questions, 200 of them. A few important ones are:
i. How many times does the keyword appear in the page (i.e. the
frequency of the word in the page)?
ii. Do the words appear in the title, the URL or a meta tag, and are they directly adjacent?
iii. Does the page include synonyms of the keywords?
iv. Is the page from a high-quality or a low-quality website?
v. What is the page's PageRank?
PAGERANKING ALGORITHM
• Google’s PageRank algorithm has become one of the most famous in
computer science. It was originally designed to rank websites according
to their importance by assuming that a site is important if it is linked to by
other important sites. It follows a real-life philosophy:
a product or an individual becomes popular when people other
than that individual know about the individual or product.
In the same way, a page's rank rises when other web pages
link to that specific page.
• The algorithm works by counting the links to a website and the
importance of the sites these come from. It then uses this to work out the
importance of the original site. Through a process of iteration, the
algorithm comes up with a ranking.
• PageRank assigns a rank or score to every search result. The higher the page's
score, the further up the search results list it will appear.
• Scores are partially determined by the number of other Web pages that link to
the target page. Each link is counted as a vote for the target. The logic behind
this is that pages with high-quality content will be linked to more often than
mediocre pages.
• Not all votes are equal. Votes from a high-ranking Web page count more than
votes from low-ranking sites. You can't really boost one Web page's rank by
making a bunch of empty Web sites linking back to the target page.
• The more links a Web page sends out, the more diluted its voting power
becomes. In other words, if a high-ranking page links to hundreds of other pages,
each individual vote won't count as much as it would if the page only linked to a
few sites.
• Other factors that might affect scoring include how long the site has been
around, the strength of the domain name, how and where the keywords appear
on the site, and the age of the links going to and from the site. Google tends to
place more value on sites that have been around for a while.
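The "diluted voting power" point can be made concrete with a small sketch. It assumes a page's vote is simply its score split evenly among its outgoing links, which is how the formula later in this deck treats it; `vote_value` is a hypothetical helper name and the scores are made up.

```python
def vote_value(page_score, outgoing_links):
    """Share of a page's score passed along each of its outgoing links."""
    if outgoing_links <= 0:
        raise ValueError("a page must have at least one outgoing link to cast votes")
    return page_score / outgoing_links


# The same (made-up) score of 6.0, spread over few links and over many links.
few = vote_value(6.0, 3)
many = vote_value(6.0, 300)
print(few, many)
```

Each vote from the page with 3 links is worth 100 times more than each vote from the page with 300 links, even though both pages have the same score.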
A Web page's PageRank depends on a few factors:
• The frequency and location of keywords within the Web page: If the
keyword only appears once within the body of a page, it will receive
a low score for that keyword.
• How long the Web page has existed: People create new Web pages
every day, and not all of them stick around for long. Google places
more value on pages with an established history.
• The number of other Web pages that link to the page in question:
Google looks at how many Web pages link to a particular site to
determine its relevance.
• Out of these three factors, the third
is the most important. It's easier to
understand it with an example.
• Let's look at a search for the terms
"Planet Earth."
• As more Web pages link to
Discovery's Planet Earth page, the
Discovery page's rank increases.
When Discovery's page ranks higher
than other pages, it shows up at the
top of the Google search results
page.
PageRank description
We assume page A has pages T1...Tn which point to it.
The parameter d is a damping factor which can be set between 0 and 1. We usually set d
to 0.85.
The PageRank theory holds that an imaginary surfer who is randomly clicking on links will
eventually stop clicking.
The probability, at any step, that the person will continue is a damping factor d.
Various studies have tested different damping factors, but it is generally assumed that the
damping factor will be set around 0.85.
Also C(A) is defined as the number of links going out of page A.
The PageRank of a page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
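The Google paper states that the PageRanks form a probability distribution over web pages,
“so the sum of all web pages' PageRanks will be one”.
With the formula exactly as written above, and every page having at least one outgoing link,
the scores instead average 1.0 per page (they sum to the number of pages). Dividing the
(1-d) term by the number of pages makes them sum to one.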
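The formula translates directly into code. A minimal sketch, assuming the scores PR(T) and outgoing-link counts C(T) of the linking pages are already known; `page_rank_of` is a hypothetical name, not something from the Google paper.

```python
def page_rank_of(inbound, d=0.85):
    """Apply PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    inbound is a list of (PR(T), C(T)) pairs, one per page T that links to A.
    """
    return (1 - d) + d * sum(pr / c for pr, c in inbound)


# One inbound link from a page with score 1.0 and a single outgoing link.
print(page_rank_of([(1.0, 1)]))
# Two inbound links: a page with score 2.0 and 4 outgoing links,
# and a page with score 1.0 and 2 outgoing links.
print(page_rank_of([(2.0, 4), (1.0, 2)]))
# No inbound links at all: only the (1-d) term is left.
print(page_rank_of([]))
```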
How is PageRank Calculated?
• The PR of each page depends on the PR of the pages pointing to it. But
we won’t know what PR those pages have until the pages pointing
to them have their PR calculated and so on… And when you consider that
page links can form circles it seems impossible to do this calculation!
• The Google paper says:
PageRank or PR(A) can be calculated using a simple iterative algorithm,
and corresponds to the principal eigenvector of the normalized link matrix of
the web.
What that means to us is that we can just go ahead and calculate a page’s
PR without knowing the final value of the PR of the other pages. That seems
strange but, basically, each time we run the calculation we’re getting a
closer estimate of the final value. So all we need to do is remember
each value we calculate and repeat the calculations lots of times until the
numbers stop changing much.
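"Repeat until the numbers stop changing much" can be sketched as a loop over a small link graph. This is an illustration under stated assumptions, not Google's implementation: the three-page graph is made up, every linked-to page must appear as a key, and the `tol` and `max_iter` parameters are choices of this sketch. It recomputes all pages from the previous round's values at once.

```python
def pagerank(links, d=0.85, tol=1e-6, max_iter=1000):
    """Iterate the PageRank formula until no score changes by more than tol.

    links maps each page to the list of pages it links to.
    """
    pages = list(links)
    pr = {p: 1.0 for p in pages}  # starting guess for every page

    # Work out which pages link to each page.
    inbound = {p: [] for p in pages}
    for source, targets in links.items():
        for target in targets:
            inbound[target].append(source)

    for _ in range(max_iter):
        new_pr = {
            p: (1 - d) + d * sum(pr[t] / len(links[t]) for t in inbound[p])
            for p in pages
        }
        biggest_change = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if biggest_change < tol:
            break
    return pr


# Made-up graph: A links to B and C, B links to C, C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print(ranks)
```

In this graph C collects votes from both A and B, so it ends up with a higher score than B, which only receives half of A's vote.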
Let's take the simplest example network: two pages, each pointing to the
other.
Each page has one outgoing link (the outgoing count is 1, i.e. C(A) = 1 and
C(B) = 1).
With d = 0.85 the equations are:
PR(A)= (1 – d) + d(PR(B)/1)
PR(B)= (1 – d) + d(PR(A)/1)
1. GUESS 1
We don’t know what their PR should be to begin with, so let’s take a guess at 1.0 and do some calculations:
PR(A)= 0.15 + 0.85 * 1
= 1
PR(B)= 0.15 + 0.85 * 1
= 1
The numbers do not change at all, so a guess of 1.0 is already the settled value.
2. GUESS 2
Ok, let’s start the guess at 0 instead and re-calculate:
PR(A)= 0.15 + 0.85 * 0
= 0.15
PR(B)= 0.15 + 0.85 * 0.15
= 0.2775
And again:
PR(A)= 0.15 + 0.85 * 0.2775
= 0.385875
PR(B)= 0.15 + 0.85 * 0.385875
= 0.47799375
And again:
PR(A)= 0.15 + 0.85 * 0.47799375
= 0.5562946875
PR(B)= 0.15 + 0.85 * 0.5562946875
= 0.622850484375
and so on. The numbers just keep going up. But will the numbers stop increasing when they get to 1.0? What if a calculation
over-shoots and goes above 1.0?
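For this two-page network the question can be answered with a short derivation. Here x_k is notation introduced for this sketch: the k-th value calculated in the sequence above (alternating between A and B), with x_0 the starting guess. Each step computes x_{k+1} = (1-d) + d x_k, so:

```latex
x^{*} = (1-d) + d\,x^{*} \;\Longrightarrow\; x^{*} = 1,
\qquad
x_{k+1} - 1 = d\,(x_k - 1) \;\Longrightarrow\; x_k - 1 = d^{\,k}\,(x_0 - 1)
```

With d = 0.85 the distance from 1.0 shrinks by 15% at every step and never changes sign. A guess below 1.0 therefore climbs towards 1.0 without going over it, and a guess above 1.0 falls towards it.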
3. GUESS 3
Let’s start the guess at 40 each and do a few cycles:
PR(A) = 40
PR(B) = 40
First calculation
PR(A)= 0.15 + 0.85 * 40
= 34.25
PR(B)= 0.15 + 0.85 * 34.25
= 29.2625
And again
PR(A)= 0.15 + 0.85 * 29.2625
= 25.023125
PR(B)= 0.15 + 0.85 * 25.023125
= 21.41965625
The numbers are heading down, towards 1.0.
• Principle: it doesn’t matter where you start your guess; once the PageRank calculations
have settled down, the “normalized probability distribution” (the average PageRank for
all pages) will be 1.0
No backlinks means the equation looks like this:
PR(D)= (1-d) + d * (0)
= 0.15
no matter what else is going on or how many times you do it.
Observation: every page has at least a PR of 0.15 to share out.
• Our home page has 2 and a
half times as much PR as the
child pages! Excellent!
• This is what we’d expect. All
the pages have the same
number of incoming links, all
pages are of equal
importance to each other, all
pages get the same PR of 1.0
(i.e. the “average”
probability).
EXAMPLES
• Because Google looks at links to a Web page as a vote, it's not easy to cheat the system. The best way to make sure
your Web page is high up on Google's search results is to provide great content so that people will link back to your
page. The more links your page gets, the higher its PageRank score will be. If you attract the attention of sites with a
high PageRank score, your score will grow faster.
• Mega-sites, like http://news.bbc.co.uk, have tens or hundreds of editors writing new content – i.e. new pages - all day
long! Each one of those pages has rich, worthwhile content of its own and a link back to its parent or the home page!
That’s why the Home page Toolbar PR of these sites is 9/10 and the rest of us just get pushed lower and lower by
comparison…
• Principle: Content Is King! There really is no substitute for lots of good content…
Steps to enhance your PAGERANK
1. Give visitors the information they're looking for
• Provide high-quality content on your pages, especially your homepage. This is the single most
important thing to do. If your pages contain useful information, their content will attract many
visitors and entice webmasters to link to your site. Think about the words users would type to
find your pages and include those words on your site.
2. Make sure that other sites link to yours
• Links help Google's crawlers find your site and can give your site greater visibility in Google's search results.
When returning results for a search, Google uses sophisticated text-matching techniques to
display pages that are both important and relevant to each search. Google interprets a link
from page A to page B as a vote by page A for page B.
3. Make your site easily accessible
• Build your site with a logical link structure. Every page should be reachable from at least one
static text link.
BIBLIOGRAPHY
• http://www.google.com/googlebot
• www.wikipedia.org
• http://infolab.stanford.edu/~backrub/google.html
THANK YOU

