Topic: Search Engine
Section 1
Group Members:
1. Umair Daud Raja (9380)
2. Abdul Basit (9675)
What are Search Engines?
 A search engine is a program that takes keywords, searches for documents
related to those keywords, and returns the results ordered by their
relevance to the query.
 Examples:
• Google
• Bing
• Ask
 The information we want to find may be a mix of images, videos, web pages,
and other types of files.
 The pages that display the results of a search are called Search Engine
Results Pages (SERPs).
How does a search engine work?
Which is the best search engine?
 Google: 1,100,000,000 - Estimated Unique Monthly Visitors
 Bing: 350,000,000 - Estimated Unique Monthly Visitors
 Yahoo: 300,000,000 - Estimated Unique Monthly Visitors
 Ask: 245,000,000 - Estimated Unique Monthly Visitors
 AOL Search: 125,000,000 - Estimated Unique Monthly Visitors
 Reference: http://www.ebizmba.com/articles/search-engines
Market share
 Reference: http://searchengineland.com/googles-search-market-share-67-percent-pc-83-percent-mobile-203937
Search Engines: Algorithm
 Previous name: BackRub, which ranked pages on the basis of their back links.
 Algorithm: PageRank
 Impractical solution; it was proposed by Larry Page.
 PageRank is the technique Google uses to determine the importance of a page
on the web.
 PageRank is one of the most important factors that Google uses. It is a
numeric value that represents how important a page is on the web.
 Of course, PageRank is not the only factor that decides the importance of a
page, but it is one of them.
 PageRank is described by a single mathematical formula that looks difficult
at first but is actually simple.
Formula
 We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a
damping factor, which can be set between 0 and 1. We usually set d to 0.85. C(A) is defined as
the number of links going out of page A. The PageRank of a page A is given as follows:
 PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
 PR(A) is the PageRank of page A
 PR(Ti) is the PageRank of pages Ti which link to page A
 C(Ti) is the number of outbound links on page Ti
 d is a damping factor which can be set between 0 and 1
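The formula can be tried out on a single page with a few lines of Python. This is only a minimal sketch: the function name `pagerank_of_page` and the input values are made up for illustration and are not part of the slides.

```python
def pagerank_of_page(inbound, d=0.85):
    """Evaluate PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    `inbound` holds one (PR(Ti), C(Ti)) pair for every page Ti that links to A.
    """
    return (1 - d) + d * sum(pr / out_links for pr, out_links in inbound)


# Hypothetical input: two pages link to A.
# T1 has PageRank 1.0 and 2 outbound links; T2 has PageRank 0.5 and 1 outbound link.
pr_a = pagerank_of_page([(1.0, 2), (0.5, 1)])
print(pr_a)

# A page with no inbound links keeps only the (1 - d) term.
pr_isolated = pagerank_of_page([])
print(pr_isolated)
```

With these made-up inputs the two inbound pages each contribute 0.5, so PR(A) works out to 0.15 + 0.85 * 1.0.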
Damping factor
 The PageRank theory holds that an imaginary surfer who is randomly clicking
on links will eventually stop clicking.
 The probability, at any step, that the person will continue is a damping
factor d. Various studies have tested different damping factors, but it is
generally assumed that the damping factor will be set around 0.85.
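To get a feel for what d = 0.85 means for the random surfer, the short sketch below assumes that each decision to continue is independent of the previous ones (an assumption of this example, not a statement from the slides). Under that assumption, the chance of still clicking after k links is d^k, and the expected number of links followed before stopping is d / (1 - d). The function name is hypothetical.

```python
def still_clicking_probability(d, k):
    """Probability that the surfer follows k links in a row without stopping,
    assuming each step continues independently with probability d."""
    return d ** k


d = 0.85
for k in (1, 5, 10, 20):
    print(k, still_clicking_probability(d, k))

# Mean of the geometric distribution P(N = k) = d**k * (1 - d).
expected_clicks = d / (1 - d)
print(expected_clicks)
```

With d = 0.85 the surfer follows between five and six links on average before giving up.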
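Explanation
Consider an imaginary web of 3 web pages, A, B, and C,
with the inbound and outbound link
structure shown in the figure: A links to B and C, B links to C, and C links to A.
Starting from an initial PageRank of 1 for every page, the
calculations can be done as follows: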
PR(A) = 0.15 + 0.85 * PR(C)
      = 0.15 + (0.85 * 1)
      = 1
PR(B) = 0.15 + 0.85 * (PR(A) / 2)
      = 0.15 + 0.85 * (1 / 2)
      = 0.15 + (0.85 * 0.5)
      = 0.15 + 0.425
      = 0.575
PR(C) = 0.15 + 0.85 * (PR(A) / 2 + PR(B))
      = 0.15 + 0.85 * (1 / 2 + 0.575)
      = 0.15 + 0.85 * 1.075
      = 0.15 + 0.914
      = 1.06 (approximately)
Here d = 0.85, so according to the formula, 1 - d = 1 - 0.85 = 0.15.
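The hand calculation above can be reproduced with a short script. This is a sketch under two assumptions: the link structure is the one implied by the equations (A links to B and C, B links to C, C links to A), and every page starts with a PageRank of 1. The names `links` and `sweep` are made up for this example. Each pass updates A, then B, then C, reusing values already updated in the same pass, which is how the worked example proceeds.

```python
D = 0.85

# Link structure implied by the equations: A -> B, A -> C, B -> C, C -> A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}


def sweep(pr, links, d=D):
    """One pass over the pages in alphabetical order, updating values in place."""
    pr = dict(pr)
    for page in sorted(pr):
        inbound = [src for src, targets in links.items() if page in targets]
        pr[page] = (1 - d) + d * sum(pr[src] / len(links[src]) for src in inbound)
    return pr


start = {"A": 1.0, "B": 1.0, "C": 1.0}  # assumed initial guess
first = sweep(start, links)
print(first)

# Repeating the pass many times lets the values settle.
converged = start
for _ in range(200):
    converged = sweep(converged, links)
print(converged)
```

The first pass gives the same numbers as the hand calculation; repeating the pass refines them until they stop changing, at which point the three values add up to 3 (the number of pages).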
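Complexity
 O(n + m) per iteration, where n is the number of pages (nodes) and m is the
number of links (edges).
 Even in a complete graph, each edge has to be touched at most twice per
iteration, so the cost of one iteration is O(n + m).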
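The bound can be made concrete by counting the work done in one update over adjacency lists. The sketch below is illustrative only: the function name, the counters, and the small 3-page graph are assumptions of this example. It visits every node once to set the (1 - d) term and every link once to pass rank along it.

```python
def one_iteration(pr, out_links, d=0.85):
    """One PageRank update; returns (new_pr, node_visits, edge_visits)."""
    new_pr = {page: 1 - d for page in pr}  # one visit per node
    node_visits = len(pr)
    edge_visits = 0
    for src, targets in out_links.items():
        for dst in targets:  # one visit per link
            new_pr[dst] += d * pr[src] / len(targets)
            edge_visits += 1
    return new_pr, node_visits, edge_visits


# Hypothetical graph with n = 3 pages and m = 4 links.
out_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pr = {"A": 1.0, "B": 1.0, "C": 1.0}
new_pr, node_visits, edge_visits = one_iteration(pr, out_links)
print(node_visits, edge_visits)
print(new_pr)
```

The counters come out as n and m, so the work in a single iteration grows linearly with the size of the graph; the total cost also depends on how many iterations are run.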
Techniques
 We will discuss two techniques here
 Dynamical systems point of view:
 Linear algebra point of view:
Another Example.
Example(Cont.)
 We "translate" the picture into a
directed graph with 4 nodes, one for
each web site.
Example(cont.)
 In the picture below, node 1 has 3 outgoing links.
 Node 2 has 2 outgoing links.
 Node 3 has 1 outgoing link.
 Node 4 has 2 outgoing links.
 In general, if a node has k outgoing
links, it passes 1/k of its importance along each link.
Example(Cont.)
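 From the diagram we obtain the transition matrix A: the entry in row i and column j is 1/k if node j has k outgoing links and one of them points to node i, and 0 otherwise.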
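The figure with the graph is not reproduced in this text, so the sketch below assumes a link structure that matches the out-degrees listed above (3, 2, 1, and 2): node 1 links to 2, 3, and 4; node 2 links to 3 and 4; node 3 links to 1; node 4 links to 1 and 3. This structure is an assumption of the example, as are the names `out_links` and `transition_matrix`. Exact fractions are used so that the 1/k entries print without rounding.

```python
from fractions import Fraction

# ASSUMED link structure (the original figure is not available).
out_links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}


def transition_matrix(out_links):
    """Column j holds 1/k in each row that node j links to, where k is j's out-degree."""
    nodes = sorted(out_links)
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    A = [[Fraction(0)] * n for _ in range(n)]
    for src, targets in out_links.items():
        for dst in targets:
            A[index[dst]][index[src]] = Fraction(1, len(targets))
    return A


A = transition_matrix(out_links)
for row in A:
    print([str(x) for x in row])
```

Each column of the resulting matrix sums to 1, because every node hands out all of its importance across its outgoing links.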
1. Dynamic System point of view
 Suppose that initially the importance is uniformly distributed among the 4
nodes, each getting ¼. Denote by v the initial rank vector, having all entries
equal to ¼. Each incoming link increases the importance of a web page, so at
step 1, we update the rank of each page by adding to the current value the
importance of the incoming links. This is the same as multiplying the
matrix A with v. At step 1, the new importance vector is v1 = Av. We can
iterate the process, thus at step 2, the updated importance vector
is v2 = A(Av) = A^2 v. Numeric computations give:
Dynamic System point of view(Cont.)
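The slide's numeric results are not reproduced in this text. As a stand-in, the following sketch repeats the multiplication v_k = A v_(k-1) in plain Python. The matrix used here is an assumption: it corresponds to a link structure consistent with the out-degrees given earlier (1 links to 2, 3, 4; 2 links to 3, 4; 3 links to 1; 4 links to 1, 3). The helper names are made up for this example.

```python
# ASSUMED transition matrix (column j spreads node j's importance over its links).
A = [
    [0.0, 0.0, 1.0, 0.5],
    [1 / 3, 0.0, 0.0, 0.0],
    [1 / 3, 0.5, 0.0, 0.5],
    [1 / 3, 0.5, 0.0, 0.0],
]


def mat_vec(A, v):
    """Multiply matrix A by column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def iterate(A, v, steps):
    """Return [v0, v1, ..., v_steps] with v_k = A v_(k-1)."""
    history = [v]
    for _ in range(steps):
        v = mat_vec(A, v)
        history.append(v)
    return history


v0 = [0.25] * 4  # importance uniformly distributed over the 4 nodes
history = iterate(A, v0, 100)
for k in (1, 2, 3, 100):
    print(k, [round(x, 3) for x in history[k]])
```

Under this assumed matrix the vectors settle quickly, and the entries keep summing to 1 at every step because each column of A sums to 1.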
2. Linear algebra point of view:
 Let us denote by x1, x2, x3, and x4 the importance of the four pages. Analyzing
the situation at each node we get the system:
Linear algebra point of view(Cont.)
 Then we find the eigenvalues from det(A - λI4) = 0 and, for the eigenvalue
λ = 1, solve (A - I4)X = 0 for the eigenvector X.
Linear algebra point of view(Cont.)
 After the calculation we get the eigenvector for the eigenvalue 1.
Adding its entries gives 31, so the eigenvector is multiplied by 1/31 to obtain
the PageRank vector, whose entries sum to 1.
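The vectors on these slides are not reproduced in this text. The sketch below solves the same kind of problem exactly, under an assumed link structure that matches the out-degrees given earlier (1 links to 2, 3, 4; 2 links to 3, 4; 3 links to 1; 4 links to 1, 3). Under that assumption the system at each node reads x1 = x3 + x4/2, x2 = x1/3, x3 = x1/3 + x2/2 + x4/2, x4 = x1/3 + x2/2. The function name and the choice to replace one redundant equation with "the entries sum to 1" are decisions of this example, not of the slides.

```python
from fractions import Fraction as F

# ASSUMED transition matrix for the 4-page example.
A = [
    [F(0), F(0), F(1), F(1, 2)],
    [F(1, 3), F(0), F(0), F(0)],
    [F(1, 3), F(1, 2), F(0), F(1, 2)],
    [F(1, 3), F(1, 2), F(0), F(0)],
]


def stationary_vector(A):
    """Solve (A - I) x = 0 together with sum(x) = 1 using exact fractions."""
    n = len(A)
    # Augmented rows of (A - I) x = 0.
    M = [[A[i][j] - (1 if i == j else 0) for j in range(n)] + [F(0)] for i in range(n)]
    # The rows of (A - I) are linearly dependent, so swap the last one for sum(x) = 1.
    M[-1] = [F(1)] * n + [F(1)]
    # Gauss-Jordan elimination.
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


x = stationary_vector(A)
print([str(value) for value in x])  # ['12/31', '4/31', '9/31', '6/31']
```

For this assumed matrix the unnormalized eigenvector is (12, 4, 9, 6), whose entries add up to 31, which agrees with the normalizing factor mentioned on the slide.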
Current Algorithm used by Google
 The latest algorithm used by Google is “Hummingbird”.
 Google started using Hummingbird around 30 August 2013 and announced the
change on 26 September 2013, on the eve of the company's 15th anniversary.
Pros and cons of PageRank
Pros:
 It is query independent.
 It helps return the most relevant search results.
Cons:
 The major disadvantage of PageRank is that it favors older pages, because
a new page, even a very good one, will not have as many links as an old one.
 Search results are based on literal keywords, not on meaning.
Analysis Of Algorithm
