Page Rank Implementation
Cloud Computing Project, Team 3
By: Devendra Singh Parmar
Project Abstract
Instructor: Prof. Reddy Raja
Mentor: Ms M. Padmini
To implement the PageRank algorithm using Map-Reduce for Wikipedia and verify it on smaller data sets.
Agenda
 Motivation
 Introduction to Algorithm
 PageRank Equation Analysis
 Brief Description of Project
 Module1
 Module2
 Module3
 Applications
Motivation
-> Need for PageRank: Search engines store billions of web pages, which together contain trillions of URL links. So there is a need for an algorithm that gives the most relevant pages for a specific query.
-> Need for a distributed environment (Map-Reduce and distributed storage):
 Trillions of links imply huge data storage requirements (if each URL requires 0.5 KB, then we need over 400 TB just to store the URLs!).
 A large data set implies large computations.
Thus, we handle the above issues in our project by using a distributed cluster.
Introduction
PageRank is a link analysis algorithm, named after Larry Page and used by the Google Internet search engine, that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set.
The numerical weight that it assigns to any given element E is also called the PageRank of E and is denoted by PR(E).
Algorithm
Google figures that when one page links to another page, it is effectively casting a vote for the other page. The more votes that are cast for a page, the more important the page must be. Also, the importance of the page that is casting the vote determines how important the vote itself is. Google calculates a page's importance from the votes cast for it. The importance of each vote is also taken into account when a page's PageRank is calculated.
The PageRank Equation
Simple Iterative Algorithm
For the kth iteration, the PageRank of the ith page is given by: (the equation and its symbol definitions were an image on the slide and are not preserved in this transcript)
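A plausible form of the simple iterative update, offered as a sketch and not taken from the slide itself: it is the damped formula quoted later from Brin and Page's paper with the damping term left out. The symbols are assumptions: PR_k(P_i) is the PageRank of page P_i at iteration k, B(P_i) is a hypothetical name for the set of pages that link to P_i, and C(P_j) is the number of outbound links on page P_j.

```latex
PR_{k}(P_i) = \sum_{P_j \in B(P_i)} \frac{PR_{k-1}(P_j)}{C(P_j)}
```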
The PageRank Equation (Issues and Enhancement)
Problems:
 Rank sinks or dangling pages
 Cycles
PageRank Equation (Enhancement)
Solution for cycles, and for the case where a random surfer gets bored: introduce a damping factor 'd'. It represents the probability, at any step, that the person will continue surfing. The value of 'd' is typically set to 0.85.
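PageRank Equation (finally)
PR(A) = (1-d)/N + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
where T1...Tn are the pages that link to A, C(T) is the number of outbound links on page T, and N is the number of documents in the collection.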
In other words
In a simpler way:
 a page's PageRank = 0.15/N + 0.85 * (the sum of a "share" of the PageRank of every page that links to it)
 "share" = the linking page's PageRank divided by the number of outbound links on that page
 N = the number of documents in the collection
The equation shows clearly how a page's PageRank is arrived at. But what isn't immediately obvious is that it can't work if the calculation is done just once.
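The wording above can be checked with a small worked example. This is a minimal sketch, not the project's code: the function name `page_rank_update` and the four-page collection are hypothetical, chosen only to make the arithmetic visible.

```python
def page_rank_update(inbound, n_pages, d=0.85):
    """Return one page's new PageRank.

    inbound: list of (rank, outbound_link_count) pairs, one per page
             that links to the page being updated.
    """
    # Each linking page contributes its rank divided by its outbound links.
    shares = sum(rank / out_count for rank, out_count in inbound)
    return (1 - d) / n_pages + d * shares

# Hypothetical collection of N = 4 pages. The page being updated is linked
# from two pages of rank 0.25, with 2 and 1 outbound links respectively.
new_rank = page_rank_update([(0.25, 2), (0.25, 1)], n_pages=4)
print(round(new_rank, 5))  # 0.15/4 + 0.85 * (0.125 + 0.25) = 0.35625
```

The result depends on the ranks of the linking pages, which themselves change on the next pass; that is why one calculation is not enough.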
PageRank Equation
As per the published paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine" by Sergey Brin and Lawrence Page:
We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
-> Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one.
Issues in the Original Formula
The formula given in Brin and Page's paper does not support the statement that "the sum of all PageRanks is one". Hence, to support the statement, the formula is modified as:
PR(A) = (1-d)/N + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
where N = the number of documents in the collection.
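The difference between the two formulas can be seen numerically. The sketch below is an illustration under stated assumptions, not the project's implementation: the three-page graph is hypothetical and has no dangling pages, and `iterate` is a helper name introduced here. The `base` argument is the constant term, (1-d) for the paper's formula and (1-d)/N for the modified one.

```python
def iterate(links, base, start, d=0.85, rounds=100):
    """Repeat the PageRank update `rounds` times.

    links: dict mapping each page to the list of pages it links to.
    base:  constant term added to every page on each pass.
    start: initial rank given to every page.
    """
    ranks = {page: start for page in links}
    for _ in range(rounds):
        new = {page: base for page in links}
        for page, outs in links.items():
            for target in outs:
                new[target] += d * ranks[page] / len(outs)
        ranks = new
    return ranks

# Hypothetical graph: every page has at least one outbound link.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
n, d = len(links), 0.85

original = iterate(links, base=1 - d, start=1.0, d=d)            # paper's formula
modified = iterate(links, base=(1 - d) / n, start=1.0 / n, d=d)  # modified formula

print(sum(original.values()))  # close to N (3), not 1
print(sum(modified.values()))  # close to 1
```

On this graph the two results differ only by the factor N, so the ordering of pages is the same; only the modified formula yields values that sum to one.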
Brief Description of Project
Input: A data set containing multiple records, where each record contains the URL of a page (FromUrl) followed by the URL of a page it points to (ToUrl).
 Wiki_Votes.txt (fields: FromUrl, ToUrl)
Brief Description of Project (Contd.)
Output: The output file consists of records containing the URL of the page (FromUrl), the PageRank value of the page (PRValue), and the list of URLs to which the page points (ToUrlList).
 FinalOutput.txt (fields: FromUrl, PRValue, ToUrlList)
Brief Description of Project: Modules
 Module1: Converter
 Module2: PageRank Calculator
 Module3: Output Analyzer
(Diagram labels: WebGraph, Converter, PageRank Calculator (iterate until convergence), Output Analyzer, Create Index, Search Engine)
Module1: Converter
Input-Output
The Converter turns the input records into records of the form FromUrl, PRValue, list of ToUrls, initializing every PRValue to 1/N.
Module1: Converter Issues
 Self loops: handled by checking the FromUrl against the ToUrl before sending it to the reduce function.
 Dangling pages: handled by initializing their PRValue to 1/N and leaving the list of ToUrls blank.
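The Converter's behaviour can be sketched in plain Python, without a Map-Reduce framework. This is an assumption-laden illustration, not the project's code: the function name `convert` and the sample edges are hypothetical, and the grouping that a reduce step would perform is done here with a dictionary.

```python
from collections import defaultdict

def convert(edges):
    """Turn (from_url, to_url) pairs into {url: (pr_value, to_url_list)}."""
    out_links = defaultdict(list)
    pages = set()
    for from_url, to_url in edges:
        pages.update((from_url, to_url))   # every URL seen counts as a page
        if from_url == to_url:
            continue                       # self loop: dropped before the "reduce"
        out_links[from_url].append(to_url)
    n = len(pages)
    # Pages with no outgoing links (dangling) get 1/N and an empty list.
    return {page: (1.0 / n, out_links.get(page, [])) for page in sorted(pages)}

# Hypothetical input: C only links to itself, so it ends up dangling.
records = convert([("A", "B"), ("A", "C"), ("B", "C"), ("C", "C")])
for url, (pr_value, to_urls) in records.items():
    print(url, pr_value, to_urls)
```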
Module2: PageRank Calculator
Input-Output
Input -> PageRank Calculator (the user can specify the precision) -> Output
Module2: PageRank Calculator
Map:
 Input: index.html PRValue OutList: <1.html 2.html ...>
 Output:
  1. For each outlink: key: "1.html", value: PRValue / ListLength (vote share)
  2. The page itself, with its list of ToUrls: key: index.html, value: <OutList>
Reduce:
 Input: key: "1.html"; value: 0.5 23; value: 0.24 2; ...; value: UrlList <OutLink>
 Output: key: "1.html", value: "<new pagerank> <OutList> 1.html 2.html ..."
Steps:
 1. Start with the initial PageRank and outlinks of a document.
 2. For each outlink, output the PageRank share contributed by the linking page, and also output the list of outlinks.
 3. The reducer now has the URL of a document, all the inlinks to that document with their corresponding PageRank shares, and the list of outlinks.
 4. Compute the new PageRank and output in the same format as the input.
 5. Iterate until convergence (determined by the precision value).
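One pass of the map and reduce steps can be simulated in plain Python. This is a sketch under assumptions, not the project's Hadoop code: the names `map_page`, `reduce_page` and `run_iteration` are hypothetical, values are tagged tuples instead of text, the shuffle is a dictionary, and the reducer uses the modified formula with (1-d)/N. The sample graph has no dangling pages.

```python
from collections import defaultdict

def map_page(url, rank, out_list):
    """Emit one vote share per outlink, plus the page's own outlink list."""
    for target in out_list:
        yield target, ("share", rank / len(out_list))
    yield url, ("links", out_list)

def reduce_page(url, values, n_pages, d=0.85):
    """Sum the vote shares for url and restore its outlink list."""
    shares, out_list = 0.0, []
    for kind, payload in values:
        if kind == "share":
            shares += payload
        else:
            out_list = payload
    return url, (1 - d) / n_pages + d * shares, out_list

def run_iteration(records, d=0.85):
    """records: {url: (rank, out_list)}; returns the same shape."""
    grouped = defaultdict(list)            # stands in for the shuffle phase
    for url, (rank, out_list) in records.items():
        for key, value in map_page(url, rank, out_list):
            grouped[key].append(value)
    result = {}
    for url, values in grouped.items():
        _, new_rank, out_list = reduce_page(url, values, len(records), d)
        result[url] = (new_rank, out_list)
    return result

records = {
    "index.html": (1 / 3, ["1.html", "2.html"]),
    "1.html": (1 / 3, ["2.html"]),
    "2.html": (1 / 3, ["index.html"]),
}
result = run_iteration(records)
for url, (rank, out_list) in sorted(result.items()):
    print(url, round(rank, 4), out_list)
```

Because the output has the same shape as the input, the result of one pass can be fed straight into the next.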
Module2: PageRank Calculator Issues
Catch-22 situation
Suppose we have 2 pages, A and B, which link to each other, and neither has any other links of any kind. This is what happens:
 Step 1: Calculate page A's PageRank from the value of its inbound links.
 Step 2: Calculate page B's PageRank from the value of its inbound links.
We can't work out A's PageRank until we know B's PageRank, and we can't work out B's PageRank until we know A's PageRank. Thus a single pass gives inaccurate PageRanks for A and B.
Module2: PageRank Calculator Issues
Catch-22 situation (solution)
This problem is overcome by repeating the calculations many times. Each pass produces slightly more accurate values. In fact, total accuracy can never be achieved, because each calculation is based on the inaccurate values of the previous pass. The number of iterations should be sufficient to reach a point where further iterations wouldn't change the values enough to matter.
=> Use a "delta function" that keeps track of the changes in the PageRank of all the pages; if the change in the PageRank of every page is less than the value specified by the user, the iterations can be stopped.
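The stopping rule can be sketched as follows. This is an illustration under assumptions, not the project's code: `step` and `iterate_until_converged` are hypothetical names, the "delta" is taken to be the largest absolute change of any page's PageRank between two passes, and a third page C (linking to A) is added to the A/B example so that the values actually move between passes.

```python
def step(links, ranks, d=0.85):
    """One pass of the modified formula over every page."""
    n = len(links)
    new = {page: (1 - d) / n for page in links}
    for page, outs in links.items():
        for target in outs:
            new[target] += d * ranks[page] / len(outs)
    return new

def iterate_until_converged(links, precision=1e-6, max_rounds=1000):
    """Repeat passes until no page changes by `precision` or more."""
    ranks = {page: 1.0 / len(links) for page in links}
    for rounds in range(1, max_rounds + 1):
        new = step(links, ranks)
        delta = max(abs(new[page] - ranks[page]) for page in links)
        ranks = new
        if delta < precision:
            break
    return ranks, rounds

# A and B link to each other; hypothetical page C links to A.
links = {"A": ["B"], "B": ["A"], "C": ["A"]}
ranks, rounds = iterate_until_converged(links, precision=1e-6)
print(rounds, {page: round(rank, 4) for page, rank in ranks.items()})
```

A smaller precision value costs more passes; `max_rounds` is a safety cap added here so the loop always terminates.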
Module 3: Output Analyzer
Input-Output
Input -> Analyzer (e.g., if the user wants the top 3) -> Output
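A top-k selection like the one described can be sketched as below. The record layout is an assumption: each line is taken to be whitespace-separated as "FromUrl PRValue ToUrl ...", which the slides do not specify, and `top_pages` and the sample lines are hypothetical.

```python
import heapq

def top_pages(lines, k=3):
    """Return the k (url, pr_value) pairs with the highest PageRank."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue                      # skip blank or malformed lines
        records.append((fields[0], float(fields[1])))
    return heapq.nlargest(k, records, key=lambda record: record[1])

# Hypothetical contents of the final output file.
sample = [
    "index.html 0.33 1.html 2.html",
    "1.html 0.19 2.html",
    "2.html 0.43 index.html",
    "3.html 0.05",
]
top3 = top_pages(sample, k=3)
print(top3)
```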
Applications and Extensions
A simple model of a search engine (implemented).
The application utilizes:
 The PageRank calculated by the PageRank Calculator.
 The output generated by a map-reduce module that counts the number of times a pattern (as per the user's query) matches in each of the files present in the data set.
And outputs:
 The list of pages that are relevant to the query, in the order of their importance. (DEMO)
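How the two inputs are combined is not spelled out on the slide. One possible reading, sketched here as an assumption: a page is relevant if the pattern matches at least once, and relevant pages are ordered by PageRank. The function name `search` and the sample values are hypothetical.

```python
def search(page_ranks, match_counts):
    """Return relevant pages ordered by PageRank, highest first.

    page_ranks:   {url: pr_value} from the PageRank Calculator.
    match_counts: {url: number of pattern matches} for the user's query.
    """
    relevant = [
        url for url, count in match_counts.items()
        if count > 0 and url in page_ranks
    ]
    return sorted(relevant, key=lambda url: page_ranks[url], reverse=True)

page_ranks = {"index.html": 0.33, "1.html": 0.19, "2.html": 0.48}
match_counts = {"index.html": 2, "1.html": 0, "2.html": 5, "3.html": 1}
hits = search(page_ranks, match_counts)
print(hits)
```

In this reading the match count only decides relevance; it does not affect the order.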
Applications and Extensions
Other applications:
 A PageRank-based mechanism to rank knowledge items used in e-learning.
 GeneRank (based on PageRank) ranks the genes analyzed in a microarray to see the relationship between the cell's function and gene expression.
 PageRank can be used to sort the items in the side menu of various blogs and sites according to their importance.
References
 http://infolab.stanford.edu/pub/papers/google.pdf (research paper by Brin and Page)
 http://www.ams.org/featurecolumn/archive/pagerank.html
 http://en.wikipedia.org/wiki/PageRank
 http://www.webworkshop.net/pagerank.html#how_is_pagerank_calculated