A HADOOP IMPLEMENTATION OF PAGERANK
CHENGENG MA
2016/02/02
HOW DOES GOOGLE FIGHT SPAMMERS?
• Older search engines usually relied on the information shown on each page itself (e.g., word frequency).
• A spammer who wants to sell his T-shirts may create a web page that repeats a word like “movie” 1000 times, and he can make these words invisible by giving them the same color as the background.
• When you search for “movie”, the old search engine ranks this page as extremely important, so you click it and find only his T-shirt ad.
• “While Google was not the first search engine, it was the first able to defeat the spammers who had made search almost useless.”
• The key innovation that Google introduced is a measure of web page importance called PageRank.
PAGERANK IS ABOUT WEB LINKS.
WHY WEB LINKS?
• People usually add a tag or a link to a page they think is correct, useful or reliable.
• Spammers can put whatever they like on their own pages, but it is usually hard for them to get other pages to link to them.
• Even if a spammer creates a link farm in which thousands of pages link to the one page he wants to promote, those thousands of pages under his control are still not linked to by the billions of web pages in the outside world.
For example, a Chinese web user who sees the picture on the left on a site will probably tag it “MilkTea Beauty” (a young Chinese celebrity whose reputation is disputed).
WHAT IS PAGERANK?
• PageRank is a vector whose j-th element is the probability that a random surfer is at the j-th web page in the final stationary state.
• At the beginning, you can set every page to the same value (Vj = 1/N). Then you multiply the PageRank vector V by the transition matrix M to get the probability distribution X at the next moment.
• In the final state, PageRank converges and the vector X is the same as the vector V. For a web that contains no dead ends or spider traps, the vector V now represents the PageRank.
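[Figure: transition matrix M for a four-page web (A, B, C, D); the column index j is the page the link comes from, the row index i is the page it goes to.]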
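The repeated multiplication described above can be sketched in Python. This is only an illustration: it assumes the four-page link list that appears later in the slides (A→B, A→C, A→D, B→A, B→D, C→A, D→B, D→C), and the names `links`, `index` and the stopping tolerance are choices made here, not part of the original program.

```python
import numpy as np

# Assumption: the four-page link list shown later in the slides.
links = {"A": ["B", "C", "D"], "B": ["A", "D"], "C": ["A"], "D": ["B", "C"]}
pages = sorted(links)
index = {p: k for k, p in enumerate(pages)}
N = len(pages)

# Column j describes the out-links of page j: M[i, j] = 1 / outdegree(j).
M = np.zeros((N, N))
for src, targets in links.items():
    for dst in targets:
        M[index[dst], index[src]] = 1.0 / len(targets)

V = np.full(N, 1.0 / N)          # start from Vj = 1/N
for _ in range(100):
    X = M @ V                    # distribution at the next moment
    if np.abs(X - V).sum() < 1e-12:
        break                    # X is (numerically) the same as V
    V = X
```

For this small graph the vector settles at 3/9 for A and 2/9 for each of B, C and D, which can be checked by hand against V = M V.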
SPIDER TRAP
• Once you arrive at page C, there is no way to leave it. The random surfer gets trapped at page C, so the walk is no longer random.
• Eventually all of the PageRank is absorbed by page C.
DEAD END
• In the real web, a page can be a dead end (it does not link to any other page). Once the random surfer arrives at a dead end, it stops travelling and has no chance to move on to other pages, so the random-walk assumption is violated.
• Under the previous definition, the column of the transition matrix that corresponds to a dead end is an all-zero column.
• Repeatedly multiplying by this matrix drains the probability until nothing is left.
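Both failure modes can be reproduced with a small experiment. The sketch below assumes a four-page graph in which page C either links only to itself (spider trap) or has no out-links at all (dead end); the exact graphs on the slides are not recoverable, so these two link lists and the helper names are illustrative assumptions.

```python
import numpy as np

def transition_matrix(links, pages):
    """Build M with M[i, j] = 1/outdegree(j) for each link j -> i."""
    index = {p: k for k, p in enumerate(pages)}
    M = np.zeros((len(pages), len(pages)))
    for src, targets in links.items():
        for dst in targets:          # a dead end has no targets: column stays zero
            M[index[dst], index[src]] = 1.0 / len(targets)
    return M

def iterate(M, steps=200):
    """Plain power iteration without taxation."""
    V = np.full(M.shape[0], 1.0 / M.shape[0])
    for _ in range(steps):
        V = M @ V
    return V

pages = ["A", "B", "C", "D"]
# Hypothetical graphs: C is a spider trap in the first, a dead end in the second.
trap = {"A": ["B", "C", "D"], "B": ["A", "D"], "C": ["C"], "D": ["B", "C"]}
dead = {"A": ["B", "C", "D"], "B": ["A", "D"], "C": [], "D": ["B", "C"]}

V_trap = iterate(transition_matrix(trap, pages))   # mass piles up on C
V_dead = iterate(transition_matrix(dead, pages))   # mass leaks away entirely
```

With the spider trap, `V_trap` approaches (0, 0, 1, 0); with the dead end, every element of `V_dead` approaches zero.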
TAXATION
• To solve the two problems above, the algorithm is modified by adding a probability ρ that the surfer keeps following links, so with probability (1 − ρ) the surfer teleports to a random page.
• This method is called taxation.
The modified algorithm, for each loop iteration:
V1 = ρ * M * V0
V1 = V1 + (1 − sum(V1)) / N
V0 = V1
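The three-line loop body above translates almost directly into NumPy. The following is a minimal sketch, not the Hadoop code: the function name is made up here, ρ = 0.85 and 75 iterations are the values quoted later in the slides, and the example matrix is an assumed four-page graph in which page C is a dead end.

```python
import numpy as np

def pagerank_taxation(M, rho=0.85, iterations=75):
    """Iterate V1 = rho*M*V0, then add the missing mass (1 - sum(V1))/N to every page."""
    N = M.shape[0]
    V0 = np.full(N, 1.0 / N)
    for _ in range(iterations):
        V1 = rho * (M @ V0)
        V1 = V1 + (1.0 - V1.sum()) / N   # teleport share plus any dead-end leakage
        V0 = V1
    return V0

# Assumed example: columns are A, B, C, D and C is a dead end (all-zero column).
M_dead = np.array([
    [0.0, 0.5, 0.0, 0.0],
    [1/3, 0.0, 0.0, 0.5],
    [1/3, 0.0, 0.0, 0.5],
    [1/3, 0.5, 0.0, 0.0],
])
V = pagerank_taxation(M_dead)
```

Because the second step redistributes whatever probability is missing, the vector keeps summing to 1 even when a column of M is empty, and every page ends up with a strictly positive value.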
HOWEVER, THE REAL WEB HAS BILLIONS OF PAGES, SO MATRIX-VECTOR MULTIPLICATION IS EXPENSIVE.
• By partitioning the matrix and the vector, the calculation can be parallelized across a computing cluster with thousands of nodes or more.
• Computation at this scale is usually managed by a MapReduce system such as Hadoop.
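[Figure: the matrix M partitioned into blocks indexed by row group alpha = 0..4 and column group beta = 0..4, next to the vector V partitioned into stripes beta = 0..4.]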
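MAPREDUCE
• 1st mapper:
• $M:\ (i, j, M_{ij}) \rightarrow \{ (\alpha, \beta);\ (\text{"M"}, i, j, M_{ij}) \}$, where $\alpha = \lfloor i/\Delta \rfloor$, $\beta = \lfloor j/\Delta \rfloor$, and $\Delta$ is the partition interval.
• $V:\ (j, V_j) \rightarrow \{ (\alpha, \beta);\ (\text{"V"}, j, V_j) \}$ for every $\alpha \in [0, G-1]$, with $\beta = \lfloor j/\Delta \rfloor$, where $G = \lceil N/\Delta \rceil$ is the number of groups.
• 1st reducer gets input as: $\{ (\alpha, \beta);\ [ (\text{"M"}, i, j, M_{ij}),\ (\text{"V"}, j, V_j) ] \}$ for all $i \in$ partition $\alpha$ and all $j \in$ partition $\beta$
• 1st reducer outputs: $\{ i;\ S_\beta = \sum_{j \in \text{partition } \beta} M_{ij} V_j \}$
• 2nd mapper: Pass (the records go through unchanged)
• 2nd reducer gets input as: $\{ i;\ [S_0, S_1, S_2, \ldots, S_{G-1}] \}$
• 2nd reducer outputs: $\{ i;\ \sum_{\beta=0}^{G-1} S_\beta \}$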
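The two MapReduce stages can be simulated in a single Python process to check the key design. This is a sketch under stated assumptions: `shuffle` stands in for Hadoop's group-by-key step, all function names are invented here, and the sample matrix is the four-page example with an interval Δ = 2.

```python
from collections import defaultdict
from math import ceil

def mapper1(m_entries, v_entries, N, delta):
    G = ceil(N / delta)                       # number of groups
    for i, j, m_ij in m_entries:
        yield (i // delta, j // delta), ("M", i, j, m_ij)
    for j, v_j in v_entries:
        for alpha in range(G):                # stripe beta is needed by every row group
            yield (alpha, j // delta), ("V", j, v_j)

def reducer1(key, values):
    v = {rec[1]: rec[2] for rec in values if rec[0] == "V"}
    partial = defaultdict(float)
    for rec in values:
        if rec[0] == "M":
            _, i, j, m_ij = rec
            partial[i] += m_ij * v.get(j, 0.0)
    for i, s_beta in partial.items():         # one partial sum S_beta per row i
        yield i, s_beta

def reducer2(i, partial_sums):
    return i, sum(partial_sums)

def shuffle(pairs):
    """Stand-in for Hadoop's shuffle: group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def multiply(m_entries, v_entries, N, delta):
    stage1 = shuffle(mapper1(m_entries, v_entries, N, delta))
    stage2 = shuffle(p for key, vals in stage1.items() for p in reducer1(key, vals))
    result = [0.0] * N                        # rows with no entries stay at 0
    for i, sums in stage2.items():
        result[i] = reducer2(i, sums)[1]
    return result

# Sparse entries (i, j, M_ij) of the four-page example; indices A=0, B=1, C=2, D=3.
m_entries = [(1, 0, 1/3), (2, 0, 1/3), (3, 0, 1/3), (0, 1, 1/2),
             (3, 1, 1/2), (0, 2, 1.0), (1, 3, 1/2), (2, 3, 1/2)]
v_entries = [(j, 0.25) for j in range(4)]
product = multiply(m_entries, v_entries, N=4, delta=2)
```

Replicating each vector element to all G row groups is what lets every (α, β) reducer work on one block independently; the second stage only has to add G partial sums per row.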
BEFORE THE PAGERANK CALCULATION: TRANSLATING THE WEB TO NUMBERS
LINKS (From Node ID → To Node ID)
• A → B
• A → C
• A → D
• B → A
• B → D
• C → A
• D → B
• D → C
ID (Web Node ID in data, Index used in program)
• A, 0
• B, 1
• C, 2
• D, 3
• Perform an inner join twice: the key of the 1st join is FromNodeID, and the key of the 2nd join is ToNodeID.
After 1st inner join (From Node ID, To Node ID, From index)
• A, B, 0
• A, C, 0
• A, D, 0
• B, A, 1
• B, D, 1
• C, A, 2
• D, B, 3
• D, C, 3
After 2nd inner join (From Node ID, To Node ID, From index, To index)
• A, B, 0, 1
• A, C, 0, 2
• A, D, 0, 3
• B, A, 1, 0
• B, D, 1, 3
• C, A, 2, 0
• D, B, 3, 1
• D, C, 3, 2
After the PageRank is calculated, the same joins can be used to translate the indices back to node names.
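The two joins can be mimicked in memory with a dictionary lookup. This is only a sketch of the idea, assuming the LINKS and ID tables above fit in memory; on Hadoop the same result would come from a join job, and the helper name `inner_join` is made up here.

```python
links = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "A"),
         ("B", "D"), ("C", "A"), ("D", "B"), ("D", "C")]
node_ids = [("A", 0), ("B", 1), ("C", 2), ("D", 3)]

def inner_join(rows, key_of, id_table):
    """Append the index of key_of(row) to each row; rows without a match are dropped."""
    lookup = dict(id_table)
    return [row + (lookup[key_of(row)],) for row in rows if key_of(row) in lookup]

after_first = inner_join(links, lambda r: r[0], node_ids)         # key: FromNodeID
after_second = inner_join(after_first, lambda r: r[1], node_ids)  # key: ToNodeID

# Keep only the numeric (from index, to index) pairs for the PageRank program.
edges = [(f, t) for _, _, f, t in after_second]

# Translating back after the calculation uses the reversed ID table.
index_to_name = {idx: name for name, idx in node_ids}
```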
2002 GOOGLE PROGRAMMING
CONTEST WEB GRAPH DATA
• 875,713 pages, 5,105,039 edges
• 72 MB text file
• The Hadoop program iterates 75 times (“For the web itself, 50-75 iterations are sufficient to converge to within the error limits of double precision”).
• ρ = 0.85 is the probability of following web links, leaving a 0.15 probability of teleporting.
• The program is structured as a for loop, and each iteration runs 4 MapReduce jobs.
• The first 2 MR jobs perform the matrix-vector multiplication.
• The 3rd MR job calculates the sum of the product vector ρ*M*V.
• The final MR job does the shifting, adding (1 − sum)/N to every element.
PAGERANK RESULT
• A Python program was written to compare against the result from Hadoop.
RESULT ANALYSIS
• Plotted unsorted, the PageRank values are noisy and hard to read.
• Sorting by PageRank value and plotting on log-log axes gives an approximately straight line.
RESULT ANALYSIS
• The histogram shows counts that decay exponentially for large PageRank values.
• The top 1/9 of web pages by PageRank hold 60% of the total PageRank in the whole dataset.
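A concentration figure like the one above can be computed by sorting the PageRank values and summing the largest ones. The sketch below shows the calculation on a tiny made-up vector; the function name and the sample values are illustrative and are not the dataset's results.

```python
def top_share(values, fraction):
    """Share of the total held by the largest `fraction` of the values."""
    ranked = sorted(values, reverse=True)
    k = max(1, round(len(ranked) * fraction))   # number of pages in the top group
    return sum(ranked[:k]) / sum(ranked)

# Made-up PageRank values for five pages (they sum to 1).
sample = [0.1, 0.6, 0.1, 0.1, 0.1]
share = top_share(sample, 0.2)                  # top 1/5 of the pages
```

Calling `top_share(pagerank_values, 1/9)` on the real output vector would give the kind of number quoted on the slide.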
REFERENCE
• Jure Leskovec, Anand Rajaraman and Jeffrey D. Ullman, Mining of Massive Datasets, Chapter 5.
The code is attached in the following files.
FINALLY, A TOP K PROGRAM IN HADOOP
• The 1st column is the index used in this program;
• the 2nd column is the web node ID in the original data;
• the 3rd column is the PageRank value.
The table on the right shows the top 15 PageRank values.
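A common way to write a top-K job is to let each mapper keep only its local K largest records and have a single reducer merge them. The sketch below imitates that pattern in plain Python; it is not the attached Hadoop code, and the function names, node IDs and values are hypothetical.

```python
import heapq

def local_top_k(records, k):
    """Mapper side: keep the k records with the largest PageRank (3rd column)."""
    return heapq.nlargest(k, records, key=lambda r: r[2])

def global_top_k(partial_lists, k):
    """Reducer side: merge the mappers' candidates and keep the overall top k."""
    merged = [r for part in partial_lists for r in part]
    return heapq.nlargest(k, merged, key=lambda r: r[2])

# Hypothetical records: (index used in program, web node ID, PageRank value).
split_1 = [(0, "n10", 0.05), (1, "n11", 0.30), (2, "n12", 0.10)]
split_2 = [(3, "n13", 0.25), (4, "n14", 0.02), (5, "n15", 0.28)]

top = global_top_k([local_top_k(split_1, 2), local_top_k(split_2, 2)], 2)
```

Each mapper sends at most K records to the reducer, so the final merge stays small no matter how many pages the input contains.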
