PageRank Algorithm Notes (Simplified Iterative Version)

1. Introduction

PageRank is Google's classic web page ranking algorithm: the more often a web page A is linked to by other pages, the more important page A is considered to be.

"PageRank is essentially an algorithm that roughly analyzes the importance of web pages, with the number and quality of hyperlinks between pages as the main factor. Its basic assumption is that more important pages tend to be cited more often by other pages (that is, other pages add more hyperlinks pointing to them). It interprets a link from page A to page B as 'page A votes for page B', and determines the rank of the voted page from the source of the vote (and even the source of the source, that is, the pages that link to page A) and the rank of the voter. Simply put, a high-ranking page can raise the rank of lower-ranking pages."

2. Formula (iterative version)

For a web page A (a node), its PageRank value (PR) in the current iteration is calculated as:
$$PR(A)=\sum_{\text{all other nodes } X \text{ pointing to } A}\frac{\text{PR value of } X \text{ in the previous iteration}}{\text{out-degree of } X}$$
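
The update rule above can be sketched in Python as follows. This is only a minimal sketch, not a reference implementation: the function name `pagerank_step`, the dictionary representation `out_links` and the three-node toy graph are assumptions made for illustration, and the sketch assumes every node has at least one outgoing link.

```python
from fractions import Fraction

def pagerank_step(out_links, pr):
    """One iteration of the simplified update rule (no damping factor).

    out_links: dict mapping each node to the list of nodes it points to
               (assumed representation; every node needs >= 1 out-link).
    pr:        dict mapping each node to its PR value from the previous iteration.
    """
    new_pr = {}
    for node in out_links:
        # Sum PR(X) / out-degree(X) over every node X that points to `node`.
        new_pr[node] = sum(
            (pr[src] / len(targets)
             for src, targets in out_links.items() if node in targets),
            Fraction(0),
        )
    return new_pr

# Hypothetical toy graph: X -> Y, X -> Z, Y -> Z, Z -> X
toy = {"X": ["Y", "Z"], "Y": ["Z"], "Z": ["X"]}
start = {n: Fraction(1, 3) for n in toy}  # equal initial influence
result = pagerank_step(toy, start)
print(result)
```

Using `Fraction` keeps the values exact, so they can be compared directly with hand calculations.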

3. Examples

Calculate the PR value of each node in the graph below after two iterations.

[Figure: sample graph with four nodes A, B, C and D]

1. Initialization

At initialization, we assume that all four nodes have the same influence and that the total influence is 1, so:
$$PR(A)=PR(B)=PR(C)=PR(D)=\frac{1}{4}$$

2. First iteration

  • For node A, only node C points to it. Node C had a PR value of $\frac{1}{4}$ in the previous round, and its out-degree is 3 (it points to A, B and D). So in this iteration, the PR value of node A is:
    $$PR(A)=\frac{\frac{1}{4}}{3}=\frac{1}{12}$$

  • For node B, two nodes point to it: A and C. Node A had a PR value of $\frac{1}{4}$ in the previous round, and its out-degree is 2 (it points to B and C); node C had a PR value of $\frac{1}{4}$ in the previous round, and its out-degree is 3 (it points to A, B and D). So in this iteration, the PR value of node B is:
    $$PR(B)=\frac{\frac{1}{4}}{2}+\frac{\frac{1}{4}}{3}=\frac{5}{24}$$

  • Similarly, the PR values of nodes C and D in this round are:
    $$PR(C)=\frac{\frac{1}{4}}{2}+\frac{\frac{1}{4}}{1}=\frac{3}{8}$$
    $$PR(D)=\frac{\frac{1}{4}}{1}+\frac{\frac{1}{4}}{3}=\frac{1}{3}$$

  • Note that the PR values of the four nodes in this round still sum to 1.
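
The sum stays at 1 for a reason that a short derivation makes visible. The sketch below assumes that every node has at least one outgoing link, which holds for the out-degrees used in the calculations above. The symbols $PR_{\text{old}}$, $PR_{\text{new}}$ and $\text{outdeg}$ are notation introduced here for illustration. Each node $X$ splits its previous PR value evenly over its $\text{outdeg}(X)$ outgoing links, so regrouping the double sum by source node returns exactly the previous total:

```latex
\sum_{A} PR_{\text{new}}(A)
  = \sum_{A} \sum_{X \to A} \frac{PR_{\text{old}}(X)}{\text{outdeg}(X)}
  = \sum_{X} \text{outdeg}(X) \cdot \frac{PR_{\text{old}}(X)}{\text{outdeg}(X)}
  = \sum_{X} PR_{\text{old}}(X) = 1

% Numeric check for the first iteration:
\frac{1}{12} + \frac{5}{24} + \frac{3}{8} + \frac{1}{3}
  = \frac{2}{24} + \frac{5}{24} + \frac{9}{24} + \frac{8}{24} = 1
```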

3. Second iteration

  • In the second iteration, node A is again pointed to only by node C. Node C had a PR value of $\frac{3}{8}$ in the previous round, and its out-degree is 3 (it points to A, B and D). So in this iteration, the PR value of node A is:
    $$PR(A)=\frac{\frac{3}{8}}{3}=\frac{1}{8}=0.125$$

  • Similarly, the PR values of the remaining three nodes are:
    $$PR(B)=\frac{\frac{1}{12}}{2}+\frac{\frac{3}{8}}{3}=\frac{1}{6}\approx 0.167$$
    $$PR(C)=\frac{\frac{1}{12}}{2}+\frac{\frac{1}{3}}{1}=\frac{3}{8}=0.375$$
    $$PR(D)=\frac{\frac{5}{24}}{1}+\frac{\frac{3}{8}}{3}=\frac{1}{3}\approx 0.333$$

  • As before, the PR values of the four nodes in this round still sum to 1.
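
Both iterations can be checked with a few lines of Python. This is a hedged sketch: the figure is not reproduced here, so the edge list below is inferred from the out-degrees and contributions used in the calculations above (A→B, A→C; B→D; C→A, C→B, C→D; D→C) and should be treated as an assumption. The helper name `iterate` is also hypothetical.

```python
from fractions import Fraction

# Edge list inferred from the worked calculations (assumption: the figure is not available).
out_links = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

def iterate(out_links, pr):
    """Push each node's previous PR value, split evenly, to the nodes it points to."""
    new_pr = {node: Fraction(0) for node in out_links}
    for src, targets in out_links.items():
        share = pr[src] / len(targets)  # PR(src) / out-degree(src)
        for dst in targets:
            new_pr[dst] += share
    return new_pr

pr0 = {node: Fraction(1, 4) for node in out_links}  # initialization
pr1 = iterate(out_links, pr0)                        # first iteration
pr2 = iterate(out_links, pr1)                        # second iteration

for name, pr in [("first", pr1), ("second", pr2)]:
    print(name, {n: str(v) for n, v in pr.items()}, "sum =", sum(pr.values()))
```

Under that assumed edge list, the exact fractions match the hand calculations for both rounds.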

4. Final Results and Analysis

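| Node | Initialization | First iteration | Second iteration | Final PageRank value | Final ranking |
|------|----------------|-----------------|------------------|----------------------|---------------|
| A    | 1/4            | 1/12            | 1/8              | 0.125                | 4             |
| B    | 1/4            | 5/24            | 1/6              | 0.167                | 3             |
| C    | 1/4            | 3/8             | 3/8              | 0.375                | 1             |
| D    | 1/4            | 1/3             | 1/3              | 0.333                | 2             |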

The overall results are shown in the table above. According to these results, the ranking by influence is:
$$C > D > B > A$$
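
The ranking step itself is just a sort by PR value. The sketch below copies the second-iteration values from the table; the variable names are illustrative only.

```python
from fractions import Fraction

# PR values after the second iteration, copied from the table above.
pr = {"A": Fraction(1, 8), "B": Fraction(1, 6), "C": Fraction(3, 8), "D": Fraction(1, 3)}

# Sort nodes from highest to lowest PR value; rank 1 is the most influential node.
ordered = sorted(pr, key=pr.get, reverse=True)
ranking = {node: rank for rank, node in enumerate(ordered, start=1)}

print(" > ".join(ordered))  # C > D > B > A
print(ranking)
```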

4. Reference link

  1. PageRank-wiki
  2. PageRank of classic machine learning algorithm
  3. PageRank Algorithm - Example(YouTube)

Origin blog.csdn.net/neowell/article/details/129156383