PageRank algorithm (Dead Ends and Spider Traps problems)

PageRank

  • 1 Basic concepts
  • 2 Algorithm basic explanation
  • 3 Dead Ends problem
  • 4 Spider Traps problem
  • 5 Algorithm advantages and disadvantages
  • 6 Can using β solve the Dead Ends problem?

 

 

1. Basic concepts

1.1 Background introduction

1.1.1 PageRank was proposed by Google.

1.1.2 A web page with a high PageRank value appears nearer the top of the search results.

1.1.3 PageRank is abbreviated as PR. It ranks web pages by computing their importance; a page's PR value is its importance score.

1.2 Algorithm central idea

1.2.1 Quantity assumption: In the web page model diagram, the more in-links a web page receives from other web pages, the more important it is. In the diagram, the PR value is represented by the size of the node.

1.2.2 Quality assumption: When a high-quality web page points to another web page, the page it points to is also important.

1.2.3 In-links and out-links are meant literally: an in-link is a link pointing to a page from another page, and an out-link is a link from the page to another page.

 

2. Algorithm basic explanation

2.1 PageRank formula

When i=0, the initial PR value is 1/n, where n is the total number of web pages; that is, every web page starts out equally important.
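For reference, the basic PageRank update is commonly written as follows. The notation is assumed here rather than taken from the original: $B_u$ is the set of pages that link to page $u$, and $L(v)$ is the number of out-links of page $v$.

```latex
PR_{i+1}(u) = \sum_{v \in B_u} \frac{PR_i(v)}{L(v)}, \qquad PR_0(u) = \frac{1}{n}
```

Each page splits its current PR value evenly among the pages it links to, and the new PR value of a page is the sum of the shares it receives.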

   Example of calculating PR value:

      In this example, A, B, C, and D are web pages that link to each other. The initial PR value of each is 1/4. Use the PageRank formula to update the PR values. (After a single update the PR values are not yet stable, so the update must be repeated many times.)

 

Calculate PR(A)

Calculate PR(B)

By analogy, the first iteration gives the PR values of all web pages. Sorting the pages by PR value gives the following results.
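A single update can be sketched in a few lines of Python. The link graph below is hypothetical (it is not the graph from the original figure), and `update_once` is an illustrative helper name.

```python
from fractions import Fraction

# Hypothetical link graph (an assumption): page -> pages it links to.
links = {
    "A": ["B", "C", "D"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "C"],
}

def update_once(pr, links):
    """Apply the PageRank formula once: each page splits its PR evenly over its out-links."""
    new_pr = {page: Fraction(0) for page in links}
    for page, targets in links.items():
        share = pr[page] / len(targets)
        for target in targets:
            new_pr[target] += share
    return new_pr

n = len(links)
pr0 = {page: Fraction(1, n) for page in links}  # every page starts at 1/n
pr1 = update_once(pr0, links)

# Sort pages by PR value, highest first.
ranking = sorted(pr1, key=pr1.get, reverse=True)
print(pr1)
print(ranking)
```

For this hypothetical graph, A receives 1/8 from B and 1/4 from C, so PR(A) = 3/8 after the first iteration, and A ranks first.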

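Express the PR update in matrix form (easier to compute and to update the PR values)

Here M is the column transition matrix built from the out-links (entry (i, j) is the probability of moving from page j to page i), and V is the vector of PR values obtained in the previous iteration.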


Iterating PR = M*V repeatedly, the column vector obtained once the values stop changing is the final PR value of the web pages.

Second iteration result:

As can be seen from the figure below, PR = M*V, and the next PR value can be obtained
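The iteration PR = M*V can be sketched as follows. The matrix describes a hypothetical 4-page graph (an assumption, not the graph from the original figures), and `mat_vec` is an illustrative helper.

```python
from fractions import Fraction as F

# Column transition matrix M for a hypothetical 4-page graph.
# Column j holds the out-link probabilities of page j, in the order A, B, C, D:
#   A -> B, C, D    B -> A, D    C -> A    D -> B, C
M = [
    [F(0),    F(1, 2), F(1), F(0)],
    [F(1, 3), F(0),    F(0), F(1, 2)],
    [F(1, 3), F(0),    F(0), F(1, 2)],
    [F(1, 3), F(1, 2), F(0), F(0)],
]

def mat_vec(M, V):
    """Return the matrix-vector product M*V."""
    return [sum(m * v for m, v in zip(row, V)) for row in M]

V0 = [F(1, 4)] * 4          # every page starts at 1/n
first = mat_vec(M, V0)      # first iteration
second = mat_vec(M, first)  # second iteration

# Keep iterating (in floats) until the values have settled.
V = [float(x) for x in second]
for _ in range(100):
    V = mat_vec(M, V)

print(first)
print(second)
print(V)
```

Because every column of M sums to 1, the PR values keep summing to 1 at every step.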

 

3. Dead Ends problem

If A points to B, and B does not point to any web page, then the PR value of B eventually becomes 0.

During the iteration, the PR values of all pages gradually become 0. This is called the Dead Ends problem.

Verifying with PR = M*V gives the same result: in the second iteration, the PR values become 0.
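The two-page case described above (A points to B, B points nowhere) can be checked with a short sketch; `mat_vec` is an illustrative helper.

```python
from fractions import Fraction as F

# A -> B, and B has no out-links, so column B of M is all 0.
M = [
    [F(0), F(0)],  # probability of moving into A
    [F(1), F(0)],  # probability of moving into B
]

def mat_vec(M, V):
    """Return the matrix-vector product M*V."""
    return [sum(m * v for m, v in zip(row, V)) for row in M]

V = [F(1, 2), F(1, 2)]  # initial PR values, 1/n each
history = [V]
for _ in range(2):
    V = mat_vec(M, V)
    history.append(V)

for step, values in enumerate(history):
    print(step, values, "sum =", sum(values))
```

The sum of the PR values drops from 1 to 1/2 and then to 0, because the PR that flows into B has nowhere to go.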

 

3.1 Using Teleport to solve the Dead Ends problem

   A Dead Ends problem occurs in M*V because the M matrix contains a column that is all 0. Teleport defines a vector a (dimension 1xN): if the i-th column of M is all 0, then ai = 1; otherwise ai = 0. Every column flagged by a is then replaced with [1/n, 1/n, ..., 1/n].

 

3.2 Using Teleport to solve the Dead Ends problem example

   Teleport adds 1/n to every entry of each all-zero column of the M matrix. This solves the problem of nodes in the network that have only incoming links and no outgoing links.
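A minimal sketch of this correction, assuming M is stored as a list of rows; `fix_dead_ends` and `mat_vec` are illustrative names, not from the original.

```python
from fractions import Fraction as F

def mat_vec(M, V):
    """Return the matrix-vector product M*V."""
    return [sum(m * v for m, v in zip(row, V)) for row in M]

def fix_dead_ends(M):
    """Replace every all-zero column of M with a column of 1/n."""
    n = len(M)
    # dead[j] plays the role of the indicator vector a: True if column j is all 0.
    dead = [all(M[i][j] == 0 for i in range(n)) for j in range(n)]
    return [[F(1, n) if dead[j] else M[i][j] for j in range(n)] for i in range(n)]

# A -> B, and B has no out-links (column B is all 0).
M = [
    [F(0), F(0)],
    [F(1), F(0)],
]
fixed = fix_dead_ends(M)

V = [F(1, 2), F(1, 2)]
for _ in range(50):
    V = mat_vec(fixed, V)

print(fixed)
print([float(x) for x in V], "sum =", sum(V))
```

After the correction every column sums to 1, so the total PR stays at 1 instead of draining to 0.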

 

 

4. Spider Traps problem

Examples of Spider Traps:
  

When a node in the network links only to itself, the PR value of that self-pointing node gradually approaches 1 during the updates, and the PR values of the other nodes approach 0. This is the Spider Traps problem.
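The effect can be reproduced with a small sketch. The 3-page graph is hypothetical (an assumption, not the graph from the original figure); page C links only to itself.

```python
# Hypothetical graph: A -> A, B;  B -> A, C;  C -> C (C links only to itself).
# Columns are the source pages in the order A, B, C.
M = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.5, 1.0],
]

def mat_vec(M, V):
    """Return the matrix-vector product M*V."""
    return [sum(m * v for m, v in zip(row, V)) for row in M]

V = [1 / 3] * 3  # initial PR values, 1/n each
for _ in range(200):
    V = mat_vec(M, V)

print(V)
```

Any PR that reaches C never leaves it, so C ends up with almost all of the PR while A and B fall towards 0.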

4.1 Solving the Spider Traps problem: Random Teleport

Step 1: Build the column transition probability matrix, that is, the probability that each node (for example B) follows one of its out-links to another node. Call this column transition probability matrix M.

   Step 2:

   To handle a column of the M matrix whose only nonzero entry is a single 1, the surfer follows a link in M with probability β and jumps to a node chosen uniformly at random with probability 1-β.

I think the purpose is to fix the columns of M that contain only a single 1 while affecting the PR value ratio of the other nodes as little as possible.

Applying PR = M*V with the revised M matrix stops the PR values from being biased towards the node with the self-loop.
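A sketch of Random Teleport under stated assumptions: the 3-page graph is hypothetical, β = 0.8 is an assumed value, and `random_teleport` is an illustrative name. The revised matrix is taken to be βM plus (1-β)/n in every entry.

```python
def mat_vec(M, V):
    """Return the matrix-vector product M*V."""
    return [sum(m * v for m, v in zip(row, V)) for row in M]

def random_teleport(M, beta):
    """Follow links with probability beta, jump to a uniformly random page otherwise."""
    n = len(M)
    return [[beta * M[i][j] + (1 - beta) / n for j in range(n)] for i in range(n)]

# Hypothetical graph: A -> A, B;  B -> A, C;  C -> C (C is a spider trap).
M = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.5, 1.0],
]

beta = 0.8  # assumed value
G = random_teleport(M, beta)

V = [1 / 3] * 3
for _ in range(200):
    V = mat_vec(G, V)

print(V)
```

With the revised matrix, C still gets the largest PR value (about 0.64), but A and B keep nonzero values (about 0.21 and 0.15) instead of falling to 0.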

4.2 Summary of Spider Traps problem correction formula

Final correction formula: it solves the Dead Ends and Spider Traps problems together.

Both the Dead Ends fix and the Spider Traps fix work by correcting the M matrix.
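One common way to write the combined correction is shown below. The notation is assumed here: $e$ is an N×1 column vector of ones, $a$ is the 1×N vector that marks the all-zero columns of $M$, and $n$ is the number of pages.

```latex
PR = \left[ \beta \left( M + \frac{1}{n}\, e\, a \right) + (1 - \beta)\, \frac{1}{n}\, e\, e^{T} \right] V
```

The term $\frac{1}{n} e a$ fills the all-zero columns with 1/n (Dead Ends), and the term $(1-\beta)\frac{1}{n} e e^{T}$ is the random jump that releases PR from self-pointing nodes (Spider Traps).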

 

 

5. Advantages and disadvantages of PageRank algorithm

6. Can using β solve the Dead Ends problem?

Therefore, β alone cannot solve the Dead Ends problem.
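One way to check this conclusion is the sketch below, which applies β without and with the Dead Ends correction to the two-page case (A points to B, B points nowhere). The value β = 0.8 and all helper names are assumptions for illustration.

```python
def mat_vec(M, V):
    """Return the matrix-vector product M*V."""
    return [sum(m * v for m, v in zip(row, V)) for row in M]

def random_teleport(M, beta):
    """Scale M by beta and add (1 - beta) / n to every entry."""
    n = len(M)
    return [[beta * M[i][j] + (1 - beta) / n for j in range(n)] for i in range(n)]

def fix_dead_ends(M):
    """Replace every all-zero column of M with a column of 1/n."""
    n = len(M)
    dead = [all(M[i][j] == 0 for i in range(n)) for j in range(n)]
    return [[1 / n if dead[j] else M[i][j] for j in range(n)] for i in range(n)]

def iterate(G, steps):
    """Start from 1/n for every page and apply PR = G*V the given number of times."""
    V = [1 / len(G)] * len(G)
    for _ in range(steps):
        V = mat_vec(G, V)
    return V

# A -> B, and B is a dead end (column B is all 0).
M = [
    [0.0, 0.0],
    [1.0, 0.0],
]
beta = 0.8  # assumed value

beta_only = iterate(random_teleport(M, beta), 20)
both = iterate(random_teleport(fix_dead_ends(M), beta), 20)

print("beta only, sum of PR:", sum(beta_only))
print("dead-end fix + beta, sum of PR:", sum(both))
```

With β alone, the dead-end column sums to only 1-β, so PR still leaks out at every step and the total shrinks towards 0. Once the all-zero column is filled first, every column sums to 1 and the total stays at 1.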


Origin blog.csdn.net/qq_41427834/article/details/110262036