The Beauty of Mathematics - Chapter 10 Personal Notes

Chapter 10 PageRank

Google's "democratic voting" technology for ranking web pages

1 The principle of the PageRank algorithm

Core idea: If a web page is linked to by many other web pages, it is widely recognized and trusted, so it should rank high.

Links from different pages are not treated equally: a link from a higher-ranked page is more reliable and is given a higher weight.

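Calculation of PageRank: the rank of a page is the sum of the weights of the pages that link to it. For example, if a page is linked from X1, X2, X3, and X4 with weights 0.001, 0.01, 0.02, and 0.05:

pagerank = 0.001 + 0.01 + 0.02 + 0.05 = 0.081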
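The sum above can be reproduced in a few lines. This is only a minimal sketch: it assumes the four terms are the weights of the linking pages X1 to X4, and the names `inbound_weights` and `rank_from_inbound` are made up for illustration.

```python
# Weights of the four pages assumed to link to the page being ranked.
inbound_weights = {"X1": 0.001, "X2": 0.01, "X3": 0.02, "X4": 0.05}


def rank_from_inbound(weights):
    """Rank of a page as the sum of the weights of the pages linking to it."""
    return sum(weights.values())


rank = rank_from_inbound(inbound_weights)
print(round(rank, 3))
```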

The weights of X1, X2, X3, and X4 depend on those pages' own rankings, which are themselves unknown. So how are these four rankings determined?

Brin turned the problem into a two-dimensional matrix multiplication problem and solved it iteratively. In practice, the enormous amount of computation is made manageable with sparse matrix techniques.

The genius of the PageRank algorithm is that it treats the entire Internet as a whole, which is in line with systems theory.
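To illustrate why sparse storage helps, here is a rough sketch of a matrix-vector product that touches only the non-zero entries. The list-of-dicts layout, the function name `sparse_matvec`, and the 4-page example values are all assumptions for illustration, not how Google stores its link matrix.

```python
def sparse_matvec(rows, vector):
    """Multiply a sparse matrix by a dense vector.

    rows[m] is a dict {n: value} holding only the non-zero entries of row m,
    so the work is proportional to the number of links, not to N * N.
    """
    return [sum(value * vector[n] for n, value in row.items()) for row in rows]


# Hypothetical 4-page example: only 5 of the 16 matrix entries are non-zero.
rows = [
    {1: 1.0},
    {2: 0.5, 3: 0.5},
    {0: 1.0},
    {0: 1.0},
]
vector = [0.4, 0.3, 0.2, 0.1]
result = sparse_matvec(rows, vector)
```

With billions of pages and only a handful of links per page, skipping the zero entries is what keeps each iteration affordable.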

Today, the most useful information for determining search quality is user click data.

 

2 Further reading: the calculation method of PageRank

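Suppose the vector

$$B = (b_1, b_2, \ldots, b_N)^T$$

holds the page ranks of the first, second, ..., Nth pages, and the matrix

$$A = (a_{mn})_{N \times N}$$

holds the number of links between web pages, where $a_{mn}$ is the number of links between page $m$ and page $n$. A is known; B is unknown and is what we want to compute.

Let $B_i$ be the result of the $i$th iteration. Then

$$B_i = A \cdot B_{i-1}$$

Initial assumption: all pages have the same rank 1/N, i.e.

$$B_0 = \left(\frac{1}{N}, \frac{1}{N}, \ldots, \frac{1}{N}\right)^T$$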

Obviously $B_1, B_2, \ldots$ can be computed simply by repeating $B_i = A \cdot B_{i-1}$. It can be proved that $B_i$ eventually converges, that is, $B_i$ gets arbitrarily close to $B$, at which point $B = A \cdot B$. In general, about 10 iterations are enough to basically converge.
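The iteration can be sketched as follows. This is an illustration under assumptions that the notes do not state: the link counts in A have been normalised so that each column sums to 1 (entry A[m][n] is the share of page n's links that point to page m), and the 3-page graph, the tolerance, and the function names are invented for the example.

```python
def matvec(matrix, vector):
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def power_iteration(matrix, tol=1e-10, max_iter=1000):
    """Repeat B_i = A * B_{i-1}, starting from B_0 = (1/N, ..., 1/N)."""
    n = len(matrix)
    ranks = [1.0 / n] * n
    iteration = 0
    for iteration in range(1, max_iter + 1):
        new_ranks = matvec(matrix, ranks)
        # Total change between two successive iterations.
        change = sum(abs(x - y) for x, y in zip(new_ranks, ranks))
        ranks = new_ranks
        if change < tol:
            break
    return ranks, iteration


# Hypothetical graph: page 0 links to pages 1 and 2, page 1 links to page 2,
# page 2 links to page 0. Columns are normalised to sum to 1 (assumption).
A = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
ranks, steps = power_iteration(A)
```

For this small graph the ranks settle near (0.4, 0.2, 0.4). How many iterations are needed depends on the tolerance chosen; the strict `tol` used here takes more steps than the rough figure quoted above.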

Since the links between web pages are very sparse compared to the size of the Internet, computing page ranks also requires smoothing for zero-probability or small-probability events. The ranking of web pages is a one-dimensional vector, so it can only be smoothed with a small constant α. The iteration then becomes:

$$B_i = \left[\frac{\alpha}{N} \cdot I + (1 - \alpha) A\right] \cdot B_{i-1}$$

where N is the number of web pages on the Internet, α is a small constant, and I is the identity matrix.
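A single smoothed update can be sketched as below. It assumes the update has the form B_i = [α/N · I + (1 − α) A] · B_{i−1}, using the N, α, and I described in these notes. The value α = 0.15, the 3-page matrix, and the function names are assumptions chosen only for illustration.

```python
def matvec(matrix, vector):
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def smoothed_step(matrix, ranks, alpha):
    """One update B_i = [alpha/N * I + (1 - alpha) * A] * B_{i-1}.

    Multiplying by the identity matrix I leaves the vector unchanged, so the
    first term is simply (alpha / N) times the previous ranks.
    Note: this step does not renormalise the resulting vector.
    """
    n = len(ranks)
    linked = matvec(matrix, ranks)
    return [(alpha / n) * b + (1.0 - alpha) * ab for b, ab in zip(ranks, linked)]


# Hypothetical 3-page link matrix with columns normalised to sum to 1.
A = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
b0 = [1.0 / 3] * 3
b1 = smoothed_step(A, b0, alpha=0.15)
```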

 
