PageRank principle and C language implementation

PageRank is a search engine ranking algorithm developed by Google co-founders Larry Page and Sergey Brin and named after Page. The algorithm models the Web as a directed graph in which web pages are nodes and hyperlinks are edges.

The basic idea of PageRank is to give each page a "weight" that depends on both the number and the quality of the pages linking to it. A page that many pages point to is considered more important (more popular) and receives a higher weight, and a link from a high-weight page counts for more than a link from a low-weight one.

To compute PageRank, each page is first assigned an initial value (1/N in the formula and code here, where N is the number of pages). The value of every page is then recomputed iteratively until the values converge, that is, until they change by less than a chosen threshold between two consecutive iterations.
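
The stopping rule can be sketched as a small helper. The name `max_rank_change` is hypothetical, and using the largest per-page change as the convergence measure is an assumption made for illustration; other measures, such as the sum of all changes, are also common.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: largest absolute change of any page's rank
 * between two consecutive iterations. */
double max_rank_change(const double *old_rank, const double *new_rank, size_t n) {
    assert(old_rank != NULL && new_rank != NULL);
    double max_change = 0.0;
    for (size_t i = 0; i < n; i++) {
        double change = fabs(new_rank[i] - old_rank[i]);
        if (change > max_change) {
            max_change = change;
        }
    }
    return max_change;
}
```

Under this assumption, the iteration stops once `max_rank_change(old_rank, new_rank, n)` is no larger than the chosen threshold.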

In each iteration, a node collects PageRank from all of its inbound nodes (the nodes that point to it). Each inbound node divides its own PageRank equally among its outbound edges, so a node with k outbound links passes 1/k of its value along each of them. The resulting PageRank value is used as the relative weight of each node when the search engine ranks results.
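
The equal split among outbound edges can be illustrated with a minimal sketch on a dense adjacency matrix. The helper `inbound_share`, the `MAX_PAGES` bound, and the matrix representation are all assumptions for illustration; the article's own code uses an adjacency list instead. Damping and sink pages are deliberately left out here.

```c
#include <assert.h>

#define MAX_PAGES 8  /* illustrative upper bound, not from the article */

/* Hypothetical helper: sum of the rank shares that page j receives.
 * adj[k][j] != 0 means page k links to page j. Each linking page k
 * splits its rank equally among its outbound links. */
double inbound_share(int adj[MAX_PAGES][MAX_PAGES], const double *rank, int n, int j) {
    assert(n > 0 && n <= MAX_PAGES && j >= 0 && j < n);
    double total = 0.0;
    for (int k = 0; k < n; k++) {
        if (!adj[k][j]) {
            continue;                   /* k does not link to j */
        }
        int out_degree = 0;
        for (int m = 0; m < n; m++) {
            if (adj[k][m]) {
                out_degree++;
            }
        }
        total += rank[k] / out_degree;  /* out_degree >= 1 because k links to j */
    }
    return total;
}
```

For example, if page 0 (rank 0.3) links to pages 1 and 2, and page 1 (rank 0.3) links only to page 2, then page 2 receives 0.3/2 + 0.3/1 = 0.45.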

In short, PageRank determines the relative importance of a page from the quantity and quality of its inbound links, and the search engine ranks pages accordingly.

The formula is as follows:

For $t = 0$:

$$
PR(p_i; t) = \frac{1}{N}
$$

For $t > 0$:

$$
PR(p_i; t) = \frac{1-d}{N} + d \times \left( \left( \sum_{p_j \in M(p_i)} \frac{PR(p_j; t-1)}{D(p_j)} \right) + \left( \sum_{p_j \in S} \frac{PR(p_j; t-1)}{N} \right) \right)
$$

Here $N$ is the number of pages, $d$ is the damping factor, $M(p_i)$ is the set of pages that link to $p_i$, $D(p_j)$ is the number of outbound links of $p_j$, and $S$ is the set of sink pages (pages with no outbound links).
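
As a worked illustration of the update rule, consider a hypothetical three-page graph (not from the original article) in which $p_1$ links to $p_2$ and $p_3$, $p_2$ links to $p_3$, and $p_3$ has no outbound links, so it is the only sink. Assuming $d = 0.85$ and the initial value $1/3$ for every page, the first iteration gives:

$$
\begin{aligned}
PR(p_1; 1) &= \frac{0.15}{3} + 0.85 \times \left( 0 + \frac{1/3}{3} \right) \approx 0.1444 \\
PR(p_2; 1) &= \frac{0.15}{3} + 0.85 \times \left( \frac{1/3}{2} + \frac{1/3}{3} \right) \approx 0.2861 \\
PR(p_3; 1) &= \frac{0.15}{3} + 0.85 \times \left( \frac{1/3}{2} + \frac{1/3}{1} + \frac{1/3}{3} \right) \approx 0.5694
\end{aligned}
$$

In every line the last term inside the parentheses is the sink contribution from $p_3$. Before rounding, the three values sum to exactly 1, because the rank held by the sink is redistributed over all pages rather than lost.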

The C language implementation of the algorithm is as follows:

  • Structure definition:
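typedef int VertexType;	// vertex label type (not shown in the original; assumed here)
typedef double weight;	// edge weight type (not shown in the original; assumed here)

// Edge-list node
typedef struct ArcNode{
	int adjvex;		// index of the vertex this edge points to
	struct ArcNode * next;	// pointer to the next edge
	weight w;		// edge weight
}ArcNode;
// Vertex-list node
typedef struct VNode{
	VertexType data;	// vertex information
	double oldrank;		// PageRank from the previous iteration
	double pagerank;	// PageRank from the current iteration
	ArcNode * first;	// pointer to the first edge leaving this vertex
}VNode;
typedef struct GraphRepr{
	VNode * node;		// adjacency list
	int vexnum, arcnum;	// number of vertices and edges in the graph
}Graph, *graph;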
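
To show how these structures fit together, here is a minimal sketch that allocates a graph and adds directed edges to the adjacency list. The helpers `graph_create`, `graph_add_edge`, and `graph_destroy` are hypothetical names that do not appear in the original article, and the `VertexType` and `weight` typedefs are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

typedef int VertexType;   /* assumed vertex label type */
typedef double weight;    /* assumed edge weight type */

typedef struct ArcNode {
    int adjvex;               /* index of the vertex this edge points to */
    struct ArcNode *next;     /* next edge leaving the same vertex */
    weight w;
} ArcNode;

typedef struct VNode {
    VertexType data;
    double oldrank;
    double pagerank;
    ArcNode *first;           /* first edge leaving this vertex */
} VNode;

typedef struct GraphRepr {
    VNode *node;
    int vexnum, arcnum;
} Graph, *graph;

/* Hypothetical helper: allocate a graph with n vertices and no edges. */
graph graph_create(int n) {
    assert(n > 0);
    graph g = malloc(sizeof(Graph));
    if (g == NULL) return NULL;
    g->node = calloc((size_t)n, sizeof(VNode));
    if (g->node == NULL) { free(g); return NULL; }
    g->vexnum = n;
    g->arcnum = 0;
    for (int i = 0; i < n; i++) {
        g->node[i].data = i;
        g->node[i].first = NULL;   /* no outbound edges yet */
    }
    return g;
}

/* Hypothetical helper: add the directed edge from -> to at the head of
 * the list of `from`. Returns 0 on success, -1 if allocation fails. */
int graph_add_edge(graph g, int from, int to, weight w) {
    assert(g != NULL && from >= 0 && from < g->vexnum && to >= 0 && to < g->vexnum);
    ArcNode *arc = malloc(sizeof(ArcNode));
    if (arc == NULL) return -1;
    arc->adjvex = to;
    arc->w = w;
    arc->next = g->node[from].first;
    g->node[from].first = arc;
    g->arcnum++;
    return 0;
}

/* Hypothetical helper: release all edges, the vertex table, and the graph. */
void graph_destroy(graph g) {
    if (g == NULL) return;
    for (int i = 0; i < g->vexnum; i++) {
        ArcNode *arc = g->node[i].first;
        while (arc != NULL) {
            ArcNode *next = arc->next;
            free(arc);
            arc = next;
        }
    }
    free(g->node);
    free(g);
}
```

In this sketch a vertex whose `first` pointer stays `NULL` has no outbound edges, which is how a sink page shows up in the adjacency list.
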
  • Algorithm implementation:
#include <math.h>	// fabs
#include <stddef.h>	// NULL

void graph_pagerank(graph g, double damping, double delta) {
	// graph_vertices_count() is defined elsewhere; it is assumed to return g->vexnum
	int N = graph_vertices_count(g);
	if (N <= 0) return;

	// t = 0: every vertex starts with rank 1/N
	for (int i = 0; i < N; i++) {
		g->node[i].oldrank = 0;
		g->node[i].pagerank = 1.0 / N;
	}

	// Largest per-vertex change |pagerank - oldrank|; it is 1/N right after
	// initialization. Iterate while any vertex still changes by more than delta.
	double max_delta = 1.0 / N;
	while (max_delta > delta) {
		// Save the ranks of the previous iteration
		for (int j = 0; j < N; j++) {
			g->node[j].oldrank = g->node[j].pagerank;
		}

		// Rank held by sinks (vertices without outbound edges) is spread
		// evenly over all vertices
		double sink_rank = 0;
		for (int j = 0; j < N; j++) {
			if (g->node[j].first == NULL) {
				sink_rank += damping * (g->node[j].oldrank / N);
			}
		}

		for (int j = 0; j < N; j++) {
			g->node[j].pagerank = sink_rank + (1 - damping) / N;
			// Add the share of every vertex k that has an edge k -> j
			for (int k = 0; k < N; k++) {
				int num_outbound_edge = 0;
				int links_to_j = 0;
				for (ArcNode * e = g->node[k].first; e != NULL; e = e->next) {
					num_outbound_edge++;
					if (e->adjvex == j) links_to_j = 1;
				}
				if (links_to_j) {
					g->node[j].pagerank += damping * g->node[k].oldrank / num_outbound_edge;
				}
			}
		}

		// Convergence test: largest change of any vertex in this iteration
		max_delta = 0;
		for (int i = 0; i < N; i++) {
			double change = fabs(g->node[i].pagerank - g->node[i].oldrank);
			if (change > max_delta) max_delta = change;
		}
	}
}
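
Once the ranks have been computed, the pages can be ordered by rank for presentation. The following is a minimal sketch of that last step, not part of the original article: the `RankedPage` type and the functions `compare_rank_desc` and `sort_by_rank` are hypothetical, and the caller is assumed to copy each `g->node[i].pagerank` into the array first.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record pairing a vertex index with its computed rank. */
typedef struct RankedPage {
    int index;
    double pagerank;
} RankedPage;

/* qsort comparator: higher rank first, ties broken by lower index. */
static int compare_rank_desc(const void *a, const void *b) {
    const RankedPage *pa = a;
    const RankedPage *pb = b;
    if (pa->pagerank < pb->pagerank) return 1;
    if (pa->pagerank > pb->pagerank) return -1;
    return pa->index - pb->index;
}

/* Sort pages in place from most to least important. */
void sort_by_rank(RankedPage *pages, size_t n) {
    assert(pages != NULL || n == 0);
    qsort(pages, n, sizeof pages[0], compare_rank_desc);
}
```

Breaking ties by index keeps the output deterministic, since `qsort` itself does not guarantee a stable order.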

Origin blog.csdn.net/z135733/article/details/130499905