Spark Review 11: Built-in graph algorithms, PageRank analysis and a simple case

1. PageRank algorithm description:

1.1 Initialize each vertex with a PageRank (PR) value of 1 / N, where N is the total number of vertices in the graph.

1.2 Loop:
              Each vertex sends its current PR value divided by M along each outgoing edge, where M is the out-degree of that vertex.

              Each vertex sums the PR values it receives from its in-neighbors, and that total becomes its new PR value.

              The loop exits once no vertex's PR value changes significantly compared with the previous iteration.
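The loop above can be sketched in plain Scala, without Spark. This is only a minimal illustration of the simplified description: the edge list, the tolerance `tol`, the `maxIter` guard and the object name `SimplePageRank` are all assumptions made for the example, and the code assumes every vertex has at least one outgoing edge.

```scala
object SimplePageRank {
  // Hypothetical graph for illustration: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 1
  val edges: Seq[(Int, Int)] = Seq((1, 2), (1, 3), (2, 3), (3, 1))

  def run(edges: Seq[(Int, Int)], tol: Double = 1e-6, maxIter: Int = 100): Map[Int, Double] = {
    val vertices = edges.flatMap { case (src, dst) => Seq(src, dst) }.distinct
    // M: out-degree of each vertex (assumes every vertex has an outgoing edge)
    val outDegree = edges.groupBy(_._1).map { case (v, es) => v -> es.size }

    // Step 1.1: every vertex starts at 1 / N
    var pr: Map[Int, Double] = vertices.map(v => v -> 1.0 / vertices.size).toMap
    var delta = Double.MaxValue
    var iter = 0

    // Step 1.2: loop until the largest change falls below tol (or maxIter is hit)
    while (delta > tol && iter < maxIter) {
      // Each vertex sends PR / M along each outgoing edge
      val sent = edges.map { case (src, dst) => dst -> pr(src) / outDegree(src) }
      // Each vertex sums what it received; that sum is its new PR value
      val received = sent.groupBy(_._1).map { case (v, cs) => v -> cs.map(_._2).sum }
      val next = vertices.map(v => v -> received.getOrElse(v, 0.0)).toMap
      delta = vertices.map(v => math.abs(next(v) - pr(v))).max
      pr = next
      iter += 1
    }
    pr
  }

  def main(args: Array[String]): Unit =
    run(edges).toSeq.sortBy(_._1).foreach(println)
}
```

On this small graph the values settle near 0.4 for vertices 1 and 3 and 0.2 for vertex 2. The `maxIter` guard is there because the simplified scheme, which has no damping coefficient, is not guaranteed to converge on every graph.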

2. PageRank algorithm case:

package sparkGraphX

import org.apache.spark.graphx.{Graph, GraphLoader, VertexRDD}
import org.apache.spark.{SparkConf, SparkContext}

object pageRankTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SimpleGraphX").setMaster("local")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")
    val graph: Graph[Int, Int] = GraphLoader.edgeListFile(sc, "D:/web.txt")
//  val web: VertexRDD[Double] = graph.pageRank(0.001).vertices   // dynamic call: the argument is the convergence tolerance
    val staticPage: VertexRDD[Double] = graph.staticPageRank(5).vertices  // static call: the argument is the number of iterations; the second parameter, resetProb: Double = 0.15, has a default value and can be omitted
    staticPage.collect.foreach(println(_)) // print the computed results

  }
}
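To try the same calls without a file on disk, the graph can also be built from in-memory edge tuples. The sketch below is an assumption-laden variant of the case above: the four edges are hypothetical (they are not the contents of web.txt), and the names `PageRankInMemory` and `ranked` are made up for the example. It uses the dynamic `pageRank(tol)` call and sorts the vertices by rank. For reference, `GraphLoader.edgeListFile` expects one edge per line as two whitespace-separated vertex ids, with lines starting with `#` treated as comments.

```scala
import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.{SparkConf, SparkContext}

object PageRankInMemory {
  // Returns (vertexId, rank) pairs sorted by rank, highest first
  def ranked(sc: SparkContext): Array[(VertexId, Double)] = {
    // Hypothetical edges (srcId, dstId), used only for illustration
    val rawEdges = sc.parallelize(Seq((1L, 2L), (1L, 3L), (2L, 3L), (3L, 1L)))
    // Every vertex and edge attribute is set to 1, like edgeListFile does
    val graph: Graph[Int, Int] = Graph.fromEdgeTuples(rawEdges, defaultValue = 1)
    // Dynamic call: iterate until the change per vertex drops below 0.001
    graph.pageRank(0.001).vertices
      .sortBy(_._2, ascending = false)
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PageRankInMemory").setMaster("local")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")
    ranked(sc).foreach(println)
    sc.stop()
  }
}
```

Sorting before `collect()` keeps the highest-ranked vertices at the top of the printout, which makes the result easier to compare against the in-degrees of the graph.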

Printed results and graphic representation of the vertices: (images not preserved)

Result analysis:

Vertex 2 has the highest in-degree, so its ranking value is the highest. Vertex 2 has only one outgoing edge, to vertex 8, so vertex 8 also ranks relatively high. Vertex 8 has an out-degree of 1, and vertex 1 has two outgoing edges, which splits its ranking value between them. And so on. The ranking value coefficients also influence the result. This is just a preliminary application.
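The coefficient mentioned here is presumably the reset probability (`resetProb`, 0.15 by default in the code above). As a sketch based on the update rule documented in the GraphX PageRank source comments, one iteration can be written as follows, where \alpha stands for `resetProb`, in(i) is the set of vertices with an edge into i, and outDeg(j) is the out-degree of j. These symbols are notation chosen for this example.

```latex
PR_i^{(t+1)} = \alpha + (1 - \alpha) \sum_{j \in \mathrm{in}(i)} \frac{PR_j^{(t)}}{\mathrm{outDeg}(j)}
```

Compared with the simplified loop in section 1, every vertex keeps a baseline of \alpha regardless of what it receives, so the printed values need not sum to 1 and may differ from a hand calculation that ignores the coefficient.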


Origin blog.csdn.net/csdnliu123/article/details/105636761