Connected Components Algorithm of GraphX

The org.apache.spark.graphx.lib package of Spark GraphX contains several commonly used graph algorithms. One of them is Connected Components, and this article shows how to use it. The Spark 1.6.3 source code documents the algorithm as follows:

Compute the connected component membership of each vertex and return a graph with the vertex value containing the lowest vertex id in the connected component containing that vertex.
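
To make that description concrete, here is a minimal plain-Scala sketch of the idea: every vertex starts with its own id as a label, and the smaller label is copied across each edge until nothing changes. It is only an illustration and not GraphX's distributed implementation. The object name MinLabelPropagation and the function components are made up for this example, and the edge list is the one from links.csv in the demo with the relationship labels dropped.

```scala
object MinLabelPropagation {
  // Edges from links.csv as (source, destination) pairs; relationship labels are dropped
  val edges: Seq[(Long, Long)] = Seq(
    (1L, 2L), (1L, 3L), (2L, 4L), (3L, 2L), (4L, 5L), (1L, 9L),
    (6L, 7L), (7L, 9L), (8L, 9L), (10L, 11L), (10L, 12L), (11L, 12L)
  )

  // Illustrative helper (not a GraphX API): returns vertexId -> lowest id in its component.
  // Edges are treated as undirected, so direction does not matter here.
  def components(edges: Seq[(Long, Long)]): Map[Long, Long] = {
    val vertices = edges.flatMap { case (s, d) => Seq(s, d) }.distinct
    // Every vertex starts out labelled with its own id
    var labels: Map[Long, Long] = vertices.map(v => v -> v).toMap
    var changed = true
    // Keep sweeping over the edges until a full pass changes no label
    while (changed) {
      changed = false
      for ((s, d) <- edges) {
        val low = math.min(labels(s), labels(d))
        if (labels(s) != low || labels(d) != low) {
          labels = labels.updated(s, low).updated(d, low)
          changed = true
        }
      }
    }
    labels
  }

  def main(args: Array[String]): Unit = {
    // Print each vertex with its component label, ordered by vertex id
    components(edges).toSeq.sortBy(_._1).foreach { case (v, c) => println(s"$v -> $c") }
  }
}
```

On the demo data, vertices 1 through 9 end up with label 1 and vertices 10 through 12 with label 10, which matches the component ids in the GraphX output further down.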

Demo

First, prepare the data sources:

links.csv

1,2,friend
1,3,sister
2,4,brother
3,2,boss
4,5,client
1,9,friend
6,7,cousin
7,9,coworker
8,9,father
10,11,colleague
10,12,colleague
11,12,colleague

people.csv

4,Dave,25
6,Faith,21
8,Harvey,47
2,Bob,18
1,Alice,20
3,Charlie,30
7,George,34
9,Ivy,21
5,Eve,30
10,Lily,35
11,Helen,35
12,Ann,35

Graph structure

[Figure: the graph described by links.csv and people.csv]

Example
The result of connectedComponents tells us which vertices belong to the same connected component, so a large graph can be split into several connected subgraphs.

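	import org.apache.spark.graphx._

	// Vertex attribute type used below
	case class Person(name: String, age: Int)

	// Assumption: run in spark-shell (sc in scope) with both CSV files in the working directory
	val people = sc.textFile("people.csv")
	val links = sc.textFile("links.csv")

	// Parse "id,name,age" into (vertexId, Person)
	val peopleRDD = people.map(_.split(",")).map(x => (x(0).toLong, Person(x(1), x(2).toInt)))
	// Parse "src,dst,relationship" into Edge(src, dst, relationship)
	val linkRDD = links.map { x =>
		val row = x.split(",")
		Edge(row(0).toLong, row(1).toLong, row(2))
	}
	val graph = Graph(peopleRDD, linkRDD)

	// Label every vertex with the lowest vertex id in its connected component
	val cc = graph.connectedComponents()
	// Attach each person's name and age to the component label
	val newGraph = cc.outerJoinVertices(peopleRDD)((id, mincc, person) => (mincc, person.get.name, person.get.age))

	// For each distinct component id, print the triplets of the matching subgraph
	cc.vertices.map(_._2).distinct.collect.foreach { id =>
		val sub = newGraph.subgraph(vpred = (id1, prop) => prop._1 == id)
		sub.triplets.collect.foreach(println)
		println()
	}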

The graph splits into two connected subgraphs. The output is:

((1,(1,Alice,20)),(2,(1,Bob,18)),friend)
((1,(1,Alice,20)),(3,(1,Charlie,30)),sister)
((1,(1,Alice,20)),(9,(1,Ivy,21)),friend)
((2,(1,Bob,18)),(4,(1,Dave,25)),brother)
((3,(1,Charlie,30)),(2,(1,Bob,18)),boss)
((4,(1,Dave,25)),(5,(1,Eve,30)),client)
((6,(1,Faith,21)),(7,(1,George,34)),cousin)
((7,(1,George,34)),(9,(1,Ivy,21)),coworker)
((8,(1,Harvey,47)),(9,(1,Ivy,21)),father)

((10,(10,Lily,35)),(11,(10,Helen,35)),colleague)
((10,(10,Lily,35)),(12,(10,Ann,35)),colleague)
((11,(10,Helen,35)),(12,(10,Ann,35)),colleague)
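
If only the membership of each component is needed rather than the full triplets, the (vertex id, component id) pairs can simply be grouped by component id. The sketch below does this in plain Scala on a hard-coded list that mirrors the labels visible in the output above. The names ComponentMembers, labelled and membersByComponent are hypothetical, and in a real job the pairs would presumably come from cc.vertices instead.

```scala
object ComponentMembers {
  // Stand-in for the (vertexId, componentId) pairs, copied from the output above
  val labelled: Seq[(Long, Long)] = Seq(
    (1L, 1L), (2L, 1L), (3L, 1L), (4L, 1L), (5L, 1L), (6L, 1L),
    (7L, 1L), (8L, 1L), (9L, 1L), (10L, 10L), (11L, 10L), (12L, 10L)
  )

  // Illustrative helper: componentId -> sorted list of the vertex ids it contains
  def membersByComponent(pairs: Seq[(Long, Long)]): Map[Long, Seq[Long]] =
    pairs.groupBy(_._2).map { case (comp, vs) => comp -> vs.map(_._1).sorted }

  def main(args: Array[String]): Unit = {
    // Print one line per component, ordered by component id
    membersByComponent(labelled).toSeq.sortBy(_._1).foreach { case (comp, members) =>
      println(s"component $comp: ${members.mkString(", ")}")
    }
  }
}
```

Grouping on the driver like this is only reasonable for small graphs. For a large graph the same grouping would have to stay distributed.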

Origin blog.csdn.net/qq_42578036/article/details/110200058