Spark GraphX ships several commonly used graph algorithms in the org.apache.spark.graphx.lib package. One of them is Connected Components, and this article shows how to use it. The Spark 1.6.3 source code documents the algorithm with the following comment:
Compute the connected component membership of each vertex and return a graph with the vertex value containing the lowest vertex id in the connected component containing that vertex.
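To make the "lowest vertex id" labelling concrete, here is a minimal sequential sketch in plain Scala, run on the edge list used in the demo below. It is not the GraphX implementation, which runs distributed; the helper name minLabelComponents is hypothetical, and edge direction is ignored because only connectivity matters.

```scala
// Edges from links.csv below, without the relationship column
val edges: Seq[(Long, Long)] = Seq(
  (1L, 2L), (1L, 3L), (2L, 4L), (3L, 2L), (4L, 5L), (1L, 9L),
  (6L, 7L), (7L, 9L), (8L, 9L), (10L, 11L), (10L, 12L), (11L, 12L))

// Hypothetical helper (not a GraphX API): repeatedly give both endpoints
// of every edge the smaller of their two labels until nothing changes.
def minLabelComponents(edges: Seq[(Long, Long)]): Map[Long, Long] = {
  // Every vertex starts out labelled with its own id
  var labels: Map[Long, Long] =
    edges.flatMap { case (a, b) => Seq(a, b) }.distinct.map(v => v -> v).toMap
  var changed = true
  while (changed) {
    changed = false
    for ((a, b) <- edges) {
      val m = math.min(labels(a), labels(b))
      if (labels(a) != m || labels(b) != m) {
        labels = labels.updated(a, m).updated(b, m)
        changed = true
      }
    }
  }
  labels
}

val labels = minLabelComponents(edges)
// Vertices 1 to 9 end up with label 1, vertices 10 to 12 with label 10
labels.toSeq.sortBy(_._1).foreach { case (v, c) => println(s"vertex $v -> component $c") }
```

Labels only ever decrease and are bounded below by the smallest id in each component, so the loop terminates.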
Demo
First prepare the data source
links.csv
1,2,friend
1,3,sister
2,4,brother
3,2,boss
4,5,client
1,9,friend
6,7,cousin
7,9,coworker
8,9,father
10,11,colleague
10,12,colleague
11,12,colleague
people.csv
4,Dave,25
6,Faith,21
8,Harvey,47
2,Bob,18
1,Alice,20
3,Charlie,30
7,George,34
9,Ivy,21
5,Eve,30
10,Lily,35
11,Helen,35
12,Ann,35
Graph structure
Example
The result of connectedComponents tells us which connected component each vertex belongs to, so a large graph can be split into its connected subgraphs.
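import org.apache.spark.graphx._
// Vertex attribute type; name and age come from people.csv
case class Person(name: String, age: Int)
// Load both files as RDD[String]; assumes spark-shell (sc available) and files in the working directory
val people = sc.textFile("people.csv")
val links = sc.textFile("links.csv")
val peopleRDD = people.map(_.split(",")).map(x => (x(0).toLong, Person(x(1), x(2).toInt)))
val linkRDD = links.map { x =>
  val row = x.split(","); Edge(row(0).toLong, row(1).toLong, row(2)) }
val graph = Graph(peopleRDD, linkRDD)
// Each vertex value in cc is the lowest vertex id of its connected component
val cc = graph.connectedComponents()
// Attach name and age to the component id; every vertex here has a people.csv row, so .get is safe
val newGraph = cc.outerJoinVertices(peopleRDD)((id, mincc, p) => (mincc, p.get.name, p.get.age))
// For each distinct component id, print the triplets of the matching subgraph
cc.vertices.map(_._2).distinct.collect.foreach { id =>
  val sub = newGraph.subgraph(vpred = (id1, prop) => prop._1 == id)
  sub.triplets.collect.foreach(println)
  println() }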
The result is two subgraphs, one with component id 1 and one with component id 10. The output is:
((1,(1,Alice,20)),(2,(1,Bob,18)),friend)
((1,(1,Alice,20)),(3,(1,Charlie,30)),sister)
((1,(1,Alice,20)),(9,(1,Ivy,21)),friend)
((2,(1,Bob,18)),(4,(1,Dave,25)),brother)
((3,(1,Charlie,30)),(2,(1,Bob,18)),boss)
((4,(1,Dave,25)),(5,(1,Eve,30)),client)
((6,(1,Faith,21)),(7,(1,George,34)),cousin)
((7,(1,George,34)),(9,(1,Ivy,21)),coworker)
((8,(1,Harvey,47)),(9,(1,Ivy,21)),father)
((10,(10,Lily,35)),(11,(10,Helen,35)),colleague)
((10,(10,Lily,35)),(12,(10,Ann,35)),colleague)
((11,(10,Helen,35)),(12,(10,Ann,35)),colleague)
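When only the membership and size of each component matter, the vertex labels can be aggregated directly instead of building subgraphs. The following is a sketch under a few assumptions: it runs Spark in local mode, builds the graph from an in-memory copy of links.csv with Graph.fromEdges, and uses illustrative names (ctx, sizes, members) that are not part of the original example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Local-mode context for this sketch; getOrCreate reuses a context
// that already exists (for example inside spark-shell)
val conf = new SparkConf().setAppName("cc-sizes").setMaster("local[*]")
val ctx = SparkContext.getOrCreate(conf)

// In-memory copy of links.csv
val edges = ctx.parallelize(Seq(
  Edge(1L, 2L, "friend"), Edge(1L, 3L, "sister"), Edge(2L, 4L, "brother"),
  Edge(3L, 2L, "boss"), Edge(4L, 5L, "client"), Edge(1L, 9L, "friend"),
  Edge(6L, 7L, "cousin"), Edge(7L, 9L, "coworker"), Edge(8L, 9L, "father"),
  Edge(10L, 11L, "colleague"), Edge(10L, 12L, "colleague"),
  Edge(11L, 12L, "colleague")))

// Vertex attributes are irrelevant here, so every vertex gets the default 0
val graph = Graph.fromEdges(edges, defaultValue = 0)
val cc = graph.connectedComponents()

// Number of vertices per component id
val sizes = cc.vertices
  .map { case (_, comp) => (comp, 1L) }
  .reduceByKey(_ + _)
  .collect().toMap

// Sorted member ids per component id
val members = cc.vertices
  .map { case (vid, comp) => (comp, vid) }
  .groupByKey()
  .mapValues(_.toSeq.sorted)
  .collect().toMap

// prints:
// component 1: 9 vertices, members 1,2,3,4,5,6,7,8,9
// component 10: 3 vertices, members 10,11,12
sizes.toSeq.sortBy(_._1).foreach { case (comp, n) =>
  println(s"component $comp: $n vertices, members ${members(comp).mkString(",")}")
}
```

Collecting to the driver is fine for a handful of components; on a large graph the aggregated RDDs would normally be kept distributed or written out instead.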