GraphX之Connected Components算法

在Spark Graphx的org.apache.spark.graphx.lib包中有一些常用的图算法,其中一个就是Connected Components,本文将会介绍此算法的使用方法,下面是spark 1.6.3源码中对这个算法的注释:

Compute the connected component membership of each vertex and return a graph with the vertex value containing the lowest vertex id in the connected component containing that vertex.

Demo

首先准备数据源

links.csv

1,2,friend
1,3,sister
2,4,brother
3,2,boss
4,5,client
1,9,friend
6,7,cousin
7,9,coworker
8,9,father
10,11,colleague
10,12,colleague
11,12,colleague

people.csv

4,Dave,25
6,Faith,21
8,Harvey,47
2,Bob,18
1,Alice,20
3,Charlie,30
7,George,34
9,Ivy,21
5,Eve,30
10,Lily,35
11,Helen,35
12,Ann,35

图结构

在这里插入图片描述
样例
经过connectedComponents得到的结果,可以知道哪些顶点在一个连通图中,这样就可以将一个大图拆分成若干个连通子图。

	import org.apache.spark.graphx._

	val peopleRDD=people.map(x=>x.split(",")).map(x=>(x(0).toLong,Person(x(1),x(2).toInt)))
	val linkRDD=links.map(x=>{
    
    val row=x.split(",");Edge(row(0).toLong,row(1).toLong,row(2))})
	val graph=Graph(peopleRDD,linkRDD)
    
	val cc=graph.connectedComponents
	val newGraph = cc.outerJoinVertices(peopleRDD)((id,mincc,people)=>(mincc,people.get.name,people.get.age))

cc.vertices.map(_._2).collect.distinct.foreach(id=>{
    
    
val sub=newGraph.subgraph(vpred=(id1,prop)=>prop._1==id)
sub.triplets.collect.foreach(println)
println()})

结果得到了两个子图,输出为:

((1,(1,Alice,20)),(2,(1,Bob,18)),friend)
((1,(1,Alice,20)),(3,(1,Charlie,30)),sister)
((1,(1,Alice,20)),(9,(1,Ivy,21)),friend)
((2,(1,Bob,18)),(4,(1,Dave,25)),brother)
((3,(1,Charlie,30)),(2,(1,Bob,18)),boss)
((4,(1,Dave,25)),(5,(1,Eve,30)),client)
((6,(1,Faith,21)),(7,(1,George,34)),cousin)
((7,(1,George,34)),(9,(1,Ivy,21)),coworker)
((8,(1,Harvey,47)),(9,(1,Ivy,21)),father)

((10,(10,Lily,35)),(11,(10,Helen,35)),colleague)
((10,(10,Lily,35)),(12,(10,Ann,35)),colleague)
((11,(10,Helen,35)),(12,(10,Ann,35)),colleague)

猜你喜欢

转载自blog.csdn.net/qq_42578036/article/details/110200058