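join performs an inner join.
The other three functions, leftOuterJoin, rightOuterJoin, and fullOuterJoin, work like SQL's left, right, and full outer joins.
All of them operate on RDDs of key-value pairs.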
// sc is an existing SparkContext (for example, the one available in spark-shell)
val pairRDD1 = sc.parallelize(List(("cat", 2), ("cat", 5), ("book", 4), ("cat", 12)))
val pairRDD2 = sc.parallelize(List(("cat", 2), ("cup", 5), ("mouse", 4), ("cat", 12)))
// The side that may have no match is wrapped in Option
pairRDD1.leftOuterJoin(pairRDD2).collect
pairRDD1.rightOuterJoin(pairRDD2).collect
pairRDD1.fullOuterJoin(pairRDD2).collect
pairRDD1:
("cat",2),
("cat", 5),
("book", 4),
("cat", 12)
pairRDD2:
("cat",2),
("cup", 5),
("mouse", 4),
("cat", 12)
leftOuterJoin result:
(cat,(2,Some(2))),
(cat,(2,Some(12))),
(cat,(5,Some(2))),
(cat,(5,Some(12))),
(cat,(12,Some(2))),
(cat,(12,Some(12))),
(book,(4,None))
rightOuterJoin result:
(cup,(None,5)),
(cat,(Some(2),2)),
(cat,(Some(2),12)),
(cat,(Some(5),2)),
(cat,(Some(5),12)),
(cat,(Some(12),2)),
(cat,(Some(12),12)),
(mouse,(None,4))
fullOuterJoin result:
(cup,(None,Some(5))),
(cat,(Some(2),Some(2))),
(cat,(Some(2),Some(12))),
(cat,(Some(5),Some(2))),
(cat,(Some(5),Some(12))),
(cat,(Some(12),Some(2))),
(cat,(Some(12),Some(12))),
(book,(Some(4),None)),
(mouse,(None,Some(4)))
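
To see where the Some and None values in the results above come from, the four joins can be modelled on plain Scala collections. This is only a sketch of the semantics, not how Spark implements them: the object JoinSketch and its helper methods are hypothetical names written for this illustration, and the row order differs from what collect returns (Spark does not guarantee any particular order either).

```scala
// Plain-Scala model of pair-RDD join semantics; no Spark required.
// JoinSketch and its helpers are hypothetical, not part of the Spark API.
object JoinSketch {
  type Pairs[V] = Seq[(String, V)]

  // Same data as pairRDD1 and pairRDD2 in the post.
  val pairs1: Pairs[Int] = Seq(("cat", 2), ("cat", 5), ("book", 4), ("cat", 12))
  val pairs2: Pairs[Int] = Seq(("cat", 2), ("cup", 5), ("mouse", 4), ("cat", 12))

  // Inner join: only keys present on both sides, one row per matching value pair.
  def innerJoin[V, W](left: Pairs[V], right: Pairs[W]): Seq[(String, (V, W))] =
    for {
      (k, v) <- left
      (k2, w) <- right
      if k == k2
    } yield (k, (v, w))

  // Left outer join: every left row is kept; a missing right side becomes None.
  def leftOuterJoin[V, W](left: Pairs[V], right: Pairs[W]): Seq[(String, (V, Option[W]))] =
    left.flatMap { case (k, v) =>
      val matches = right.filter(_._1 == k).map(_._2)
      if (matches.isEmpty) Seq((k, (v, None)))
      else matches.map(w => (k, (v, Some(w))))
    }

  // Right outer join: a left outer join with the sides swapped, then swapped back.
  def rightOuterJoin[V, W](left: Pairs[V], right: Pairs[W]): Seq[(String, (Option[V], W))] =
    leftOuterJoin(right, left).map { case (k, (w, v)) => (k, (v, w)) }

  // Full outer join: all left outer rows plus the right rows whose key has no left match.
  def fullOuterJoin[V, W](left: Pairs[V], right: Pairs[W]): Seq[(String, (Option[V], Option[W]))] = {
    val leftKeys = left.map(_._1).toSet
    val fromLeft: Seq[(String, (Option[V], Option[W]))] =
      leftOuterJoin(left, right).map { case (k, (v, w)) => (k, (Some(v), w)) }
    val rightOnly: Seq[(String, (Option[V], Option[W]))] =
      right.collect { case (k, w) if !leftKeys(k) => (k, (None, Some(w))) }
    fromLeft ++ rightOnly
  }

  def main(args: Array[String]): Unit = {
    println(innerJoin(pairs1, pairs2).size)      // 6: only "cat" matches, 3 x 2 pairs
    println(leftOuterJoin(pairs1, pairs2).size)  // 7: the 6 cat rows plus (book,(4,None))
    println(rightOuterJoin(pairs1, pairs2).size) // 8: the 6 cat rows plus cup and mouse
    println(fullOuterJoin(pairs1, pairs2).size)  // 9: the 6 cat rows plus book, cup, and mouse
  }
}
```

The row counts match the listings above: a key that appears m times on one side and n times on the other produces m x n rows, and the Option wrapper appears only on the side that is allowed to be missing.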