How to join two RDDs with different lengths in Spark?

Hamid Roghani :

I have 2 RDDs. The first is the original RDD, and the second is an RDD that I filtered out of the original and then processed further. After processing, I want to join them back together. The original RDD looks like this:

(1,5)
(2,60)
(3,7)
(4,1)
(5,1)
...
(10,8)

and the filtered and manipulated RDD is:

(4,3)
(5,10)
(6,6)
(7,9)

How should I join them? When I use fullOuterJoin or other join methods, I get an error.

Edited

I wrote the code as you suggested:

    original_RDD = original_RDD.fullOuterJoin(new_RDD).foreach { case (joinKey, (oldOption, newOption)) =>
      newOption match {
        case None => (joinKey, oldOption)
        case Some(newOption) => (joinKey, newOption)
      }
    }

but I get this error:

Error:(232, 55) type mismatch;
 found   : Unit
 required: org.apache.spark.rdd.RDD[(Long, Int)]
        nodes=nodes.fullOuterJoin(joined_new).foreach { case (joinKey, (oldOption, newOption)) =>
User9123 :

See the documented join semantics:

When called on datasets of type (K, V) and (K, W), returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key. Outer joins are supported through leftOuterJoin, rightOuterJoin, and fullOuterJoin.

originalRdd
  .fullOuterJoin(joinRdd)
  .foreach { case (joinKey, (oldOption, newOption)) =>
    newOption match {
      case None => println("new value is None")
      case Some(joinValue) => println(s"new value = $joinValue")
    }
  }
