Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/Lockey23/article/details/81876483
When following the Spark example:
case class model_instance (features: Vector)
//and
val df = rawData.map(line =>
  // keep only the fields that look like numbers, then pack them into a dense vector
  model_instance(Vectors.dense(line.split(",").filter(p => p.matches("\\d*(\\.?)\\d*"))
    .map(_.toDouble)))).toDF()
Errors like the following may occur:
Column features must be of type org.apache.spark.ml.linalg.VectorUDT
Error with RDD[Vector] in function parameter
type Vector takes type parameters
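These errors occur because ml and mllib are two separate Spark machine learning APIs, each with its own Vector type (org.apache.spark.ml.linalg.Vector and org.apache.spark.mllib.linalg.Vector), and the two must not be mixed. The last error, "type Vector takes type parameters", typically means that neither was imported, so the name Vector resolved to Scala's built-in generic collection Vector[A]. Import from exactly one of the packages: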
import org.apache.spark.ml.linalg.{Vector, Vectors}
//or
import org.apache.spark.mllib.linalg.{Vector, Vectors}
// do not mix the two imports above; use only one of them
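
To illustrate the point about keeping the imports consistent, here is a minimal sketch that uses only the ml package for the DataFrame. It assumes Spark 2.0 or later running in local mode; the object name VectorImportExample, the case class name ModelInstance, and the sample input lines are hypothetical stand-ins, and the number filter from the original snippet is left out because the sample data is already clean. The last few lines show the conversion helpers asML and Vectors.fromML, which may help when an mllib vector has to be passed to ml code or the other way round.

```scala
import org.apache.spark.sql.SparkSession
// Use the ml vector type everywhere a DataFrame column is involved.
import org.apache.spark.ml.linalg.{Vector, Vectors}
// The mllib factory is renamed on import so the two Vectors objects cannot clash.
import org.apache.spark.mllib.linalg.{Vectors => OldVectors}

// Declared at top level so Spark can derive an encoder for it.
case class ModelInstance(features: Vector)

object VectorImportExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("vector-import-example")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample input standing in for rawData.
    val rawData = spark.sparkContext.parallelize(Seq("1.0,2.0,3.0", "4.0,5.0,6.0"))

    // Each line becomes one ml dense vector wrapped in the case class.
    val df = rawData.map { line =>
      ModelInstance(Vectors.dense(line.split(",").map(_.trim.toDouble)))
    }.toDF()

    // The features column is backed by the ml VectorUDT.
    df.printSchema()

    // Converting explicitly when an mllib vector must cross over to ml, and back.
    val oldVec = OldVectors.dense(1.0, 2.0)
    val newVec: Vector = oldVec.asML
    val backAgain = OldVectors.fromML(newVec)
    println(s"$oldVec -> $newVec -> $backAgain")

    spark.stop()
  }
}
```

If an existing DataFrame already holds mllib vector columns, org.apache.spark.mllib.util.MLUtils.convertVectorColumnsToML can convert them in one call instead of mapping row by row.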