Spark MLlib Data Types Summary

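Local vectors have the base type Vector, with two implementations: dense vectors (DenseVector) and sparse vectors (SparseVector). Both are created through the Vectors factory object. In Scala:

import org.apache.spark.mllib.linalg.{Vector, Vectors}

val dv: Vector = Vectors.dense(5.0, 6.0, 7.0)  // dense vector (5.0, 6.0, 7.0)

val sv: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))  // size 3, non-zero entries at indices 0 and 2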
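To make the three arguments of the sparse form (size, indices, values) concrete, here is a minimal plain-Scala sketch of how such a triple expands into a dense array. SparseVec is a hypothetical name introduced only for illustration; it is not an MLlib class and does not claim to mirror how MLlib implements SparseVector internally.

```scala
// Hypothetical plain-Scala model of a sparse vector; not an MLlib class.
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) {
  require(indices.length == values.length, "indices and values must have the same length")
  require(indices.forall(i => i >= 0 && i < size), "every index must lie in [0, size)")

  // Expand to a dense array: positions not listed in `indices` stay zero.
  def toDense: Array[Double] = {
    val out = Array.fill(size)(0.0)
    indices.zip(values).foreach { case (i, v) => out(i) = v }
    out
  }
}

object SparseVecDemo {
  def main(args: Array[String]): Unit = {
    // Same arguments as the sparse example in the text.
    val sv = SparseVec(3, Array(0, 2), Array(1.0, 3.0))
    println(sv.toDense.mkString("[", ", ", "]")) // prints [1.0, 0.0, 3.0]
  }
}
```

The sparse form pays off when most entries are zero, since only the non-zero positions and their values are stored.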

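A labeled point (LabeledPoint) consists of a label and a local vector. The label is stored as a Double, so integer class labels are written as 0.0, 1.0, and so on. In Scala:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

val pos = LabeledPoint(1.0, Vectors.dense(1.0, 0.0, 3.0))  // positive example, dense features

val neg = LabeledPoint(0.0, Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0)))  // negative example, sparse features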

Sparse data is commonly stored in LibSVM format, where each line represents one labeled sparse vector:

label index1:value1 index2:value2 index3:value3
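As a rough illustration of the line format, the following plain-Scala sketch parses a single LibSVM line into a label plus sparse indices and values. LibSvmLine and parse are hypothetical names, not MLlib APIs, and the sketch skips error handling for empty or malformed lines. It assumes the usual LibSVM convention of one-based indices and shifts them to zero-based, which is what a sparse vector constructor expects.

```scala
// Hypothetical helper for illustration; not part of MLlib.
object LibSvmLine {
  // Returns (label, zero-based indices, values) for one LibSVM-formatted line.
  def parse(line: String): (Double, Array[Int], Array[Double]) = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val pairs = tokens.tail.map { tok =>
      val parts = tok.split(":")
      // Assumption: indices in the file are one-based, so shift to zero-based.
      (parts(0).toInt - 1, parts(1).toDouble)
    }
    (label, pairs.map(_._1), pairs.map(_._2))
  }

  def main(args: Array[String]): Unit = {
    val (label, indices, values) = parse("1 1:1.0 3:3.0")
    // prints label=1.0 indices=0,2 values=1.0,3.0
    println(s"label=$label indices=${indices.mkString(",")} values=${values.mkString(",")}")
  }
}
```

For real datasets, MLlib ships its own loader (MLUtils.loadLibSVMFile), which is the better choice over a hand-written parser like this one.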

Local matrices: the base type is Matrix. MLlib provides a DenseMatrix implementation (newer releases also include SparseMatrix).
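A dense local matrix in MLlib keeps its entries in a single array in column-major order. The plain-Scala sketch below models only that layout; DenseMat is a hypothetical name used for illustration and is not the MLlib DenseMatrix class.

```scala
// Hypothetical plain-Scala model of a column-major dense matrix; not the MLlib class.
final case class DenseMat(numRows: Int, numCols: Int, values: Array[Double]) {
  require(values.length == numRows * numCols, "values must hold numRows * numCols entries")

  // In column-major order, entry (i, j) lives at offset i + j * numRows.
  def apply(i: Int, j: Int): Double = values(i + j * numRows)
}

object DenseMatDemo {
  def main(args: Array[String]): Unit = {
    // 3 x 2 matrix whose columns are (1, 3, 5) and (2, 4, 6).
    val m = DenseMat(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))
    println(m(0, 1)) // prints 2.0
    println(m(2, 0)) // prints 5.0
  }
}
```

With the actual library, the same 3 x 2 matrix would be built with Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0)) from org.apache.spark.mllib.linalg.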

Reposted from blog.csdn.net/peter_changyb/article/details/81181357