Spark MLlib column statistics

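Spark MLlib provides a statistics method called colStats(). Calling it returns an instance of MultivariateStatisticalSummary, from which we can read the maximum, minimum, mean, variance, count, and other statistics of each column.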

The example uses the following tab-separated sample data (sample_stat.txt), one row per line:

1	2	3	4	5
6	7	1	5	9
3	5	6	3	1
3	1	1	5	6
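import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.Statistics

// sc is the SparkContext that spark-shell provides
val data_path = "file:///Users/walle/Documents/D3/sparkmlib/sample_stat.txt"
// Split each line on tabs and convert every field to Double
val data = sc.textFile(data_path).map(_.split("\t")).map(f => f.map(_.toDouble))
// Wrap each row in a dense vector, giving an RDD[Vector]
val data1 = data.map(f => Vectors.dense(f))
val stat1 = Statistics.colStats(data1)
stat1.max       // maximum of each column
stat1.min       // minimum of each column
stat1.mean      // mean of each column
stat1.variance  // sample variance of each column
stat1.normL1    // L1 norm of each column
stat1.normL2    // L2 (Euclidean) norm of each column
stat1.count     // number of rows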
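
To make it clearer what each field of the summary means, here is a minimal plain-Scala sketch that computes the same column statistics by hand on the four sample rows, without Spark. The object name ColStatsSketch and its fields are made up for illustration and are not part of MLlib. It assumes the variance is the sample variance (divided by n - 1), which is how MLlib's summary describes it.

```scala
// Plain-Scala sketch (no Spark): hand-computed column statistics
// for the sample data. ColStatsSketch is a hypothetical name.
object ColStatsSketch {
  // The four rows of sample_stat.txt
  val rows: Seq[Array[Double]] = Seq(
    Array(1.0, 2.0, 3.0, 4.0, 5.0),
    Array(6.0, 7.0, 1.0, 5.0, 9.0),
    Array(3.0, 5.0, 6.0, 3.0, 1.0),
    Array(3.0, 1.0, 1.0, 5.0, 6.0)
  )

  // Transpose: one sequence of values per column
  val cols: Seq[Seq[Double]] = rows.head.indices.map(j => rows.map(r => r(j)))
  val n: Int = rows.size

  val max: Seq[Double] = cols.map(_.max)
  val min: Seq[Double] = cols.map(_.min)
  val mean: Seq[Double] = cols.map(_.sum / n)

  // Sample variance: squared deviations from the mean, divided by n - 1
  val variance: Seq[Double] = cols.zip(mean).map { case (c, m) =>
    c.map(x => (x - m) * (x - m)).sum / (n - 1)
  }

  // L1 norm: sum of absolute values; L2 norm: square root of sum of squares
  val normL1: Seq[Double] = cols.map(c => c.map(x => math.abs(x)).sum)
  val normL2: Seq[Double] = cols.map(c => math.sqrt(c.map(x => x * x).sum))

  def main(args: Array[String]): Unit = {
    println("max:      " + max.mkString(", "))
    println("min:      " + min.mkString(", "))
    println("mean:     " + mean.mkString(", "))
    println("variance: " + variance.mkString(", "))
    println("normL1:   " + normL1.mkString(", "))
    println("normL2:   " + normL2.mkString(", "))
  }
}
```

For the sample data, the column maxima are 6, 7, 6, 5, 9 and the column means are 3.25, 3.75, 2.75, 4.25, 5.25, so these are the values to expect from stat1.max and stat1.mean as well.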

http://www.waitingfy.com/archives/4632

Reposted from blog.csdn.net/fox64194167/article/details/81055334