Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/qq_41455420/article/details/89531385
A Simple Linear Regression Implementation with Spark MLlib
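1. Training Data
Plain labeled data, one sample per line, in the format: "label,feature1 feature2 feature3…"
The training data file lpsa.data looks like this: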
-0.4307829,-1.63735562648104 -2.00621178480549 -1.86242597251066 -1.02470580167082 -0.522940888712441 -0.863171185425945 -1.04215728919298 -0.864466507337306
-0.1625189,-1.98898046126935 -0.722008756122123 -0.787896192088153 -1.02470580167082 -0.522940888712441 -0.863171185425945 -1.04215728919298 -0.864466507337306
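To make the line format concrete, here is a minimal sketch in plain Scala (no Spark required) that splits one line into its label and feature values. The object name ParseLineSketch, the helper parseLine, and the shortened sample line are hypothetical and used only for illustration.

```scala
object ParseLineSketch {
  // Split a line of the form "label,f1 f2 f3" into the label and its feature values.
  def parseLine(line: String): (Double, Array[Double]) = {
    val parts = line.split(",")
    val label = parts(0).toDouble
    // Features are separated by whitespace.
    val features = parts(1).trim.split("\\s+").map(_.toDouble)
    (label, features)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical short line with three features, for illustration only.
    val (label, features) = parseLine("-0.43,-1.64 -2.01 -1.86")
    println(s"label = $label, features = ${features.mkString("[", ", ", "]")}")
  }
}
```

The Spark program in the next section applies the same split, but wraps the result in a LabeledPoint with a dense vector.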
2. Hands-On Code
import org.apache.log4j.{Level, Logger}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionModel, LinearRegressionWithSGD}

object LinearRession {
  def main(args: Array[String]): Unit = {
    // 1. Build the Spark object
    val conf: SparkConf = new SparkConf().setAppName("LinearRessionWithSGD").setMaster("local[2]")
    val sc = new SparkContext(conf)
    Logger.getRootLogger.setLevel(Level.WARN)

    // 2. Read the sample data; each line is "label,feature1 feature2 ..."
    val data_path = "hdfs://node-1:9000/spark_data/lpsa.data"
    val data: RDD[String] = sc.textFile(data_path)
    val examples = data.map { line =>
      val parts: Array[String] = line.split(",")
      LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
    }.cache()
    val numExamples: Long = examples.count()

    // 3. Set the training parameters and train the linear regression model
    // (LinearRegressionWithSGD has been deprecated since Spark 2.0)
    val numIterations = 100
    val stepSize = 1.0
    val miniBatchFraction = 1.0
    val model: LinearRegressionModel = LinearRegressionWithSGD.train(examples, numIterations, stepSize, miniBatchFraction)
    println(s"weights = ${model.weights}")
    println(s"intercept = ${model.intercept}")

    // 4. Predict on the samples (the same data that was used for training)
    val prediction: RDD[Double] = model.predict(examples.map(_.features))
    val predictionAndLabel: RDD[(Double, Double)] = prediction.zip(examples.map(_.label))
    val print_predict: Array[(Double, Double)] = predictionAndLabel.take(50)
    println("prediction" + "\t" + "label")
    print_predict.foreach { case (p, l) => println(s"$p\t$l") }

    // 5. Compute the error: squared difference between prediction (x._1) and label (x._2)
    val loss: Double = predictionAndLabel.map {
      x => (x._1 - x._2) * (x._1 - x._2)
    }.reduce(_ + _)
    val rmse: Double = math.sqrt(loss / numExamples)
    println(s"Test RMSE = $rmse")

    // 6. Save the model
    val mode_path = "D:\\idea\\SparkLinearRegressionTestT\\LinearRessionModel"
    model.save(sc, mode_path)

    // 7. Load the model back
    val sameModel: LinearRegressionModel = LinearRegressionModel.load(sc, mode_path)

    sc.stop()
  }
}
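To give a feel for what the training parameters in step 3 control, here is a minimal sketch of gradient descent for least squares in plain Scala. It is not MLlib's implementation: the object GradientDescentSketch, the toy data, and the step-size decay stepSize / sqrt(iteration) are assumptions made for illustration. Every iteration uses all samples, which corresponds to miniBatchFraction = 1.0, and no intercept is fitted.

```scala
object GradientDescentSketch {
  // Dot product of two equally sized arrays.
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Full-batch gradient descent for least squares without an intercept.
  // Assumption of this sketch: the step size decays as stepSize / sqrt(iteration).
  def train(data: Seq[(Double, Array[Double])], numIterations: Int, stepSize: Double): Array[Double] = {
    val dim = data.head._2.length
    var weights = Array.fill(dim)(0.0)
    for (iter <- 1 to numIterations) {
      // Average gradient of 0.5 * (w . x - y)^2 over all samples.
      val gradient = Array.fill(dim)(0.0)
      for ((label, features) <- data) {
        val error = dot(weights, features) - label
        for (j <- 0 until dim) gradient(j) += error * features(j) / data.size
      }
      val step = stepSize / math.sqrt(iter)
      weights = weights.zip(gradient).map { case (w, g) => w - step * g }
    }
    weights
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical toy data generated from y = 2 * x1 - 1 * x2.
    val data = Seq(
      (2.0, Array(1.0, 0.0)),
      (-1.0, Array(0.0, 1.0)),
      (1.0, Array(1.0, 1.0)),
      (3.0, Array(2.0, 1.0))
    )
    val weights = train(data, numIterations = 100, stepSize = 1.0)
    println(weights.mkString("weights = [", ", ", "]"))
  }
}
```

On this toy data the learned weights should end up close to 2 and -1. A larger stepSize moves faster but can diverge, while more iterations give the weights more time to settle.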
3. Linear Regression Predictions and Prediction Error
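The error reported in step 5 is a root mean squared error (RMSE): the square root of the mean of (prediction - label)^2 over all samples. The sketch below computes it in plain Scala on a few made-up pairs. The object RmseSketch and the numbers are hypothetical; they are not output from the Spark program.

```scala
object RmseSketch {
  // Root mean squared error over (prediction, label) pairs.
  def rmse(pairs: Seq[(Double, Double)]): Double = {
    val squaredErrorSum = pairs.map { case (prediction, label) =>
      val err = prediction - label
      err * err
    }.sum
    math.sqrt(squaredErrorSum / pairs.size)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical (prediction, label) pairs, for illustration only.
    val pairs = Seq((1.0, 2.0), (3.0, 3.0), (5.0, 3.0), (0.0, 1.0))
    println(s"RMSE = ${rmse(pairs)}")
  }
}
```

Because the program evaluates the model on the same samples it was trained on, the value it prints is really a training error and will usually be lower than the error on unseen data.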
Thanks for reading. I hope this helps, and thank you for your support!