Author's email: [email protected]  Address: Huizhou, Guangdong
▲ What this chapter covers
⚪ Master the Spark case study: forecasting commodity demand;
⚪ Master the Spark case study: predicting the murder rate;
1. Case 1 - Forecasting commodity demand
1. Description
The observed values of demand (y, tons), price (x1, yuan/kg) and consumer income (x2, yuan) of a certain commodity are shown in the table below.
The regression model to fit is: y = β1·x1 + β2·x2 + β0
y     x1    x2
100   5     1000
75    7     600
80    6     1200
70    6     500
50    8     30
65    7     400
90    5     1300
100   4     1100
110   3     1300
60    9     300
To model this with MLlib, we first need to reshape the data into a parseable format: each line holds y, then a "|" separator, then x1 and x2 separated by a space, as follows:
100|5 1000
75|7 600
80|6 1200
70|6 500
50|8 30
65|7 400
90|5 1300
100|4 1100
110|3 1300
60|9 300
The question to answer: when X1 = 10 and X2 = 400, what is Y?
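Before turning to Spark, the fit itself can be sanity-checked with plain least squares. The sketch below is ordinary Scala with no Spark dependency (the object name `OlsDemo` is mine, not from the article): it solves the normal equations (XᵀX)β = Xᵀy for the ten observations by Gaussian elimination and predicts demand at x1 = 10, x2 = 400.

```scala
// Ordinary least squares on the ten observations above, via the
// normal equations (X^T X) beta = X^T y. Plain Scala, no Spark;
// it only illustrates the fit that LinearRegression computes.
object OlsDemo {
  // (y, x1, x2) rows from the table
  val rows: Array[(Double, Double, Double)] = Array(
    (100, 5, 1000), (75, 7, 600), (80, 6, 1200), (70, 6, 500), (50, 8, 30),
    (65, 7, 400), (90, 5, 1300), (100, 4, 1100), (110, 3, 1300), (60, 9, 300)
  ).map { case (y, x1, x2) => (y.toDouble, x1.toDouble, x2.toDouble) }

  // Design matrix columns: intercept, x1, x2
  val x: Array[Array[Double]] = rows.map { case (_, x1, x2) => Array(1.0, x1, x2) }
  val y: Array[Double] = rows.map(_._1)

  // Solve an n-by-n linear system by Gaussian elimination with partial pivoting.
  def solve(a: Array[Array[Double]], b: Array[Double]): Array[Double] = {
    val n = b.length
    val m = a.map(_.clone); val v = b.clone
    for (col <- 0 until n) {
      val piv = (col until n).maxBy(r => math.abs(m(r)(col)))
      val tmp = m(col); m(col) = m(piv); m(piv) = tmp
      val tv = v(col); v(col) = v(piv); v(piv) = tv
      for (r <- col + 1 until n) {
        val f = m(r)(col) / m(col)(col)
        for (c <- col until n) m(r)(c) -= f * m(col)(c)
        v(r) -= f * v(col)
      }
    }
    val beta = new Array[Double](n)
    for (r <- n - 1 to 0 by -1) {
      var s = v(r)
      for (c <- r + 1 until n) s -= m(r)(c) * beta(c)
      beta(r) = s / m(r)(r)
    }
    beta
  }

  // Normal equations: (X^T X) beta = X^T y
  val xtx = Array.tabulate(3, 3)((i, j) => x.map(r => r(i) * r(j)).sum)
  val xty = Array.tabulate(3)(i => x.map(r => r(i)).zip(y).map { case (a, b) => a * b }.sum)
  val beta = solve(xtx, xty) // beta(0) = intercept b0, beta(1) = b1, beta(2) = b2

  def predict(x1: Double, x2: Double): Double = beta(0) + beta(1) * x1 + beta(2) * x2
}
```

Calling `OlsDemo.predict(10, 400)` answers the question above; Spark's `LinearRegression` fitted on the same rows should agree up to numerical tolerance.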
2. Code example:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
object Driver {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("lr")
    val sc = new SparkContext(conf)
    val sqc = new SQLContext(sc)
    val data = sc.textFile("d://ml/lritem.txt")
    //--Parse each line into a tuple so it can later be converted into a Spark SQL DataFrame
    val parseData = data.map { x =>
      val parts = x.split("\\|")
      val features = parts(1).split(" ")
      (parts(0).toDouble, features(0).toDouble, features(1).toDouble)
    }
    //--Convert to a DataFrame
    val df = sqc.createDataFrame(parseData)
    //--Name the columns
    val dfData = df.toDF("Y", "X1", "X2")
    //--Define the feature