Big data course K14 - Spark data mining cases


▲ This chapter's agenda

⚪ Master the Spark case: forecasting commodity demand;

⚪ Master the Spark case: predicting the murder rate.

1. Case 1 - Forecasting commodity demand

1. Description

The observed values of demand (y, tons), price (x1, yuan/kg), and consumer income (x2, yuan) for a certain commodity are shown in the table below. We fit the multiple linear regression model:

$y = \beta_1 x_1 + \beta_2 x_2 + \beta_0$
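Here β₀ is the intercept term. For reference, a least-squares fit, which is what spark.ml's LinearRegression computes when regularization is left at its default of zero, picks the coefficients that minimize the residual sum of squares over the ten observations:

$$\min_{\beta_0,\,\beta_1,\,\beta_2}\;\sum_{i=1}^{10}\left(y_i - \beta_1 x_{1,i} - \beta_2 x_{2,i} - \beta_0\right)^2$$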

| y (demand, tons) | x1 (price, yuan/kg) | x2 (income, yuan) |
| --- | --- | --- |
| 100 | 5 | 1000 |
| 75 | 7 | 600 |
| 80 | 6 | 1200 |
| 70 | 6 | 500 |
| 50 | 8 | 30 |
| 65 | 7 | 400 |
| 90 | 5 | 1300 |
| 100 | 4 | 1100 |
| 110 | 3 | 1300 |
| 60 | 9 | 300 |

To model this with MLlib, we first need to put the data into a parseable text format, one observation per line in the form y|x1 x2, for example:

100|5 1000
75|7 600
80|6 1200
70|6 500
50|8 30
65|7 400
90|5 1300
100|4 1100
110|3 1300
60|9 300

The prediction task for the fitted model: when X1 = 10 and X2 = 400, what is Y?

2. Code example:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression

object Driver {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("lr")
    val sc = new SparkContext(conf)
    val sqc = new SQLContext(sc)
    val data = sc.textFile("d://ml/lritem.txt")

    // -- Parse each "y|x1 x2" line into a tuple, for later conversion into a Spark SQL DataFrame
    val parseData = data.map { x =>
      val parts = x.split("\\|")
      val features = parts(1).trim.split("\\s+") // split on any run of whitespace
      (parts(0).toDouble, features(0).toDouble, features(1).toDouble)
    }

    // -- Convert to a DataFrame
    val df = sqc.createDataFrame(parseData)

    // -- Name the columns
    val dfData = df.toDF("Y", "X1", "X2")

    // -- Define the feature vector
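The original post is truncated at this point. What follows is a minimal reconstruction sketch of the remaining steps using the standard spark.ml API: VectorAssembler packs X1 and X2 into the single feature-vector column the estimator expects, LinearRegression fits the model, and transform produces the prediction for X1 = 10, X2 = 400. The names assembler, vecData, model, query, and result are our own, not from the original.

    // NOTE: reconstruction sketch; the original code breaks off above
    // -- Assemble X1 and X2 into the single vector column spark.ml expects
    val assembler = new VectorAssembler()
      .setInputCols(Array("X1", "X2"))
      .setOutputCol("features")
    val vecData = assembler.transform(dfData)

    // -- Fit y = b1*x1 + b2*x2 + b0 by least squares (no regularization by default)
    val lr = new LinearRegression()
      .setLabelCol("Y")
      .setFeaturesCol("features")
    val model = lr.fit(vecData)
    println(s"coefficients: ${model.coefficients}, intercept: ${model.intercept}")

    // -- Predict demand for X1 = 10, X2 = 400
    val query = sqc.createDataFrame(Seq((10.0, 400.0))).toDF("X1", "X2")
    val result = model.transform(assembler.transform(query))
    result.select("prediction").show()

    sc.stop()
  }
}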

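Incidentally, the original import list also pulled in the older RDD-based API (LabeledPoint, LinearRegressionWithSGD). A self-contained sketch of that alternative is below, assuming the same input file; the object name DriverSGD, the iteration count, and the step size are our own choices. Note that plain SGD is sensitive to feature scale (x2 runs into the thousands), so in practice the features should be scaled, or the step size kept very small, for training to behave.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

object DriverSGD {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("lr-sgd")
    val sc = new SparkContext(conf)

    // -- Parse "y|x1 x2" lines into MLlib LabeledPoint(label, features)
    val points = sc.textFile("d://ml/lritem.txt").map { line =>
      val parts = line.split("\\|")
      val feats = parts(1).trim.split("\\s+").map(_.toDouble)
      LabeledPoint(parts(0).toDouble, Vectors.dense(feats))
    }.cache()

    // -- Train with gradient descent; the step size is kept small because
    //    the features are unscaled (illustrative values, not tuned)
    val model = LinearRegressionWithSGD.train(points, 1000, 0.0000001)

    // -- Predict demand for x1 = 10, x2 = 400
    println(model.predict(Vectors.dense(10.0, 400.0)))

    sc.stop()
  }
}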