Machine learning [4]: regression model theory and practice

Theoretical part: 

Examples:

# -*- coding:utf-8 -*-
"""
@Project: py36Files 
@Author: 王瑞
@File: 糖病.py
@IDE: PyCharm
@Time: 2021-03-24 15:18:06
"""
"""
使用Sklearn库操作机器学习
numpy==1.16.0 线性代数
pandas==0.25.0 数据分析
matplotlib==3.0.1 数据可视化
sklearn==0.22.1 机器学习

1,数据分析处理:
在线性回归中,我们使用diabetes数据集,它是scikit-learn自带的一个糖尿病人采样并整理后所得,特点如下:
@每个数据集有442个样本
@每个样本都有10个特征
@每个特征都是浮点数,数据都在-0.2~0.2之间
@样本的目标在整数25~346之间
"""
# Step 1: install and import the required libraries
import numpy as np
from sklearn import datasets, linear_model, model_selection


# Load the data: import the dataset from sklearn.datasets and give it a name
diabetes = datasets.load_diabetes()

# Inspect the type and contents of the diabetes dataset. The type should be 'Bunch', a dictionary-like object used for loading sklearn's built-in example datasets
print(type(diabetes))
print(diabetes)
print(diabetes.DESCR)

# Inspect the feature matrix and the target labels of the dataset
print('******data******')
print(diabetes.data)
print('******target******')
print(diabetes.target)

# Inspect the shapes of the feature matrix and the target labels. Note that .shape is a tuple
print(diabetes.data.shape)
print(diabetes.target.shape)
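The dependency list above mentions pandas, but the original script never uses it. As an optional aside, the Bunch can be wrapped in a DataFrame for easier inspection; this is a minimal sketch of mine, not part of the original run:

# Optional aside (not in the original script): wrap the Bunch in a pandas
# DataFrame, using the feature names stored in the dataset
import pandas as pd

df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
df['target'] = diabetes.target
print(df.describe())  # per-column summary statistics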

"""
2,拆分数据:
利用model_selection模块中的train_test_split函数,用于将矩阵随机划分为训练子集和测试子集,并返回划分好的训练集样本,测试集样本,训练集标签,
测试集标签,并分别定义为:X_train,X_test,y_train,y_test
参数解释:
diabetes.data,被划分的diabetes样本特征集
diabetes.target,被划分的样本标签
test_size,如果是浮点数,在0-1之间,表示样本占比;如果是整数的话就是样本的数量
random_state,是随机数的种子
"""
X_train, X_test, y_train, y_test = \
    model_selection.train_test_split(diabetes.data, diabetes.target, test_size=0.25, random_state=0)
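A quick way to convince yourself that random_state makes the split deterministic is to split twice with the same seed and compare the pieces. This check is my addition, not part of the original script:

# Sanity check (not in the original script): the same random_state always
# produces the identical split, so the results below are reproducible
X_train2, X_test2, y_train2, y_test2 = \
    model_selection.train_test_split(diabetes.data, diabetes.target, test_size=0.25, random_state=0)
assert np.array_equal(X_train, X_train2) and np.array_equal(y_test, y_test2)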

# Take a look at the training samples X_train, test samples X_test, training labels y_train, and test labels y_test
print('######X_train######')
print(X_train)
print('######X_test######')
print(X_test)
print('######y_train######')
print(y_train)
print('######y_test######')
print(y_test)

# Take a look at the shapes of X_train, X_test, y_train, and y_test
print('######X_train######')
print(X_train.shape)
print('######X_test######')
print(X_test.shape)
print('######y_train######')
print(y_train.shape)
print('######y_test######')
print(y_test.shape)
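The shapes printed above follow from test_size=0.25: with 442 samples, the test count rounds up to 111, leaving 331 for training (as far as I know, sklearn computes the test size with ceil). A quick arithmetic check, my addition:

# Check the split arithmetic (informal, not the library's own code):
# 442 * 0.25 = 110.5, rounded up to 111 test samples, 331 training samples
import math
n_samples = diabetes.data.shape[0]        # 442
n_test = math.ceil(n_samples * 0.25)      # 111
print(n_test, n_samples - n_test)         # 111 331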

"""
拟合预测

属性:
coet_:权重向量
intercept:b值

方法:
fit(X,y[,sample_weight]):训练模型
predict(X):用模型进行预测,返回预测值
score(X,y[,sample_weight]):返回预测性能得分。设预测集为Ttest,真实值为yi,真实值的均值为一串公式,不写了
score不超过1,但是可能为负值(预测效果太差的情况)
score越大,预测性能越好
"""
# Now create a LinearRegression instance named regr
regr = linear_model.LinearRegression()

# Fit the regr model on X_train and y_train
regr.fit(X_train, y_train)
print('Coefficients:%s, intercept %s' % (regr.coef_, regr.intercept_))
print("Residual sum of squares: %.2f" % np.mean((regr.predict(X_test) - y_test) ** 2))
print('Score: %.2f' % regr.score(X_test, y_test))
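The Score printed here is the coefficient of determination R^2 from the formula above. As a cross-check (my addition, not in the original run), it can be computed by hand:

# Cross-check (not in the original script): compute R^2 manually and
# compare it with the value returned by regr.score(X_test, y_test)
y_pred = regr.predict(X_test)
ss_res = np.sum((y_test - y_pred) ** 2)            # residual sum of squares
ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)   # total sum of squares
print('manual R^2: %.2f' % (1 - ss_res / ss_tot))  # should match the Score above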
# For comparison, also fit a ridge regression model (linear regression with L2 regularization)
regr_R = linear_model.Ridge()
regr_R.fit(X_train, y_train)

# Evaluate the ridge regression
print('Coefficients:%s, intercept %s' % (regr_R.coef_, regr_R.intercept_))
print("Residual sum of squares: %.2f" % np.mean((regr_R.predict(X_test) - y_test) ** 2))
print('Score: %.2f' % regr_R.score(X_test, y_test))
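Ridge() above uses the default regularization strength alpha=1.0. scikit-learn also ships RidgeCV, which picks alpha by cross-validation; the grid of candidate alphas below is my own choice, and this sketch is not part of the original run:

# Optional extension (not in the original script): let RidgeCV choose the
# regularization strength alpha from a small candidate grid
regr_cv = linear_model.RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0])
regr_cv.fit(X_train, y_train)
print('chosen alpha:', regr_cv.alpha_)
print('Score: %.2f' % regr_cv.score(X_test, y_test))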

Run output:

D:\Programs\Python\Python36\python.exe D:/py36Files/linear_model/糖病.py
<class 'sklearn.utils.Bunch'>
{'data': array([[ 0.03807591,  0.05068012,  0.06169621, ..., -0.00259226,
         0.01990842, -0.01764613],
       [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,
        -0.06832974, -0.09220405],
       [ 0.08529891,  0.05068012,  0.04445121, ..., -0.00259226,
         0.00286377, -0.02593034],
       ...,
       [ 0.04170844,  0.05068012, -0.01590626, ..., -0.01107952,
        -0.04687948,  0.01549073],
       [-0.04547248, -0.04464164,  0.03906215, ...,  0.02655962,
         0.04452837, -0.02593034],
       [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,
        -0.00421986,  0.00306441]]), 'target': array([151.,  75., 141., 206., 135.,  97., 138.,  63., 110., 310., 101.,
        69., 179., 185., 118., 171., 166., 144.,  97., 168.,  68.,  49.,
        68., 245., 184., 202., 137.,  85., 131., 283., 129.,  59., 341.,
        87.,  65., 102., 265., 276., 252.,  90., 100.,  55.,  61.,  92.,
       259.,  53., 190., 142.,  75., 142., 155., 225.,  59., 104., 182.,
       128.,  52.,  37., 170., 170.,  61., 144.,  52., 128.,  71., 163.,
       150.,  97., 160., 178.,  48., 270., 202., 111.,  85.,  42., 170.,
       200., 252., 113., 143.,  51.,  52., 210.,  65., 141.,  55., 134.,
        42., 111.,  98., 164.,  48.,  96.,  90., 162., 150., 279.,  92.,
        83., 128., 102., 302., 198.,  95.,  53., 134., 144., 232.,  81.,
       104.,  59., 246., 297., 258., 229., 275., 281., 179., 200., 200.,
       173., 180.,  84., 121., 161.,  99., 109., 115., 268., 274., 158.,
       107.,  83., 103., 272.,  85., 280., 336., 281., 118., 317., 235.,
        60., 174., 259., 178., 128.,  96., 126., 288.,  88., 292.,  71.,
       197., 186.,  25.,  84.,  96., 195.,  53., 217., 172., 131., 214.,
        59.,  70., 220., 268., 152.,  47.,  74., 295., 101., 151., 127.,
       237., 225.,  81., 151., 107.,  64., 138., 185., 265., 101., 137.,
       143., 141.,  79., 292., 178.,  91., 116.,  86., 122.,  72., 129.,
       142.,  90., 158.,  39., 196., 222., 277.,  99., 196., 202., 155.,
        77., 191.,  70.,  73.,  49.,  65., 263., 248., 296., 214., 185.,
        78.,  93., 252., 150.,  77., 208.,  77., 108., 160.,  53., 220.,
       154., 259.,  90., 246., 124.,  67.,  72., 257., 262., 275., 177.,
        71.,  47., 187., 125.,  78.,  51., 258., 215., 303., 243.,  91.,
       150., 310., 153., 346.,  63.,  89.,  50.,  39., 103., 308., 116.,
       145.,  74.,  45., 115., 264.,  87., 202., 127., 182., 241.,  66.,
        94., 283.,  64., 102., 200., 265.,  94., 230., 181., 156., 233.,
        60., 219.,  80.,  68., 332., 248.,  84., 200.,  55.,  85.,  89.,
        31., 129.,  83., 275.,  65., 198., 236., 253., 124.,  44., 172.,
       114., 142., 109., 180., 144., 163., 147.,  97., 220., 190., 109.,
       191., 122., 230., 242., 248., 249., 192., 131., 237.,  78., 135.,
       244., 199., 270., 164.,  72.,  96., 306.,  91., 214.,  95., 216.,
       263., 178., 113., 200., 139., 139.,  88., 148.,  88., 243.,  71.,
        77., 109., 272.,  60.,  54., 221.,  90., 311., 281., 182., 321.,
        58., 262., 206., 233., 242., 123., 167.,  63., 197.,  71., 168.,
       140., 217., 121., 235., 245.,  40.,  52., 104., 132.,  88.,  69.,
       219.,  72., 201., 110.,  51., 277.,  63., 118.,  69., 273., 258.,
        43., 198., 242., 232., 175.,  93., 168., 275., 293., 281.,  72.,
       140., 189., 181., 209., 136., 261., 113., 131., 174., 257.,  55.,
        84.,  42., 146., 212., 233.,  91., 111., 152., 120.,  67., 310.,
        94., 183.,  66., 173.,  72.,  49.,  64.,  48., 178., 104., 132.,
       220.,  57.]), 'DESCR': '.. _diabetes_dataset:\n\nDiabetes dataset\n----------------\n\nTen baseline variables, age, sex, body mass index, average blood\npressure, and six blood serum measurements were obtained for each of n =\n442 diabetes patients, as well as the response of interest, a\nquantitative measure of disease progression one year after baseline.\n\n**Data Set Characteristics:**\n\n  :Number of Instances: 442\n\n  :Number of Attributes: First 10 columns are numeric predictive values\n\n  :Target: Column 11 is a quantitative measure of disease progression one year after baseline\n\n  :Attribute Information:\n      - Age\n      - Sex\n      - Body mass index\n      - Average blood pressure\n      - S1\n      - S2\n      - S3\n      - S4\n      - S5\n      - S6\n\nNote: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times `n_samples` (i.e. the sum of squares of each column totals 1).\n\nSource URL:\nhttps://www4.stat.ncsu.edu/~boos/var.select/diabetes.html\n\nFor more information see:\nBradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression," Annals of Statistics (with discussion), 407-499.\n(https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)', 'feature_names': ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6'], 'data_filename': 'D:\\Programs\\Python\\Python36\\lib\\site-packages\\sklearn\\datasets\\data\\diabetes_data.csv.gz', 'target_filename': 'D:\\Programs\\Python\\Python36\\lib\\site-packages\\sklearn\\datasets\\data\\diabetes_target.csv.gz'}
.. _diabetes_dataset:

Diabetes dataset
----------------

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

  :Number of Instances: 442

  :Number of Attributes: First 10 columns are numeric predictive values

  :Target: Column 11 is a quantitative measure of disease progression one year after baseline

  :Attribute Information:
      - Age
      - Sex
      - Body mass index
      - Average blood pressure
      - S1
      - S2
      - S3
      - S4
      - S5
      - S6

Note: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times `n_samples` (i.e. the sum of squares of each column totals 1).

Source URL:
https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html

For more information see:
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression," Annals of Statistics (with discussion), 407-499.
(https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)
******data******
[[ 0.03807591  0.05068012  0.06169621 ... -0.00259226  0.01990842
  -0.01764613]
 [-0.00188202 -0.04464164 -0.05147406 ... -0.03949338 -0.06832974
  -0.09220405]
 [ 0.08529891  0.05068012  0.04445121 ... -0.00259226  0.00286377
  -0.02593034]
 ...
 [ 0.04170844  0.05068012 -0.01590626 ... -0.01107952 -0.04687948
   0.01549073]
 [-0.04547248 -0.04464164  0.03906215 ...  0.02655962  0.04452837
  -0.02593034]
 [-0.04547248 -0.04464164 -0.0730303  ... -0.03949338 -0.00421986
   0.00306441]]
******target******
[151.  75. 141. 206. 135.  97. 138.  63. 110. 310. 101.  69. 179. 185.
 118. 171. 166. 144.  97. 168.  68.  49.  68. 245. 184. 202. 137.  85.
 131. 283. 129.  59. 341.  87.  65. 102. 265. 276. 252.  90. 100.  55.
  61.  92. 259.  53. 190. 142.  75. 142. 155. 225.  59. 104. 182. 128.
  52.  37. 170. 170.  61. 144.  52. 128.  71. 163. 150.  97. 160. 178.
  48. 270. 202. 111.  85.  42. 170. 200. 252. 113. 143.  51.  52. 210.
  65. 141.  55. 134.  42. 111.  98. 164.  48.  96.  90. 162. 150. 279.
  92.  83. 128. 102. 302. 198.  95.  53. 134. 144. 232.  81. 104.  59.
 246. 297. 258. 229. 275. 281. 179. 200. 200. 173. 180.  84. 121. 161.
  99. 109. 115. 268. 274. 158. 107.  83. 103. 272.  85. 280. 336. 281.
 118. 317. 235.  60. 174. 259. 178. 128.  96. 126. 288.  88. 292.  71.
 197. 186.  25.  84.  96. 195.  53. 217. 172. 131. 214.  59.  70. 220.
 268. 152.  47.  74. 295. 101. 151. 127. 237. 225.  81. 151. 107.  64.
 138. 185. 265. 101. 137. 143. 141.  79. 292. 178.  91. 116.  86. 122.
  72. 129. 142.  90. 158.  39. 196. 222. 277.  99. 196. 202. 155.  77.
 191.  70.  73.  49.  65. 263. 248. 296. 214. 185.  78.  93. 252. 150.
  77. 208.  77. 108. 160.  53. 220. 154. 259.  90. 246. 124.  67.  72.
 257. 262. 275. 177.  71.  47. 187. 125.  78.  51. 258. 215. 303. 243.
  91. 150. 310. 153. 346.  63.  89.  50.  39. 103. 308. 116. 145.  74.
  45. 115. 264.  87. 202. 127. 182. 241.  66.  94. 283.  64. 102. 200.
 265.  94. 230. 181. 156. 233.  60. 219.  80.  68. 332. 248.  84. 200.
  55.  85.  89.  31. 129.  83. 275.  65. 198. 236. 253. 124.  44. 172.
 114. 142. 109. 180. 144. 163. 147.  97. 220. 190. 109. 191. 122. 230.
 242. 248. 249. 192. 131. 237.  78. 135. 244. 199. 270. 164.  72.  96.
 306.  91. 214.  95. 216. 263. 178. 113. 200. 139. 139.  88. 148.  88.
 243.  71.  77. 109. 272.  60.  54. 221.  90. 311. 281. 182. 321.  58.
 262. 206. 233. 242. 123. 167.  63. 197.  71. 168. 140. 217. 121. 235.
 245.  40.  52. 104. 132.  88.  69. 219.  72. 201. 110.  51. 277.  63.
 118.  69. 273. 258.  43. 198. 242. 232. 175.  93. 168. 275. 293. 281.
  72. 140. 189. 181. 209. 136. 261. 113. 131. 174. 257.  55.  84.  42.
 146. 212. 233.  91. 111. 152. 120.  67. 310.  94. 183.  66. 173.  72.
  49.  64.  48. 178. 104. 132. 220.  57.]
(442, 10)
(442,)
######X_train######
[[-0.04910502 -0.04464164 -0.05686312 ... -0.03949338 -0.01190068
   0.01549073]
 [-0.05273755 -0.04464164 -0.05578531 ...  0.03430886  0.13237265
   0.00306441]
 [-0.09269548  0.05068012 -0.0902753  ... -0.00259226  0.02405258
   0.00306441]
 ...
 [ 0.05987114 -0.04464164 -0.02129532 ...  0.07120998  0.07912108
   0.13561183]
 [-0.07816532 -0.04464164 -0.0730303  ... -0.03949338 -0.01811827
  -0.08391984]
 [ 0.04170844  0.05068012  0.07139652 ...  0.03430886  0.07341008
   0.08590655]]
######X_test######
[[ 0.01991321  0.05068012  0.10480869 ... -0.00259226  0.00371174
   0.04034337]
 [-0.01277963 -0.04464164  0.06061839 ...  0.03430886  0.0702113
   0.00720652]
 [ 0.03807591  0.05068012  0.00888341 ... -0.00259226 -0.01811827
   0.00720652]
 ...
 [-0.0854304  -0.04464164 -0.00405033 ... -0.03949338 -0.0611766
  -0.01350402]
 [ 0.03807591  0.05068012 -0.02991782 ... -0.00259226 -0.01290794
   0.00306441]
 [ 0.04170844  0.05068012  0.01966154 ... -0.00259226  0.03119299
   0.00720652]]
######y_train######
[ 68. 109.  94. 118. 275. 275. 127. 281.  71.  42.  71. 128. 272. 135.
  51. 220. 167.  78. 131. 212. 182. 174. 259.  77.  91. 310.  84. 134.
 102. 128. 306. 245. 201. 183. 111.  96. 125. 182. 177.  48.  97. 259.
 288. 242.  69.  31. 154. 150.  52. 261. 118. 102. 139.  51.  58. 144.
 178.  97.  78. 129. 258. 124. 198. 185.  66. 237. 178. 275. 268. 242.
 200. 214. 246. 236.  85. 114.  93.  99.  72. 270. 111.  83.  87.  42.
 172.  65. 259. 279. 141. 144. 220.  90. 101.  53.  67.  72. 121. 303.
 232. 140. 190. 221.  71. 116. 111. 280. 233.  78. 150. 283.  64. 140.
  65. 225. 206.  63. 296. 173.  85. 141.  50.  25. 153.  55. 139. 336.
  73.  95. 109.  44. 180. 263. 148.  79.  65. 102. 220. 277. 246. 200.
 262. 191.  97. 184.  85. 248. 150. 268.  59.  70.  88. 100. 190. 113.
  66. 243. 185. 262.  48. 160. 217. 210. 132. 257. 104. 126. 292. 166.
  83.  81. 144. 281.  72.  39. 109.  60. 258. 178. 168.  87.  77. 216.
 206. 142. 161. 265.  60. 200. 265. 272. 146.  94.  55.  69. 138. 258.
 143. 172.  89.  69. 199.  55.  45. 265.  91. 170.  55. 202. 155.  77.
  77.  71. 123.  84. 252.  52.  40. 274. 143. 245.  92. 151.  39. 235.
  92. 253.  94.  81. 346.  90. 181. 162. 277. 152. 178. 124.  75. 263.
 202. 200. 108.  96.  60.  72. 107.  54. 158. 152. 220. 308. 249. 222.
  65. 173.  88.  72. 164.  52. 115. 200.  90. 248.  37. 230.  63. 273.
  61.  53. 189. 241. 118. 252. 104. 219. 115. 332. 131. 185.  63. 131.
  88. 187. 196.  59. 341. 109. 101. 113.  80. 242. 168. 128. 233. 209.
 225.  83. 214.  96. 129.  47. 229. 293.  74. 202. 164. 202.  59.  91.
 120. 151. 310.  90. 116. 147.  43.  42.  48. 134.  84.  71.  64.  70.
 310. 311. 122. 243. 248.  91. 281. 142. 295.]
######y_test######
[321. 215. 127.  64. 175. 275. 179. 232. 142.  99. 252. 174. 129.  74.
 264.  49.  86.  75. 101. 155. 170. 276. 110. 136.  68. 128. 103.  93.
 191. 196. 217. 181. 168. 200. 219. 281. 151. 257.  49. 198.  96. 179.
  95. 198. 244.  89. 214. 182.  84. 270. 156. 138. 113. 131. 195. 171.
 122.  61. 230. 235.  52. 121. 144. 107. 132. 302.  53. 317. 137.  57.
  98. 170.  88.  90.  67. 163. 104. 186. 180. 283. 141. 150.  47. 297.
 104.  49. 103. 142.  59.  85. 137.  53.  51. 197. 135.  72. 208. 237.
 145. 110. 292.  97. 197. 158. 163.  63. 192. 233.  68. 160. 178.]
######X_train######
(331, 10)
######X_test######
(111, 10)
######y_train######
(331,)
######y_test######
(111,)
Coefficients:[ -43.26774487 -208.67053951  593.39797213  302.89814903 -560.27689824
  261.47657106   -8.83343952  135.93715156  703.22658427   28.34844354], intercept 153.06798218266258
Residual sum of squares: 3180.20
Score: 0.36
Coefficients:[  21.19927911  -60.47711393  302.87575204  179.41206395    8.90911449
  -28.8080548  -149.30722541  112.67185758  250.53760873   99.57749017], intercept 152.4477761489962
Residual sum of squares: 3192.33
Score: 0.36

Process finished with exit code 0

 


Source: https://blog.csdn.net/qq_46009608/article/details/115251056