Theoretical part:
Examples:
# -*- coding:utf-8 -*-
"""
@Project: py36Files
@Author: 王瑞
@File: 糖病.py
@IDE: PyCharm
@Time: 2021-03-24 15:18:06
"""
"""
使用Sklearn库操作机器学习
numpy==1.16.0 线性代数
pandas==0.25.0 数据分析
matplotlib==3.0.1 数据可视化
sklearn==0.22.1 机器学习
1,数据分析处理:
在线性回归中,我们使用diabetes数据集,它是scikit-learn自带的一个糖尿病人采样并整理后所得,特点如下:
@每个数据集有442个样本
@每个样本都有10个特征
@每个特征都是浮点数,数据都在-0.2~0.2之间
@样本的目标在整数25~346之间
"""
# Step 1: install and import the required libraries
import numpy as np
from sklearn import datasets, linear_model, model_selection
# Load the data: fetch the dataset from sklearn.datasets and give it a name
diabetes = datasets.load_diabetes()
# Inspect the type and contents of the diabetes dataset. The type should be 'Bunch', a dictionary-like object used for sklearn's built-in example datasets
print(type(diabetes))
print(diabetes)
print(diabetes.DESCR)
# Inspect the feature matrix and the label vector of the dataset
print('******data******')
print(diabetes.data)
print('******target******')
print(diabetes.target)
# Inspect the shapes of the feature matrix and the label vector. Note that a shape is a tuple
print(diabetes.data.shape)
print(diabetes.target.shape)
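# Optional sketch: the environment listed above includes pandas, so a convenient way to inspect
# the 10 features is to wrap them in a DataFrame with the column names from diabetes.feature_names.
import pandas as pd
df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
assert df.shape == (442, 10)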
"""
2,拆分数据:
利用model_selection模块中的train_test_split函数,用于将矩阵随机划分为训练子集和测试子集,并返回划分好的训练集样本,测试集样本,训练集标签,
测试集标签,并分别定义为:X_train,X_test,y_train,y_test
参数解释:
diabetes.data,被划分的diabetes样本特征集
diabetes.target,被划分的样本标签
test_size,如果是浮点数,在0-1之间,表示样本占比;如果是整数的话就是样本的数量
random_state,是随机数的种子
"""
X_train, X_test, y_train, y_test = \
model_selection.train_test_split(diabetes.data, diabetes.target, test_size=0.25, random_state=0)
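# Sketch: random_state=0 fixes the shuffling seed, so calling train_test_split again with the
# same arguments reproduces exactly the same split.
X_train2, X_test2, y_train2, y_test2 = \
    model_selection.train_test_split(diabetes.data, diabetes.target, test_size=0.25, random_state=0)
assert np.array_equal(X_train, X_train2) and np.array_equal(y_test, y_test2)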
# Let's look at the training samples X_train, test samples X_test, training labels y_train and test labels y_test
print('######X_train######')
print(X_train)
print('######X_test######')
print(X_test)
print('######y_train######')
print(y_train)
print('######y_test######')
print(y_test)
# Let's look at the shapes of the training samples X_train, test samples X_test, training labels y_train and test labels y_test
print('######X_train######')
print(X_train.shape)
print('######X_test######')
print(X_test.shape)
print('######y_train######')
print(y_train.shape)
print('######y_test######')
print(y_test.shape)
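# Sketch: with test_size=0.25, train_test_split puts ceil(442 * 0.25) = 111 samples in the test
# set and the remaining 331 in the training set, which matches the shapes printed above.
assert X_train.shape == (331, 10) and X_test.shape == (111, 10)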
"""
拟合预测
属性:
coet_:权重向量
intercept:b值
方法:
fit(X,y[,sample_weight]):训练模型
predict(X):用模型进行预测,返回预测值
score(X,y[,sample_weight]):返回预测性能得分。设预测集为Ttest,真实值为yi,真实值的均值为一串公式,不写了
score不超过1,但是可能为负值(预测效果太差的情况)
score越大,预测性能越好
"""
# Now create a LinearRegression instance named regr
regr = linear_model.LinearRegression()
# Train the regr model on X_train and y_train
regr.fit(X_train, y_train)
print('Coefficients:%s, intercept %s' % (regr.coef_, regr.intercept_))
print("Residual sum of squares: %.2f" % np.mean((regr.predict(X_test) - y_test) ** 2))
print('Score: %.2f' % regr.score(X_test, y_test))
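# Sketch: regr.score returns the R^2 coefficient of determination described in the docstring
# above; the same value can be computed directly from the predictions.
y_pred = regr.predict(X_test)
r2_manual = 1 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - np.mean(y_test)) ** 2)
assert np.isclose(r2_manual, regr.score(X_test, y_test))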
# For comparison, fit a Ridge regression model (linear regression with an L2 penalty)
regr_R = linear_model.Ridge()
regr_R.fit(X_train, y_train)
# Evaluate the Ridge regression
print('Coefficients:%s, intercept %s' % (regr_R.coef_, regr_R.intercept_))
print("Mean squared error: %.2f" % np.mean((regr_R.predict(X_test) - y_test) ** 2))
print('Score: %.2f' % regr_R.score(X_test, y_test))
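# Optional sketch: Ridge() above uses its default regularization strength alpha=1.0; the weight
# of the L2 penalty can be tuned through the alpha parameter and compared on the same test split.
for alpha in (0.1, 1.0, 10.0):
    ridge = linear_model.Ridge(alpha=alpha).fit(X_train, y_train)
    print('Ridge alpha=%s test score: %.2f' % (alpha, ridge.score(X_test, y_test)))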
Run result:
D:\Programs\Python\Python36\python.exe D:/py36Files/linear_model/糖病.py
<class 'sklearn.utils.Bunch'>
{'data': array([[ 0.03807591, 0.05068012, 0.06169621, ..., -0.00259226,
0.01990842, -0.01764613],
[-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,
-0.06832974, -0.09220405],
[ 0.08529891, 0.05068012, 0.04445121, ..., -0.00259226,
0.00286377, -0.02593034],
...,
[ 0.04170844, 0.05068012, -0.01590626, ..., -0.01107952,
-0.04687948, 0.01549073],
[-0.04547248, -0.04464164, 0.03906215, ..., 0.02655962,
0.04452837, -0.02593034],
[-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,
-0.00421986, 0.00306441]]), 'target': array([151., 75., 141., 206., 135., 97., 138., 63., 110., 310., 101.,
69., 179., 185., 118., 171., 166., 144., 97., 168., 68., 49.,
68., 245., 184., 202., 137., 85., 131., 283., 129., 59., 341.,
87., 65., 102., 265., 276., 252., 90., 100., 55., 61., 92.,
259., 53., 190., 142., 75., 142., 155., 225., 59., 104., 182.,
128., 52., 37., 170., 170., 61., 144., 52., 128., 71., 163.,
150., 97., 160., 178., 48., 270., 202., 111., 85., 42., 170.,
200., 252., 113., 143., 51., 52., 210., 65., 141., 55., 134.,
42., 111., 98., 164., 48., 96., 90., 162., 150., 279., 92.,
83., 128., 102., 302., 198., 95., 53., 134., 144., 232., 81.,
104., 59., 246., 297., 258., 229., 275., 281., 179., 200., 200.,
173., 180., 84., 121., 161., 99., 109., 115., 268., 274., 158.,
107., 83., 103., 272., 85., 280., 336., 281., 118., 317., 235.,
60., 174., 259., 178., 128., 96., 126., 288., 88., 292., 71.,
197., 186., 25., 84., 96., 195., 53., 217., 172., 131., 214.,
59., 70., 220., 268., 152., 47., 74., 295., 101., 151., 127.,
237., 225., 81., 151., 107., 64., 138., 185., 265., 101., 137.,
143., 141., 79., 292., 178., 91., 116., 86., 122., 72., 129.,
142., 90., 158., 39., 196., 222., 277., 99., 196., 202., 155.,
77., 191., 70., 73., 49., 65., 263., 248., 296., 214., 185.,
78., 93., 252., 150., 77., 208., 77., 108., 160., 53., 220.,
154., 259., 90., 246., 124., 67., 72., 257., 262., 275., 177.,
71., 47., 187., 125., 78., 51., 258., 215., 303., 243., 91.,
150., 310., 153., 346., 63., 89., 50., 39., 103., 308., 116.,
145., 74., 45., 115., 264., 87., 202., 127., 182., 241., 66.,
94., 283., 64., 102., 200., 265., 94., 230., 181., 156., 233.,
60., 219., 80., 68., 332., 248., 84., 200., 55., 85., 89.,
31., 129., 83., 275., 65., 198., 236., 253., 124., 44., 172.,
114., 142., 109., 180., 144., 163., 147., 97., 220., 190., 109.,
191., 122., 230., 242., 248., 249., 192., 131., 237., 78., 135.,
244., 199., 270., 164., 72., 96., 306., 91., 214., 95., 216.,
263., 178., 113., 200., 139., 139., 88., 148., 88., 243., 71.,
77., 109., 272., 60., 54., 221., 90., 311., 281., 182., 321.,
58., 262., 206., 233., 242., 123., 167., 63., 197., 71., 168.,
140., 217., 121., 235., 245., 40., 52., 104., 132., 88., 69.,
219., 72., 201., 110., 51., 277., 63., 118., 69., 273., 258.,
43., 198., 242., 232., 175., 93., 168., 275., 293., 281., 72.,
140., 189., 181., 209., 136., 261., 113., 131., 174., 257., 55.,
84., 42., 146., 212., 233., 91., 111., 152., 120., 67., 310.,
94., 183., 66., 173., 72., 49., 64., 48., 178., 104., 132.,
220., 57.]), 'DESCR': '.. _diabetes_dataset:\n\nDiabetes dataset\n----------------\n\nTen baseline variables, age, sex, body mass index, average blood\npressure, and six blood serum measurements were obtained for each of n =\n442 diabetes patients, as well as the response of interest, a\nquantitative measure of disease progression one year after baseline.\n\n**Data Set Characteristics:**\n\n :Number of Instances: 442\n\n :Number of Attributes: First 10 columns are numeric predictive values\n\n :Target: Column 11 is a quantitative measure of disease progression one year after baseline\n\n :Attribute Information:\n - Age\n - Sex\n - Body mass index\n - Average blood pressure\n - S1\n - S2\n - S3\n - S4\n - S5\n - S6\n\nNote: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times `n_samples` (i.e. the sum of squares of each column totals 1).\n\nSource URL:\nhttps://www4.stat.ncsu.edu/~boos/var.select/diabetes.html\n\nFor more information see:\nBradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression," Annals of Statistics (with discussion), 407-499.\n(https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)', 'feature_names': ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6'], 'data_filename': 'D:\\Programs\\Python\\Python36\\lib\\site-packages\\sklearn\\datasets\\data\\diabetes_data.csv.gz', 'target_filename': 'D:\\Programs\\Python\\Python36\\lib\\site-packages\\sklearn\\datasets\\data\\diabetes_target.csv.gz'}
.. _diabetes_dataset:
Diabetes dataset
----------------
Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.
**Data Set Characteristics:**
:Number of Instances: 442
:Number of Attributes: First 10 columns are numeric predictive values
:Target: Column 11 is a quantitative measure of disease progression one year after baseline
:Attribute Information:
- Age
- Sex
- Body mass index
- Average blood pressure
- S1
- S2
- S3
- S4
- S5
- S6
Note: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times `n_samples` (i.e. the sum of squares of each column totals 1).
Source URL:
https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html
For more information see:
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression," Annals of Statistics (with discussion), 407-499.
(https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)
******data******
[[ 0.03807591 0.05068012 0.06169621 ... -0.00259226 0.01990842
-0.01764613]
[-0.00188202 -0.04464164 -0.05147406 ... -0.03949338 -0.06832974
-0.09220405]
[ 0.08529891 0.05068012 0.04445121 ... -0.00259226 0.00286377
-0.02593034]
...
[ 0.04170844 0.05068012 -0.01590626 ... -0.01107952 -0.04687948
0.01549073]
[-0.04547248 -0.04464164 0.03906215 ... 0.02655962 0.04452837
-0.02593034]
[-0.04547248 -0.04464164 -0.0730303 ... -0.03949338 -0.00421986
0.00306441]]
******target******
[151. 75. 141. 206. 135. 97. 138. 63. 110. 310. 101. 69. 179. 185.
118. 171. 166. 144. 97. 168. 68. 49. 68. 245. 184. 202. 137. 85.
131. 283. 129. 59. 341. 87. 65. 102. 265. 276. 252. 90. 100. 55.
61. 92. 259. 53. 190. 142. 75. 142. 155. 225. 59. 104. 182. 128.
52. 37. 170. 170. 61. 144. 52. 128. 71. 163. 150. 97. 160. 178.
48. 270. 202. 111. 85. 42. 170. 200. 252. 113. 143. 51. 52. 210.
65. 141. 55. 134. 42. 111. 98. 164. 48. 96. 90. 162. 150. 279.
92. 83. 128. 102. 302. 198. 95. 53. 134. 144. 232. 81. 104. 59.
246. 297. 258. 229. 275. 281. 179. 200. 200. 173. 180. 84. 121. 161.
99. 109. 115. 268. 274. 158. 107. 83. 103. 272. 85. 280. 336. 281.
118. 317. 235. 60. 174. 259. 178. 128. 96. 126. 288. 88. 292. 71.
197. 186. 25. 84. 96. 195. 53. 217. 172. 131. 214. 59. 70. 220.
268. 152. 47. 74. 295. 101. 151. 127. 237. 225. 81. 151. 107. 64.
138. 185. 265. 101. 137. 143. 141. 79. 292. 178. 91. 116. 86. 122.
72. 129. 142. 90. 158. 39. 196. 222. 277. 99. 196. 202. 155. 77.
191. 70. 73. 49. 65. 263. 248. 296. 214. 185. 78. 93. 252. 150.
77. 208. 77. 108. 160. 53. 220. 154. 259. 90. 246. 124. 67. 72.
257. 262. 275. 177. 71. 47. 187. 125. 78. 51. 258. 215. 303. 243.
91. 150. 310. 153. 346. 63. 89. 50. 39. 103. 308. 116. 145. 74.
45. 115. 264. 87. 202. 127. 182. 241. 66. 94. 283. 64. 102. 200.
265. 94. 230. 181. 156. 233. 60. 219. 80. 68. 332. 248. 84. 200.
55. 85. 89. 31. 129. 83. 275. 65. 198. 236. 253. 124. 44. 172.
114. 142. 109. 180. 144. 163. 147. 97. 220. 190. 109. 191. 122. 230.
242. 248. 249. 192. 131. 237. 78. 135. 244. 199. 270. 164. 72. 96.
306. 91. 214. 95. 216. 263. 178. 113. 200. 139. 139. 88. 148. 88.
243. 71. 77. 109. 272. 60. 54. 221. 90. 311. 281. 182. 321. 58.
262. 206. 233. 242. 123. 167. 63. 197. 71. 168. 140. 217. 121. 235.
245. 40. 52. 104. 132. 88. 69. 219. 72. 201. 110. 51. 277. 63.
118. 69. 273. 258. 43. 198. 242. 232. 175. 93. 168. 275. 293. 281.
72. 140. 189. 181. 209. 136. 261. 113. 131. 174. 257. 55. 84. 42.
146. 212. 233. 91. 111. 152. 120. 67. 310. 94. 183. 66. 173. 72.
49. 64. 48. 178. 104. 132. 220. 57.]
(442, 10)
(442,)
######X_train######
[[-0.04910502 -0.04464164 -0.05686312 ... -0.03949338 -0.01190068
0.01549073]
[-0.05273755 -0.04464164 -0.05578531 ... 0.03430886 0.13237265
0.00306441]
[-0.09269548 0.05068012 -0.0902753 ... -0.00259226 0.02405258
0.00306441]
...
[ 0.05987114 -0.04464164 -0.02129532 ... 0.07120998 0.07912108
0.13561183]
[-0.07816532 -0.04464164 -0.0730303 ... -0.03949338 -0.01811827
-0.08391984]
[ 0.04170844 0.05068012 0.07139652 ... 0.03430886 0.07341008
0.08590655]]
######X_test######
[[ 0.01991321 0.05068012 0.10480869 ... -0.00259226 0.00371174
0.04034337]
[-0.01277963 -0.04464164 0.06061839 ... 0.03430886 0.0702113
0.00720652]
[ 0.03807591 0.05068012 0.00888341 ... -0.00259226 -0.01811827
0.00720652]
...
[-0.0854304 -0.04464164 -0.00405033 ... -0.03949338 -0.0611766
-0.01350402]
[ 0.03807591 0.05068012 -0.02991782 ... -0.00259226 -0.01290794
0.00306441]
[ 0.04170844 0.05068012 0.01966154 ... -0.00259226 0.03119299
0.00720652]]
######y_train######
[ 68. 109. 94. 118. 275. 275. 127. 281. 71. 42. 71. 128. 272. 135.
51. 220. 167. 78. 131. 212. 182. 174. 259. 77. 91. 310. 84. 134.
102. 128. 306. 245. 201. 183. 111. 96. 125. 182. 177. 48. 97. 259.
288. 242. 69. 31. 154. 150. 52. 261. 118. 102. 139. 51. 58. 144.
178. 97. 78. 129. 258. 124. 198. 185. 66. 237. 178. 275. 268. 242.
200. 214. 246. 236. 85. 114. 93. 99. 72. 270. 111. 83. 87. 42.
172. 65. 259. 279. 141. 144. 220. 90. 101. 53. 67. 72. 121. 303.
232. 140. 190. 221. 71. 116. 111. 280. 233. 78. 150. 283. 64. 140.
65. 225. 206. 63. 296. 173. 85. 141. 50. 25. 153. 55. 139. 336.
73. 95. 109. 44. 180. 263. 148. 79. 65. 102. 220. 277. 246. 200.
262. 191. 97. 184. 85. 248. 150. 268. 59. 70. 88. 100. 190. 113.
66. 243. 185. 262. 48. 160. 217. 210. 132. 257. 104. 126. 292. 166.
83. 81. 144. 281. 72. 39. 109. 60. 258. 178. 168. 87. 77. 216.
206. 142. 161. 265. 60. 200. 265. 272. 146. 94. 55. 69. 138. 258.
143. 172. 89. 69. 199. 55. 45. 265. 91. 170. 55. 202. 155. 77.
77. 71. 123. 84. 252. 52. 40. 274. 143. 245. 92. 151. 39. 235.
92. 253. 94. 81. 346. 90. 181. 162. 277. 152. 178. 124. 75. 263.
202. 200. 108. 96. 60. 72. 107. 54. 158. 152. 220. 308. 249. 222.
65. 173. 88. 72. 164. 52. 115. 200. 90. 248. 37. 230. 63. 273.
61. 53. 189. 241. 118. 252. 104. 219. 115. 332. 131. 185. 63. 131.
88. 187. 196. 59. 341. 109. 101. 113. 80. 242. 168. 128. 233. 209.
225. 83. 214. 96. 129. 47. 229. 293. 74. 202. 164. 202. 59. 91.
120. 151. 310. 90. 116. 147. 43. 42. 48. 134. 84. 71. 64. 70.
310. 311. 122. 243. 248. 91. 281. 142. 295.]
######y_test######
[321. 215. 127. 64. 175. 275. 179. 232. 142. 99. 252. 174. 129. 74.
264. 49. 86. 75. 101. 155. 170. 276. 110. 136. 68. 128. 103. 93.
191. 196. 217. 181. 168. 200. 219. 281. 151. 257. 49. 198. 96. 179.
95. 198. 244. 89. 214. 182. 84. 270. 156. 138. 113. 131. 195. 171.
122. 61. 230. 235. 52. 121. 144. 107. 132. 302. 53. 317. 137. 57.
98. 170. 88. 90. 67. 163. 104. 186. 180. 283. 141. 150. 47. 297.
104. 49. 103. 142. 59. 85. 137. 53. 51. 197. 135. 72. 208. 237.
145. 110. 292. 97. 197. 158. 163. 63. 192. 233. 68. 160. 178.]
######X_train######
(331, 10)
######X_test######
(111, 10)
######y_train######
(331,)
######y_test######
(111,)
Coefficients:[ -43.26774487 -208.67053951 593.39797213 302.89814903 -560.27689824
261.47657106 -8.83343952 135.93715156 703.22658427 28.34844354], intercept 153.06798218266258
Mean squared error: 3180.20
Score: 0.36
Coefficients:[ 21.19927911 -60.47711393 302.87575204 179.41206395 8.90911449
-28.8080548 -149.30722541 112.67185758 250.53760873 99.57749017], intercept 152.4477761489962
Mean squared error: 3192.33
Score: 0.36
Process finished with exit code 0