Description of the benign/malignant breast tumor prediction dataset
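(1) Data source: https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data (Breast Cancer Wisconsin dataset, UCI Machine Learning Repository).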
(2) 699 samples with 11 columns each: the first column is the sample ID, the next 9 columns are medical features associated with the tumor, and the last column is the tumor type (2 = benign, 4 = malignant).
(3) The data contains 16 missing values, marked with "?".
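A minimal sketch, assuming pandas: instead of replacing "?" after loading, `read_csv` can treat it as missing at parse time through its `na_values` parameter, which also lets the affected column come out numeric instead of as strings. The three rows below are copied from the DataFrame printed later in this post (rows 0, 5 and 23) and are read from an in-memory string rather than the UCI URL.

```python
import io

import pandas as pd

# Rows 0, 5 and 23 of the dataset in its raw comma-separated form;
# row 23 has "?" in the Bare Nuclei column.
raw = io.StringIO(
    "1000025,5,1,1,1,2,1,3,1,1,2\n"
    "1017122,8,10,10,8,7,10,9,7,1,4\n"
    "1057013,8,4,5,1,2,?,7,3,1,4\n"
)

column = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
          'Uniformity of Cell Shape', 'Marginal Adhesion',
          'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
          'Normal Nucleoli', 'Mitoses', 'Class']

# na_values="?" turns the marker into NaN while parsing
data = pd.read_csv(raw, names=column, na_values="?")

missing_per_column = data.isna().sum()
print(missing_per_column[missing_per_column > 0])  # only Bare Nuclei has a gap
print(data["Bare Nuclei"].dtype)                  # float64, not object

# Drop incomplete rows; every remaining value is a whole number
clean = data.dropna().astype(int)
print(clean.shape)  # (2, 11)
```

On the full file the same call should report 16 missing values, all in "Bare Nuclei".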
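import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def logistic():
    """
    Binary classification with logistic regression for cancer prediction
    (based on the cells' attribute features).
    :return: None
    """
    # Build the column labels
    column = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape',
              'Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
              'Normal Nucleoli', 'Mitoses', 'Class']
    # Read the data
    data = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data", names=column)
    print(data)
    # Handle missing values: "?" marks a missing entry, so turn it into NaN and drop those rows
    data = data.replace(to_replace='?', value=np.nan)
    data = data.dropna()
    # "Bare Nuclei" was read as strings because of the "?" markers; convert all columns to numbers
    data = data.astype(int)
    # Split into training and test sets (25% of the rows held out for testing)
    x_train, x_test, y_train, y_test = train_test_split(data[column[1:10]], data[column[10]], test_size=0.25)
    # Standardize: fit the scaler on the training set only, then apply it to the test set
    std = StandardScaler()
    x_train = std.fit_transform(x_train)
    x_test = std.transform(x_test)
    # Logistic regression prediction
    lg = LogisticRegression(C=1.0)  # C is the inverse of regularization strength (smaller C = stronger regularization)
    lg.fit(x_train, y_train)
    print(lg.coef_)
    y_predict = lg.predict(x_test)
    print("Accuracy:", lg.score(x_test, y_test))
    print("Classification report:", classification_report(y_test, y_predict, labels=[2, 4], target_names=["Benign", "Malignant"]))
    return None


if __name__ == "__main__":
    logistic()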
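The standardization step fits the mean and standard deviation on the training split only and reuses them on the test split, so no information from the test rows leaks into the scaling. A small sketch on made-up feature values (not taken from the dataset) checking that `StandardScaler` computes z = (x - mean_train) / std_train, with the population standard deviation (ddof=0):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values on the dataset's 1-10 feature scale (illustrative only)
x_train = np.array([[1.0, 5.0], [3.0, 1.0], [8.0, 10.0], [4.0, 2.0]])
x_test = np.array([[10.0, 1.0], [2.0, 7.0]])

std = StandardScaler()
x_train_scaled = std.fit_transform(x_train)  # learns mean_ and scale_ from the training rows
x_test_scaled = std.transform(x_test)        # reuses the training statistics

# The same transformation by hand, using training-set statistics only
mean = x_train.mean(axis=0)
sd = x_train.std(axis=0)  # NumPy's default ddof=0 matches StandardScaler
manual_test = (x_test - mean) / sd

print(np.allclose(x_test_scaled, manual_test))        # True
print(np.allclose(x_train_scaled.mean(axis=0), 0.0))  # True
```

The printed output that follows comes from the original script, not from this sketch.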
Sample code number Clump Thickness Uniformity of Cell Size \
0 1000025 5 1
1 1002945 5 4
2 1015425 3 1
3 1016277 6 8
4 1017023 4 1
5 1017122 8 10
6 1018099 1 1
7 1018561 2 1
8 1033078 2 1
9 1033078 4 2
10 1035283 1 1
11 1036172 2 1
12 1041801 5 3
13 1043999 1 1
14 1044572 8 7
15 1047630 7 4
16 1048672 4 1
17 1049815 4 1
18 1050670 10 7
19 1050718 6 1
20 1054590 7 3
21 1054593 10 5
22 1056784 3 1
23 1057013 8 4
24 1059552 1 1
25 1065726 5 2
26 1066373 3 2
27 1066979 5 1
28 1067444 2 1
29 1070935 1 1
.. ... ... ...
669 1350423 5 10
670 1352848 3 10
671 1353092 3 2
672 1354840 2 1
673 1354840 5 3
674 1355260 1 1
675 1365075 4 1
676 1365328 1 1
677 1368267 5 1
678 1368273 1 1
679 1368882 2 1
680 1369821 10 10
681 1371026 5 10
682 1371920 5 1
683 466906 1 1
684 466906 1 1
685 534555 1 1
686 536708 1 1
687 566346 3 1
688 603148 4 1
689 654546 1 1
690 654546 1 1
691 695091 5 10
692 714039 3 1
693 763235 3 1
694 776715 3 1
695 841769 2 1
696 888820 5 10
697 897471 4 8
698 897471 4 8
Uniformity of Cell Shape Marginal Adhesion Single Epithelial Cell Size \
0 1 1 2
1 4 5 7
2 1 1 2
3 8 1 3
4 1 3 2
5 10 8 7
6 1 1 2
7 2 1 2
8 1 1 2
9 1 1 2
10 1 1 1
11 1 1 2
12 3 3 2
13 1 1 2
14 5 10 7
15 6 4 6
16 1 1 2
17 1 1 2
18 7 6 4
19 1 1 2
20 2 10 5
21 5 3 6
22 1 1 2
23 5 1 2
24 1 1 2
25 3 4 2
26 1 1 1
27 1 1 2
28 1 1 2
29 3 1 2
.. ... ... ...
669 10 8 5
670 7 8 5
671 1 2 2
672 1 1 2
673 2 1 3
674 1 1 2
675 4 1 2
676 2 1 2
677 1 1 2
678 1 1 2
679 1 1 2
680 10 10 5
681 10 10 4
682 1 1 2
683 1 1 2
684 1 1 2
685 1 1 2
686 1 1 2
687 1 1 2
688 1 1 2
689 1 1 2
690 1 3 2
691 10 5 4
692 1 1 2
693 1 1 2
694 1 1 3
695 1 1 2
696 10 3 7
697 6 4 3
698 8 5 4
Bare Nuclei Bland Chromatin Normal Nucleoli Mitoses Class
0 1 3 1 1 2
1 10 3 2 1 2
2 2 3 1 1 2
3 4 3 7 1 2
4 1 3 1 1 2
5 10 9 7 1 4
6 10 3 1 1 2
7 1 3 1 1 2
8 1 1 1 5 2
9 1 2 1 1 2
10 1 3 1 1 2
11 1 2 1 1 2
12 3 4 4 1 4
13 3 3 1 1 2
14 9 5 5 4 4
15 1 4 3 1 4
16 1 2 1 1 2
17 1 3 1 1 2
18 10 4 1 2 4
19 1 3 1 1 2
20 10 5 4 4 4
21 7 7 10 1 4
22 1 2 1 1 2
23 ? 7 3 1 4
24 1 3 1 1 2
25 7 3 6 1 4
26 1 2 1 1 2
27 1 2 1 1 2
28 1 2 1 1 2
29 1 1 1 1 2
.. ... ... ... ... ...
669 5 7 10 1 4
670 8 7 4 1 4
671 1 3 1 1 2
672 1 3 1 1 2
673 1 1 1 1 2
674 1 2 1 1 2
675 1 1 1 1 2
676 1 2 1 1 2
677 1 1 1 1 2
678 1 1 1 1 2
679 1 1 1 1 2
680 10 10 10 7 4
681 10 5 6 3 4
682 1 3 2 1 2
683 1 1 1 1 2
684 1 1 1 1 2
685 1 1 1 1 2
686 1 1 1 1 2
687 1 2 3 1 2
688 1 1 1 1 2
689 1 1 1 8 2
690 1 1 1 1 2
691 5 4 4 1 4
692 1 1 1 1 2
693 1 2 1 2 2
694 2 1 1 1 2
695 1 1 1 1 2
696 3 8 10 2 4
697 4 10 6 1 4
698 5 10 4 1 4
[699 rows x 11 columns]
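The DataFrame above is printed before the missing values are dropped, so it still has 699 rows (row 23 shows a "?"). The 16 missing values each sit in a different row, which leaves 683 complete rows; with `test_size=0.25`, scikit-learn rounds the test-set size up. A quick arithmetic check, using a dummy index array in place of the real data:

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_total = 699
n_missing_rows = 16           # one "?" per affected row
n_clean = n_total - n_missing_rows

# A fractional test_size is rounded up: ceil(0.25 * 683) = ceil(170.75) = 171
expected_test = math.ceil(0.25 * n_clean)

# Confirm with scikit-learn on an index array of the same length
idx_train, idx_test = train_test_split(np.arange(n_clean), test_size=0.25, random_state=0)
print(n_clean, len(idx_train), len(idx_test))  # 683 512 171
```

The 171 test rows match the total support in the classification report at the end of the output.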
[[1.43903602 0.36918726 0.75945108 1.26904696 0.0910426 1.21706346
1.3565616 0.37057748 0.67245559]]
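The nine numbers printed above are the weights w, one per standardized feature. Logistic regression turns the linear score w·x + b into a probability with the sigmoid 1 / (1 + e^(-z)). Because scikit-learn sorts the class labels to `[2, 4]`, `coef_` describes the log-odds of the second class, 4 (malignant), so the all-positive coefficients mean that larger feature values push a sample toward "malignant". A sketch on synthetic data (not the breast cancer data) checking that `predict_proba` is the sigmoid of `decision_function`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic stand-in for 9 standardized features, labels coded 2/4 like the dataset
X = rng.normal(size=(200, 9))
y = np.where(X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0, 4, 2)

lg = LogisticRegression(C=1.0).fit(X, y)
print(lg.classes_)  # [2 4]: coef_ refers to class 4

z = X @ lg.coef_.ravel() + lg.intercept_[0]  # linear score w.x + b
p_malignant = 1.0 / (1.0 + np.exp(-z))       # sigmoid of the score

print(np.allclose(z, lg.decision_function(X)))                # True
print(np.allclose(p_malignant, lg.predict_proba(X)[:, 1]))    # True
```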
Accuracy: 0.9590643274853801
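Accuracy is simply correct predictions divided by test-set size. With 171 test samples, the printed value corresponds to 164 correct and 7 wrong predictions. The label arrays below are placeholders (only the number of matches matters for accuracy), not the script's actual predictions:

```python
import math

from sklearn.metrics import accuracy_score

reported_accuracy = 0.9590643274853801  # value printed above
n_test = 171                            # test-set size (total support in the report)

n_correct = round(reported_accuracy * n_test)
print(n_correct, n_test - n_correct)  # 164 7

# Placeholder labels with exactly n_correct matches; accuracy_score recomputes the ratio
y_true = [2] * n_test
y_pred = [2] * n_correct + [4] * (n_test - n_correct)
acc = accuracy_score(y_true, y_pred)
print(math.isclose(acc, reported_accuracy))  # True
```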
Classification report:              precision    recall  f1-score   support
     Benign       0.98      0.96      0.97       116
  Malignant       0.91      0.96      0.94        55
avg / total       0.96      0.96      0.96       171
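Precision is TP / (TP + FP) and recall is TP / (TP + FN), computed per class. The script does not print its confusion matrix, but the supports and rounded recalls above only fit one: 111 of 116 benign samples and 53 of 55 malignant samples classified correctly (164 in total, matching the accuracy). A sketch that rebuilds label arrays from that reconstructed matrix and recomputes the report:

```python
import numpy as np
from sklearn.metrics import classification_report, precision_recall_fscore_support

# Reconstructed from the report above (not printed by the original script):
# benign: 111 correct, 5 predicted malignant; malignant: 53 correct, 2 predicted benign
y_true = np.array([2] * 116 + [4] * 55)
y_pred = np.array([2] * 111 + [4] * 5 + [2] * 2 + [4] * 53)

precision, recall, f1, support = precision_recall_fscore_support(y_true, y_pred, labels=[2, 4])
# precision(benign)    = 111 / (111 + 2) ~ 0.982
# precision(malignant) =  53 / (53 + 5)  ~ 0.914
# recall(benign)       = 111 / 116       ~ 0.957
# recall(malignant)    =  53 / 55        ~ 0.964
print(np.round(precision, 2), np.round(recall, 2), np.round(f1, 2), support)
print(classification_report(y_true, y_pred, labels=[2, 4], target_names=["Benign", "Malignant"]))
```

Recall on the malignant class is the figure that matters most here: it is the share of actual malignant tumors the model catches.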