Linear classification of the breast cancer dataset (breastCancer) and the iris dataset (iris) with logistic regression and the perceptron algorithm

Logistic Regression and Perceptron Algorithms for Linear Classification

The code uses two scikit-learn classifiers, LogisticRegression and Perceptron.

from sklearn.linear_model import LogisticRegression

# Train and test with the LogisticRegression classifier
lr = LogisticRegression()
lr.fit(X_train_scaler, y_train)
y_pred_lr = lr.predict(X_test_scaler)

from sklearn.linear_model import Perceptron

# Define the perceptron
perceptron = Perceptron(fit_intercept=False, max_iter=200, shuffle=False)

In the scikit-learn source code, max_iter is described as "The maximum number of passes over the training data", that is, an upper limit on the number of iterations. More iterations are not necessarily better: extra iterations may reduce the accuracy of the model because overfitting becomes more likely, or the accuracy may stay unchanged, in which case the extra iterations only add computation.
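To make the "upper limit" reading concrete, the following minimal sketch compares max_iter with the number of passes the perceptron actually ran, which scikit-learn exposes as the fitted attribute n_iter_. The train/test split, random_state, and the use of StandardScaler are assumptions (the post only shows the variable names X_train_scaler and X_test_scaler), so the numbers will not necessarily match the original experiment.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: an 80/20 split with a fixed seed; the post does not state its split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Assumption: standardization, suggested by the names X_train_scaler / X_test_scaler.
scaler = StandardScaler().fit(X_train)
X_train_scaler = scaler.transform(X_train)
X_test_scaler = scaler.transform(X_test)

# Same constructor arguments as in the post.
perceptron = Perceptron(fit_intercept=False, max_iter=200, shuffle=False)
perceptron.fit(X_train_scaler, y_train)

# n_iter_ is the number of passes actually performed; it never exceeds max_iter.
epochs_run = perceptron.n_iter_
print("max_iter:", perceptron.max_iter, "passes actually run:", epochs_run)
```

With the default stopping tolerance (tol), training may end before max_iter is reached, which would explain why raising max_iter sometimes changes nothing.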

When max_iter is set to 30, the result is:

[Figure: results with max_iter = 30]

When max_iter is set to 50, the result is:

[Figure: results with max_iter = 50]

When max_iter is set to 100, the result is:

[Figure: results with max_iter = 100]

When max_iter is set to 200, the result is:

[Figure: results with max_iter = 200]

As the number of iterations increases, the accuracy of the LogisticRegression model improves slightly, whereas the accuracy of the Perceptron model decreases.
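A comparison like the one above can be reproduced with a small loop over max_iter. This is only a sketch under assumptions: the split, the scaling, the use of all features, and passing max_iter to LogisticRegression as well are choices made here for illustration, not details confirmed by the post, so the accuracies it prints may differ from the original figures.

```python
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Small iteration budgets can trigger convergence warnings; silence them here.
warnings.simplefilter("ignore", ConvergenceWarning)

# Assumption: 80/20 split with a fixed seed and standardized features.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
scaler = StandardScaler().fit(X_train)
X_train_scaler = scaler.transform(X_train)
X_test_scaler = scaler.transform(X_test)

results = {}  # max_iter -> (logistic regression accuracy, perceptron accuracy)
for n in (30, 50, 100, 200):
    lr = LogisticRegression(max_iter=n)  # assumption: max_iter is varied for both models
    perceptron = Perceptron(fit_intercept=False, max_iter=n, shuffle=False)
    lr.fit(X_train_scaler, y_train)
    perceptron.fit(X_train_scaler, y_train)
    results[n] = (
        lr.score(X_test_scaler, y_test),
        perceptron.score(X_test_scaler, y_test),
    )

for n, (acc_lr, acc_p) in results.items():
    print(f"max_iter={n}: LogisticRegression={acc_lr:.3f}, Perceptron={acc_p:.3f}")
```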

When all features are used for training, the accuracy of both models improves greatly. The LogisticRegression model reaches 98.2% and the Perceptron model reaches 97.4%, both much higher than the accuracy obtained with only the 20th and 29th feature dimensions.

[Figure: results with all features]
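The two-feature versus all-feature comparison can be sketched as follows. The column indices [20, 29] are an assumption: the post does not say whether "20th" and "29th" are counted from 0 or from 1. The helper evaluate, the split, and the scaling are likewise hypothetical choices for illustration, so the 98.2% and 97.4% figures are not guaranteed to be reproduced.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: 0-based column indices for the "20th and 29th" features.
TWO_FEATURES = [20, 29]


def evaluate(X, y, model):
    """Hypothetical helper: split, standardize, fit, and return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    scaler = StandardScaler().fit(X_train)
    model.fit(scaler.transform(X_train), y_train)
    return model.score(scaler.transform(X_test), y_test)


X, y = load_breast_cancer(return_X_y=True)

# Factories so that each run gets a fresh, unfitted model.
factories = {
    "LogisticRegression": lambda: LogisticRegression(),
    "Perceptron": lambda: Perceptron(fit_intercept=False, max_iter=200, shuffle=False),
}

scores = {}
for name, make_model in factories.items():
    scores[name] = {
        "two_features": evaluate(X[:, TWO_FEATURES], y, make_model()),
        "all_features": evaluate(X, y, make_model()),
    }
    print(name, scores[name])
```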

The experimental results show that LogisticRegression is more accurate than Perceptron here, but this does not mean that the perceptron is always inferior to logistic regression; each has its own advantages. Both are linear classifiers used for classification problems. Logistic regression passes a linear combination of the features through the sigmoid function and is trained with a logarithmic cost function (log loss), so the model is more complex and training costs more time, that is, more computation. The perceptron uses a step function as its activation and the simpler perceptron loss, which only penalizes misclassified samples (it is not a mean-squared-error cost), so each update is cheap and the model trains quickly. In this experiment, the higher accuracy of logistic regression is therefore paid for with more computation, which illustrates the "no free lunch" idea: the gain in accuracy does not come for free.
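The difference between the two models can be seen by writing out their activation and loss for a single sample. This is a plain-Python sketch, not scikit-learn's implementation. For the perceptron it uses the perceptron criterion max(0, -y*z), since scikit-learn documents Perceptron as equivalent to SGDClassifier(loss="perceptron", ...). The function names are made up for this example.

```python
import math


def sigmoid(z):
    """Logistic activation: maps a linear score z = w.x + b to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(z, y):
    """Log loss for a label y in {0, 1}; positive even for correct predictions."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def step(z):
    """Perceptron activation: a hard threshold at zero."""
    return 1 if z >= 0 else 0


def perceptron_loss(z, y):
    """Perceptron criterion for a label y in {-1, +1}; zero when correctly classified."""
    return max(0.0, -y * z)


# A correctly classified positive sample (score 2.0) and a misclassified one (score -2.0).
for z in (2.0, -2.0):
    print(
        f"z={z:+.1f}: sigmoid={sigmoid(z):.3f}, log_loss={log_loss(z, 1):.3f}, "
        f"step={step(z)}, perceptron_loss={perceptron_loss(z, 1):.1f}"
    )
```

The perceptron loss is exactly zero for correctly classified samples, so those samples trigger no weight update, while log loss always needs an exponential and a logarithm and never becomes exactly zero. That is one way to see why a perceptron update is cheaper.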

Origin blog.csdn.net/qq_48068259/article/details/127893020