Logistic regression in sklearn

In the previous article, we introduced the principle of logistic regression for binary classification and walked through an implementation in code. In theory, logistic regression can only handle two-class problems; in practice, it is extended to multi-class problems in one of two ways: training one binary classifier per class (one-vs-rest), or generalizing the model itself to a multinomial (softmax) formulation.
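As a sketch of the first strategy, sklearn provides a generic OneVsRestClassifier wrapper that turns any binary classifier into a multi-class one by fitting one binary model per class and predicting the class whose model scores highest:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

x, y = load_iris(return_X_y=True)

# One binary logistic regression is trained per class;
# max_iter is raised to avoid convergence warnings.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=200))
ovr.fit(x, y)

print(len(ovr.estimators_))  # one fitted binary estimator per class
```

For the three iris classes this fits three binary classifiers under the hood.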


sklearn itself provides a logistic regression implementation that handles multi-class problems out of the box, which makes today's article quite short.

from sklearn import datasets
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Load the iris dataset and split it 70/30 into training and test sets;
# random_state=0 makes the shuffle reproducible.
iris = datasets.load_iris()
iris_x = iris.data
iris_y = iris.target
x_train, x_test, y_train, y_test = train_test_split(
    iris_x, iris_y, test_size=0.3, random_state=0)
print(y_train)

# Train the classifier and predict on the held-out test set.
classifier = LogisticRegression()
classifier.fit(x_train, y_train)
y_predict = classifier.predict(x_test)

# Indices where the prediction differs from the true label.
error_index = np.nonzero(y_test - y_predict)[0]
print('error index:', error_index)
error_rate = len(error_index) / len(y_test)
print('error rate:', error_rate)

The code is very short. As before, we use the iris dataset, but instead of shuffling and splitting the data ourselves as in the last article, we call sklearn's train_test_split function, which does the split for us. The first two arguments are simply the feature matrix and the label vector; test_size is the fraction of the data reserved for testing (here 30%); and random_state seeds the random shuffle. Fixing it at 0 guarantees the data is shuffled the same way every run, so you will get the same results shown here.
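A quick check makes these two properties concrete: with test_size=0.3 the 150 iris samples split into 105 training and 45 test samples, and repeating the call with the same random_state reproduces the split exactly:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# Two splits with the same seed.
x1, xt1, y1, yt1 = train_test_split(iris.data, iris.target, test_size=0.3, random_state=0)
x2, xt2, y2, yt2 = train_test_split(iris.data, iris.target, test_size=0.3, random_state=0)

print(len(x1), len(xt1))   # 105 45 -- a 70/30 split of 150 samples
print((y1 == y2).all())    # True -- same seed, identical split
```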


Next, we train the LogisticRegression object (fit), predict on the test set (predict), and finally compute the error rate of the predictions. The output is as follows:

error index: [16 21 31 35 37]

error rate: 0.1111111111111111
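Note that the error rate need not be computed by hand: the classifier's score method returns accuracy on a test set directly, and accuracy is simply 1 minus the error rate. A minimal sketch (the exact value you see depends on your sklearn version's defaults for solver and regularization, so it may differ from the 0.111 above):

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# max_iter is raised here only to avoid convergence warnings.
clf = LogisticRegression(max_iter=200).fit(x_train, y_train)

# score() returns the fraction of correctly classified test samples.
accuracy = clf.score(x_test, y_test)
print(accuracy)
```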


As you can see, sklearn makes this much simpler and spares us from implementing the algorithm's internals ourselves. Understanding those internals is still useful, however, when finer control is needed. The LogisticRegression constructor takes many parameters for tuning the classifier; for details, please refer to the official documentation.
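As one example of such tuning, a sketch comparing a few values of C, the inverse regularization strength (smaller C means stronger regularization), evaluated with cross_val_score via 5-fold cross-validation rather than a single train/test split:

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()

# Try weak to strong regularization; max_iter is raised so the
# solver converges without warnings for every setting.
for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(C=C, max_iter=500)
    scores = cross_val_score(clf, iris.data, iris.target, cv=5)
    print(C, scores.mean())
```

Each line prints one candidate C and its mean cross-validated accuracy, which is a more robust basis for choosing parameters than one fixed split.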
