100 Days Of ML Code: Day 4/5/6 - Logistic Regression

The index post for the 100-day machine learning challenge is linked here.

Contents

Introduction to the Dataset

Step 1: Data Preprocessing

Step 2: Training the Logistic Regression Model

Step 3: Prediction

Step 4: Evaluating the Predictions

Finally: The Complete Code


For more detail, you can also review my earlier article.

Translation of part of the text in the image above:

Logistic regression is typically used for classification problems with distinct classes: given an observation, the goal is to predict which class it belongs to. The target is usually discrete binary data, taking the values 0 or 1. A classic example of logistic regression is predicting how people will vote during an election.

A logistic regression model uses the underlying logistic function to produce estimated probabilities that measure the relationship between the dependent variable (the label we want to predict) and one or more independent variables (our features). The logistic function, also known as the sigmoid function, is an S-shaped curve that maps the estimated probabilities to the binary values 0 or 1, which is what the model ultimately predicts.

Unlike linear regression, whose output is continuous, logistic regression produces a discrete result.
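
To make the sigmoid mapping concrete, here is a minimal NumPy sketch (my addition, not from the original post) showing how estimated probabilities are thresholded at 0.5 into the binary classes 0 and 1.

import numpy as np

def sigmoid(z):
    # the logistic (sigmoid) function maps any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
probs = sigmoid(z)                    # estimated probabilities
labels = (probs >= 0.5).astype(int)   # threshold at 0.5 -> binary classes
print(probs)   # roughly [0.047 0.378 0.5 0.622 0.953]
print(labels)  # [0 0 1 1 1]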

Reference: http://www.sohu.com/a/244637501_697750

Introduction to the Dataset

This dataset contains information about users of a social network: the user ID, the gender, the age, and the estimated salary. A car company has just launched a brand-new luxury SUV, and we want to see which users of the social network are going to buy it. The last column records whether or not the user bought the SUV. We will build a model that predicts whether a user buys the SUV based on two variables: age and estimated salary. So our matrix of features will consist of only these two columns; we want to find a correlation between a user's age and estimated salary and their decision to buy the SUV.
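
Assuming the usual layout of Social_Network_Ads.csv (columns User ID, Gender, Age, EstimatedSalary, Purchased; the exact column names are my assumption, not confirmed by the post), a quick first look at the data could go like this:

import pandas as pd

df = pd.read_csv('Social_Network_Ads.csv')
print(df.head())                     # first rows: User ID, Gender, Age, EstimatedSalary, Purchased
print(df.iloc[:, 4].value_counts())  # how many users bought the SUV (1) vs. did not (0)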

Step 1: Data Preprocessing

import pandas as pd
import numpy as np

df = pd.read_csv('Social_Network_Ads.csv')
# print(df)
X = df.iloc[:, 2:4].values   # features: Age and EstimatedSalary
Y = df.iloc[:, 4].values     # label: whether the user purchased the SUV
# print(X)
# print(Y)

from sklearn.model_selection import train_test_split  # cross_validation was removed in newer scikit-learn
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)  # 75% train / 25% test
# print(X_train)

# feature scaling
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit the scaler on the training set only
X_test = scaler.transform(X_test)        # reuse the training statistics; do not refit on the test set
# print(X_train)
# print(X_test)
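
As a quick sanity check (my addition), after StandardScaler each feature column of the training set should have roughly zero mean and unit standard deviation:

print(X_train.mean(axis=0))  # close to [0, 0]
print(X_train.std(axis=0))   # close to [1, 1]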

Step 2: Training the Logistic Regression Model

The class we need lives in scikit-learn's linear model library. It is called "linear" because logistic regression is a linear classifier, which means that, since we are working in two dimensions, our two categories of users will be separated by a straight line. We import the LogisticRegression class, create an object from it (our classifier), and fit it on the training set.

from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr = lr.fit(X_train, Y_train)  # fit the classifier on the training set
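
Because logistic regression is a linear classifier, the boundary between the two predicted classes is a straight line in this two-dimensional feature space. The visualization sketch below is my own addition and assumes matplotlib is installed; it colours the two predicted regions and overlays the scaled training points.

import numpy as np
import matplotlib.pyplot as plt

# build a grid covering the scaled feature space
x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                     np.arange(y_min, y_max, 0.01))

# predict a class for every grid point and colour the two regions
Z = lr.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X_train[:, 0], X_train[:, 1], c=Y_train, edgecolors='k')
plt.xlabel('Age (scaled)')
plt.ylabel('Estimated Salary (scaled)')
plt.title('Logistic regression decision boundary (training set)')
plt.show()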

Step 3: Prediction

Y_pred = lr.predict(X_test)  # predicted 0/1 labels for the test set
# print(Y_pred)
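
The predict call returns hard 0/1 labels. If you also want the estimated probabilities discussed in the introduction, scikit-learn exposes them through predict_proba; this short sketch is an addition to the original code and assumes the positive class is labelled 1.

probs = lr.predict_proba(X_test)[:, 1]  # probability that each test user buys the SUV
print(probs[:5])
print((probs[:5] >= 0.5).astype(int))   # thresholding at 0.5 reproduces lr.predict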

Step 4: Evaluating the Predictions

We have predicted the test results, and now we will evaluate whether our logistic regression model learned correctly. The confusion matrix contains the correct predictions our model made on the test set as well as the incorrect ones.

confusion matrix: a table summarizing correct and incorrect predictions for each class

from sklearn.metrics import confusion_matrix
confusion_matrix(Y_test, Y_pred)  # rows = actual class, columns = predicted class
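
For a binary problem, scikit-learn lays the confusion matrix out as [[TN, FP], [FN, TP]]. The snippet below (my addition) shows how the accuracy could be read off the matrix and cross-checked with accuracy_score:

from sklearn.metrics import accuracy_score

cm = confusion_matrix(Y_test, Y_pred)
tn, fp, fn, tp = cm.ravel()            # [[TN, FP], [FN, TP]]
print(cm)
print((tn + tp) / cm.sum())            # accuracy computed from the matrix
print(accuracy_score(Y_test, Y_pred))  # should match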

Finally: The Complete Code

import pandas as pd
import numpy as np

df = pd.read_csv('Social_Network_Ads.csv')
# print(df)
X = df.iloc[:, 2:4].values   # features: Age and EstimatedSalary
Y = df.iloc[:, 4].values     # label: whether the user purchased the SUV
# print(X)
# print(Y)

from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)
# print(X_train)

# feature scaling
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit on the training set only
X_test = scaler.transform(X_test)        # apply the training statistics to the test set
# print(X_train)
# print(X_test)

from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr = lr.fit(X_train, Y_train)

Y_pred = lr.predict(X_test)
# print(Y_pred)

from sklearn.metrics import confusion_matrix
confusion_matrix(Y_test, Y_pred)
# print(confusion_matrix(Y_test, Y_pred))

Reposted from blog.csdn.net/m0_37622530/article/details/81476131