Parameter details

from sklearn import linear_model
linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0,
                                fit_intercept=True, intercept_scaling=1,
                                class_weight=None, random_state=None,
                                solver='lbfgs', max_iter=100, multi_class='auto',
                                verbose=0, warm_start=False, n_jobs=None, l1_ratio=None)

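penalty parameter:

str, 'l1' or 'l2', default: 'l2'. Selects the norm used in the regularization term: L1 or L2, with L2 as the default. Versions that expose l1_ratio (as in the signature above) also accept 'elasticnet'.
dual parameter: bool, default: False. Selects the dual or the primal formulation; the default is the primal one. The dual formulation is only implemented for the L2 penalty with the liblinear solver. Prefer dual=False when the number of samples exceeds the number of features.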

tol parameter:

float, default: 1e-4. Tolerance for the stopping criterion: the solver stops once the optimization has converged to within this tolerance.

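C parameter:

float, default: 1.0. Inverse of the regularization strength; must be positive. Smaller values mean stronger regularization.
fit_intercept parameter: bool, default: True. Whether a bias term (also called the intercept) is added to the decision function. It is added by default.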
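
The effect of C described above can be checked with a small experiment. The following is a minimal sketch, assuming a synthetic dataset from make_classification (the dataset sizes, the seed, the two C values and the helper name coef_norm are illustrative choices, not part of the original article). It fits the same model with a small and a large C and compares the L2 norm of the learned coefficients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data; the sizes and seed are arbitrary choices for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

def coef_norm(C):
    """Fit with the given C and return the L2 norm of the learned coefficients."""
    model = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    return float(np.linalg.norm(model.coef_))

norm_strong = coef_norm(0.01)   # small C: strong regularization
norm_weak = coef_norm(100.0)    # large C: weak regularization
print(norm_strong, norm_weak)
```

With the default L2 penalty, the strongly regularized model should end up with the smaller coefficient norm.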

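intercept_scaling parameter:

float, default: 1. Only used when solver is 'liblinear' and self.fit_intercept is True. In that case x becomes [x, self.intercept_scaling], i.e. a synthetic feature with the constant value intercept_scaling is appended to each instance vector.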

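class_weight parameter:

dict or 'balanced', default: None. Class weights, i.e. the weight given to each class in the classification model. They can be passed as a dict or set to 'balanced'; the default None gives every class the same weight. A dict of the form {class_label: weight} sets each class weight directly, e.g. {0: 0.3, 1: 0.7} gives class 0 a weight of 0.3 and class 1 a weight of 0.7. With 'balanced', the weights are computed automatically as n_samples / (n_classes * np.bincount(y)), so there is no need to compute them by hand. This parameter is useful in two situations: when misclassification is costly, and when the classes are imbalanced.
1. High misclassification cost: for example, when classifying legitimate and illegitimate users, labeling an illegitimate user as legitimate is very costly. We would rather label a legitimate user as illegitimate, since that case can be re-checked manually, than let an illegitimate user through. Here we can raise the weight of the illegitimate class.
2. Class imbalance: the number of training samples differs greatly between classes. The basic strategy proposed for this situation is rescaling.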
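
The 'balanced' formula quoted above is easy to evaluate by hand. The following is a small sketch on made-up labels (8 samples of class 0 and 2 of class 1, chosen only for illustration) showing that the rarer class receives the larger weight.

```python
import numpy as np

# Toy labels: 8 samples of class 0 and 2 samples of class 1 (made up for illustration).
y = np.array([0] * 8 + [1] * 2)

n_samples = len(y)
n_classes = len(np.unique(y))

# The 'balanced' formula quoted above: rarer classes receive larger weights.
weights = n_samples / (n_classes * np.bincount(y))
print(weights)
```

Here class 0 gets 10 / (2 * 8) and class 1 gets 10 / (2 * 2), so each class contributes the same total weight to the loss.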
random_state parameter: int, RandomState instance or None, optional, default: None. Seed for the random number generator used to shuffle the data. Only used when solver is 'sag', 'saga' or 'liblinear'.

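solver parameter:

{'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}. The algorithm used to optimize the logistic regression loss. The signature above defaults to 'lbfgs'; older releases defaulted to 'liblinear'. For small datasets 'liblinear' is a good choice; for large datasets 'sag' or 'saga' are faster. For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle the multinomial loss; 'liblinear' is limited to one-vs-rest schemes. Regarding regularization, 'newton-cg', 'lbfgs' and 'sag' only support L2 (these algorithms need continuous first- or second-order derivatives of the loss, so they cannot handle L1, whose derivative is not continuous), while 'liblinear' and 'saga' can handle L1. 'newton-cg' is a Newton method that uses conjugate gradient, 'lbfgs' is a quasi-Newton method, 'sag' is stochastic average gradient descent, 'saga' is a variant of 'sag' that also supports L1, and 'liblinear' uses coordinate descent.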

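max_iter parameter:

int, default: 100. Maximum number of iterations allowed for the solver to converge. Older releases documented it as useful only for the 'newton-cg', 'sag' and 'lbfgs' solvers.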

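multi_class parameter:

str, {'auto', 'ovr', 'multinomial'}, default: 'auto' in the signature above ('ovr' in older releases). Selects the multiclass strategy. 'ovr' is one-vs-rest (OvR). 'multinomial' minimizes the multinomial (softmax) loss over the whole probability distribution; it is not a many-vs-many (MvM) scheme, although the two are sometimes conflated. 'auto' selects 'ovr' for binary data or when solver is 'liblinear', and 'multinomial' otherwise. OvR trains one classifier per class, each time taking the samples of that class as positives and the samples of all other classes as negatives. At test time, if only one classifier predicts positive, its class is the final result; if several classifiers predict positive, the one with the highest confidence wins. MvM, by contrast, treats several classes as positives and several other classes as negatives in each round.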

verbose parameter:

int, default: 0. Verbosity of logging. For the 'liblinear' and 'lbfgs' solvers, set verbose to any positive number to enable verbose output.

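warm_start parameter:

bool, default: False. When set to True, the solution of the previous call to fit is reused as the initialization; otherwise the previous solution is simply erased. Has no effect for the 'liblinear' solver.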

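n_jobs parameter:

int, default: None in the signature above (1 in older releases). Number of CPU cores used when parallelizing over classes with multi_class='ovr'. None means 1 unless running inside a joblib.parallel_backend context; -1 means using all processors.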

l1_ratio parameter:

float or None, optional (default=None). The Elastic-Net mixing parameter, with 0 <= l1_ratio <= 1.
Only used if penalty='elasticnet'. Setting l1_ratio=0 is equivalent to using penalty='l2',
while setting l1_ratio=1 is equivalent to using penalty='l1'.
For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.
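
To make the mixing concrete, here is a minimal sketch of an Elastic-Net penalty term. It assumes the form l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2 used in the scikit-learn user guide; the function name elastic_net_penalty and the coefficient vector are hypothetical and only serve the illustration.

```python
def elastic_net_penalty(w, l1_ratio):
    """Elastic-Net penalty: l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2."""
    l1 = sum(abs(v) for v in w)
    l2_squared = sum(v * v for v in w)
    return l1_ratio * l1 + (1.0 - l1_ratio) / 2.0 * l2_squared

w = [1.0, -2.0, 0.0]                      # hypothetical coefficient vector
pure_l2 = elastic_net_penalty(w, 0.0)     # only the L2 term remains
pure_l1 = elastic_net_penalty(w, 1.0)     # only the L1 term remains
mixed = elastic_net_penalty(w, 0.5)       # equal mix of both terms
print(pure_l2, pure_l1, mixed)
```

The two endpoints reduce to the plain L2 and L1 penalties, which matches the equivalences stated above.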

Attributes and methods

Attributes

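coef_ :
array, shape (1, n_features) or (n_classes, n_features)
Coefficient of the features in the decision function. The shape is (1, n_features) when the problem is binary.

intercept_ :
array, shape (1,) or (n_classes,)
Intercept (a.k.a. bias) added to the decision function. If fit_intercept is set to False, the intercept is set to zero.

n_iter_ :
array, shape (n_classes,) or (1,)
Actual number of iterations for all classes. For binary or multinomial problems it contains a single element. For the liblinear solver, only the maximum number of iterations across all classes is given.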

Methods

decision_function(X)

Predict confidence scores for the samples in X.

densify()

Convert coefficient matrix to dense array format.

fit(X, y)

Fit the model according to the given training data.

fit_transform(X[, y])

Fit to data, then transform it. Present only in old releases; see the note on transform below.

get_params([deep])

Get parameters for this estimator.

predict(X)

Predict class labels for samples in X.

predict_log_proba(X)

Log of probability estimates.

predict_proba(X)

Probability estimates for each sample in X.

score(X, y[, sample_weight])

Returns the mean accuracy on the given test data and labels.

set_params(**params)

Set the parameters of this estimator.

sparsify()

Convert coefficient matrix to sparse format.

transform(X[, threshold])

Reduce X to its most important features. Uses coef_ or feature_importances_ to determine the most important features. This method, like fit_transform, comes from old releases: it was deprecated and later removed from LogisticRegression, so it is not available in versions matching the signature above.
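
Of the methods listed above, score is the simplest to reproduce by hand, since mean accuracy is just the fraction of correct predictions. The following sketch uses made-up label arrays and is meant as an illustration of the quantity, not as the library's implementation.

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1])   # made-up ground-truth labels
y_pred = np.array([0, 1, 0, 0, 1])   # made-up predictions

# Mean accuracy: fraction of samples whose predicted label matches the true label.
accuracy = float(np.mean(y_pred == y_true))
print(accuracy)
```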

How does predict predict labels?

First, the source code:

def predict(self, X):
    """Predict class labels for samples in X.
    Parameters
    ----------
    X : {array-like, sparse matrix}, shape = [n_samples, n_features]
        Samples.
    Returns
    -------
    C : array, shape = [n_samples]
        Predicted class label per sample.
    """
    scores = self.decision_function(X)   # decision_function computes a linear score from the fitted coefficients and intercept
    if len(scores.shape) == 1:           # scores is 1-D (binary case)
        indices = (scores > 0).astype(int)  # 1 where the score is > 0, else 0 (the quoted source used np.int, which newer NumPy removed)
    else:                                # 2-D case, e.g. scores=[[0.1, 0.5, 0.4], [0.6, 0.2, 0.2], [0.1, 0.2, 0.7]]
        indices = scores.argmax(axis=1)  # column index of each row's maximum, i.e. [1, 0, 2]
    return self.classes_[indices]        # look up the labels: if classes_=[1, 2, 3], then classes_[[1, 0, 2]] = [2, 1, 3]

predict relies on a key function, decision_function(); here is its source:

def decision_function(self, X):
    """Predict confidence scores for samples.
    The confidence score for a sample is the signed distance of that
    sample to the hyperplane.
    Parameters
    ----------
    X : {array-like, sparse matrix}, shape = (n_samples, n_features)
        Samples.
    Returns
    -------
    array, shape=(n_samples,) if n_classes == 2 else (n_samples, n_classes)
        Confidence scores per (sample, class) combination. In the binary
        case, confidence score for self.classes_[1] where >0 means this
        class would be predicted.
    """
    if not hasattr(self, 'coef_') or self.coef_ is None:         # make sure the model has been fitted
        raise NotFittedError("This %(name)s instance is not fitted "
                             "yet" % {'name': type(self).__name__})

    X = check_array(X, accept_sparse='csr')    # validate X (numeric 2-D array; CSR sparse input is accepted)

    n_features = self.coef_.shape[1]  # number of features seen during fit
    if X.shape[1] != n_features:     # raise if X's column count differs from the number of features
        raise ValueError("X has %d features per sample; expecting %d"
                         % (X.shape[1], n_features))
    # inner product of X with the coefficients, plus the intercept, gives the scores
    scores = safe_sparse_dot(X, self.coef_.T, dense_output=True) + self.intercept_
    return scores.ravel() if scores.shape[1] == 1 else scores # flatten to 1-D when there is a single score column (binary case)

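Clearly, LogisticRegression predicts labels by first computing a linear score for each input sample from the fitted coefficients and intercept, and then deciding the label from that score. For a binary problem, the sample is assigned to the positive class when the score is greater than 0 and to the negative class otherwise; for a multiclass problem, the label with the largest score is chosen.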
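
The two steps described above can be sketched with plain NumPy. This is a simplified re-implementation for illustration only: it skips the input validation and sparse-matrix handling of the real methods, and the coefficients, intercepts and class labels are made-up values.

```python
import numpy as np

def decision_function(X, coef, intercept):
    """Linear scores X @ coef.T + intercept, flattened when there is one column."""
    scores = X @ coef.T + intercept
    return scores.ravel() if scores.shape[1] == 1 else scores

def predict(X, coef, intercept, classes):
    """Pick labels from scores: sign test for binary, argmax for multiclass."""
    scores = decision_function(X, coef, intercept)
    if scores.ndim == 1:
        indices = (scores > 0).astype(int)
    else:
        indices = scores.argmax(axis=1)
    return classes[indices]

X = np.array([[1.0, 2.0], [2.0, 0.5]])

# Binary case: a single row of coefficients (values are made up).
binary_labels = predict(X, np.array([[1.0, -1.0]]), np.array([0.5]),
                        np.array(["neg", "pos"]))

# Three-class case: one row of coefficients per class (values are made up).
coef3 = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
multi_labels = predict(X, coef3, np.zeros(3), np.array([1, 2, 3]))
print(binary_labels, multi_labels)
```

In the binary call the scores are -0.5 and 2.0, so the samples map to "neg" and "pos"; in the three-class call the row maxima sit in columns 1 and 0, which map to labels 2 and 1.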

How does predict_proba predict probabilities?

Without further ado, the source:

def predict_proba(self, X):
    """Probability estimates.

    The returned estimates for all classes are ordered by the
    label of classes.

    For a multi_class problem, if multi_class is set to be "multinomial"
    the softmax function is used to find the predicted probability of
    each class.
    Else use a one-vs-rest approach, i.e calculate the probability
    of each class assuming it to be positive using the logistic function.
    and normalize these values across all the classes.

    Parameters
    ----------
    X : array-like, shape = [n_samples, n_features]

    Returns
    -------
    T : array-like, shape = [n_samples, n_classes]
        Returns the probability of the sample for each class in the model,
        where classes are ordered as they are in ``self.classes_``.
    """
    if not hasattr(self, "coef_"):
        raise NotFittedError("Call fit before prediction")
    # decide which branch applies
    ovr = (self.multi_class in ["ovr", "warn"] or
           (self.multi_class == 'auto' and (self.classes_.size <= 2 or
                                            self.solver == 'liblinear')))
    if ovr:
        return super()._predict_proba_lr(X)  # taken for binary problems with the default multi_class
    else:
        decision = self.decision_function(X)
        if decision.ndim == 1:
            # Workaround for multi_class="multinomial" and binary outcomes
            # which requires softmax prediction with only a 1D decision.
            decision_2d = np.c_[-decision, decision]
        else:
            decision_2d = decision
        return softmax(decision_2d, copy=False)  # apply softmax to turn the scores into probabilities
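
The binary workaround in the multinomial branch above is worth a closer look: stacking [-decision, decision] and applying softmax yields a positive-class probability equal to sigmoid(2 * decision). The following sketch checks this with a hand-written softmax (a stand-in for the library helper, written here only for illustration) and made-up scores.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

decision = np.array([-1.5, 0.0, 2.0])        # made-up 1-D binary scores
decision_2d = np.c_[-decision, decision]     # same stacking trick as in the source above
proba = softmax(decision_2d)
print(proba)
```

Each row sums to 1, and the second column follows from e^d / (e^-d + e^d) = 1 / (1 + e^(-2d)).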

Digging one level deeper:

def _predict_proba_lr(self, X):
    """Probability estimation for OvR logistic regression.

    Positive class probabilities are computed as
    1. / (1. + np.exp(-self.decision_function(X)));
    multiclass is handled by normalizing that over all classes.
    """
    prob = self.decision_function(X)  # again calls decision_function to compute the scores
    expit(prob, out=prob)    #  expit(x) = 1/(1+exp(-x)), the sigmoid function, maps scores to probabilities in place
    if prob.ndim == 1:       # 1-D (binary): stack into two columns [P(class 0), P(class 1)]
        return np.vstack([1 - prob, prob]).T
    else:
        # OvR normalization, like LibLinear's predict_probability
        prob /= prob.sum(axis=1).reshape((prob.shape[0], -1)) # multiclass: normalize each row to sum to 1
        return prob
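
The OvR probability path above can be reproduced in a few lines. The sketch below is a simplified stand-in that takes precomputed scores instead of calling decision_function; the function name predict_proba_ovr and the score values are made up for illustration.

```python
import numpy as np

def predict_proba_ovr(scores):
    """Sigmoid each score; binary -> two columns, multiclass -> rows normalized to sum to 1."""
    prob = 1.0 / (1.0 + np.exp(-scores))
    if prob.ndim == 1:
        return np.vstack([1 - prob, prob]).T
    return prob / prob.sum(axis=1, keepdims=True)

binary_proba = predict_proba_ovr(np.array([-2.0, 0.5]))        # made-up binary scores
multi_scores = np.array([[2.0, -1.0, 0.0], [-3.0, 1.0, 0.5]])  # made-up 3-class scores
multi_proba = predict_proba_ovr(multi_scores)
print(binary_proba)
print(multi_proba)
```

Because the sigmoid is monotonic and each row is divided by a single positive number, the most probable class is always the class with the largest score.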

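As the code shows, predicting labels with predict is consistent with taking the most probable class from predict_proba: both start from the scores returned by decision_function.
In predict, a score > 0 means the positive class, which is equivalent to a positive-class probability > 0.5 in predict_proba (not > 0), because
$\mathrm{expit}(x) = \frac{1}{1+e^{-x}}$, so $\mathrm{expit}(x) > 0.5$ exactly when $x > 0$. In the multiclass case, neither the row normalization of the sigmoid values nor softmax changes which class has the largest value.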
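
This consistency can also be checked end to end on a fitted model. The following is a minimal sketch, assuming synthetic three-class data from make_classification (sizes and seed are arbitrary choices); it compares the labels from predict with the labels recovered from the most probable column of predict_proba.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class data; sizes and seed are arbitrary choices for illustration.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

labels = model.predict(X)
proba = model.predict_proba(X)

# Labels recovered from the probabilities: most probable column, mapped through classes_.
labels_from_proba = model.classes_[proba.argmax(axis=1)]
agree = bool(np.array_equal(labels, labels_from_proba))
print(agree)
```

Barring exact ties between classes, which are very unlikely with floating-point scores, the two label arrays should be identical.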

月上流骚头

sklearn.linear_model.LogisticRegression: parameter details and a source walkthrough of predict and predict_proba