1. Logistic regression
Logistic regression can be used for probability prediction, classification, and more.
2. Parameters of sklearn.linear_model.LogisticRegression
LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='liblinear', max_iter=100, multi_class='ovr', verbose=0, warm_start=False, n_jobs=1)
Parameter descriptions:
- penalty : str, 'l1' or 'l2', default: 'l2'. Specifies the norm used in the penalization. The 'newton-cg', 'sag' and 'lbfgs' solvers support only l2 penalties.
- dual : bool, default: False. Dual or primal formulation. The dual formulation is only implemented for the l2 penalty with the liblinear solver. Prefer dual=False when n_samples > n_features.
- tol : float, default: 1e-4. Tolerance for the stopping criteria.
- C : float, default: 1.0. Inverse of regularization strength; must be a positive float. As in support vector machines, smaller values specify stronger regularization.
- fit_intercept : bool, default: True. Whether an intercept (bias) term is added to the decision function; enabled by default.
- class_weight : dict or 'balanced', default: None. Weights associated with each class. If omitted, all classes get weight one. Passing 'balanced' computes weights automatically, inversely proportional to class frequencies in the input data; alternatively you can supply the weights by hand. For example, for a binary 0/1 problem, class_weight={0:0.9, 1:0.1} gives class 0 a weight of 0.9 and class 1 a weight of 0.1.
- random_state : int or None, default: None. Seed of the pseudo-random number generator; only used when solver is 'sag' or 'liblinear'.
- max_iter : int, default: 100. Maximum number of iterations for the solver to converge; useful only for the newton-cg, sag and lbfgs solvers.
- multi_class : str, {'ovr', 'multinomial'}, default: 'ovr'. How multiclass problems are handled. 'ovr' fits one binary problem per class (one-vs-rest); 'multinomial' minimizes the multinomial loss over the whole probability distribution.
- verbose : int, default: 0. Verbosity level: 0 prints nothing during training; 1 prints occasional progress; >1 prints output for every sub-model.
- warm_start : bool, default: False. If True, the next call to fit reuses the solution of the previous call as initialization instead of starting from scratch; otherwise the previous solution is erased.
- n_jobs : int, default: 1. Number of CPU cores used for parallelism; -1 means use all cores.
- solver : {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}, default: 'liblinear'. Algorithm to use in the optimization problem.
  - For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones.
  - For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle the multinomial loss; 'liblinear' only supports one-vs-rest (OvR), not the multinomial formulation (MvM), although MvM is generally more accurate.
  - 'newton-cg', 'lbfgs' and 'sag' only handle the L2 penalty, whereas 'liblinear' and 'saga' also handle the L1 penalty.
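The solver/penalty compatibility rules above can be sketched as follows. The dataset is synthetic and every parameter value here (C, max_iter, sample counts) is an arbitrary illustrative choice, not a recommendation:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data: 200 samples, 5 features.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# 'liblinear' (and 'saga') can fit an L1-penalized model...
clf_l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
# ...while 'lbfgs', 'newton-cg' and 'sag' support only the L2 penalty.
clf_l2 = LogisticRegression(penalty="l2", solver="lbfgs", C=1.0, max_iter=200)

clf_l1.fit(X, y)
clf_l2.fit(X, y)
print(clf_l1.coef_.shape)  # (1, 5): one coefficient row for a binary problem
```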
3. Attributes
- coef_ : Coefficients of the features in the decision function. shape (1, n_features) or (n_classes, n_features)
- intercept_ : Intercept (bias) added to the decision function. shape (1,) or (n_classes,)
- n_iter_ : Actual number of iterations for all classes. shape (n_classes,) or (1,)
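A quick sketch of what these attributes look like after fitting; iris is used only because it is a convenient 3-class, 4-feature dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # 3 classes, 4 features
clf = LogisticRegression(max_iter=500).fit(X, y)

print(clf.coef_.shape)       # (3, 4): (n_classes, n_features) for multiclass
print(clf.intercept_.shape)  # (3,)
print(clf.n_iter_)           # actual iteration counts used by the solver
```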
4. Methods
- (1) decision_function(X): Predict confidence scores for samples.
- (2) densify(): Convert the coefficient matrix to dense array format.
- (3) fit(X, y[, sample_weight]): Fit the model according to the given training data; trains the LR classifier, where X holds the training samples and y the corresponding labels.
- (4) get_params([deep]): Get parameters for this estimator.
- (5) predict(X): Predict class labels for the samples in X, i.e. perform the classification.
- (6) predict_log_proba(X): Log of probability estimates.
- (7) predict_proba(X): Probability estimates.
- (8) score(X, y[, sample_weight]): Return the mean accuracy on the given test data and labels.
- (9) set_params(**params): Set the parameters of this estimator.
- (10) sparsify(): Convert the coefficient matrix to sparse format.
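The typical workflow ties several of these methods together. A minimal sketch on synthetic data (the sample counts and max_iter value are arbitrary); the numbers in the comments refer to the list above:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary data, split into train/test subsets.
X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=500).fit(X_tr, y_tr)  # (3) fit

scores = clf.decision_function(X_te)  # (1) signed confidence scores
labels = clf.predict(X_te)            # (5) hard class labels
proba = clf.predict_proba(X_te)       # (7) class probabilities, rows sum to 1
acc = clf.score(X_te, y_te)           # (8) mean accuracy on the test split
print(acc)
```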
5. Method parameters
(1) fit(X, y, sample_weight=None)
Fit the model according to the given training data.
Parameters:
- X : {array-like, sparse matrix}, shape (n_samples, n_features). Training samples.
- y : array-like, shape (n_samples,). Target labels.
- sample_weight : array-like, shape (n_samples,), optional
Returns:
- self : object
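A minimal sketch of fit with the optional sample_weight argument; the weights here are arbitrary values chosen only to show the call shape:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
# Arbitrary per-sample weights: double the influence of the first 50 samples.
w = np.where(np.arange(100) < 50, 2.0, 1.0)

# fit returns self, so it can be chained directly.
clf = LogisticRegression(max_iter=500).fit(X, y, sample_weight=w)
print(clf.coef_.shape)  # (1, 4)
```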
(2) get_params(deep=True)
Get parameters for this estimator.
Parameters:
- deep : boolean, optional
Returns:
- params : mapping of string to any
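get_params pairs naturally with set_params from the method list above; a short round-trip sketch (the C values are arbitrary):

```python
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(C=0.5)
params = clf.get_params()     # mapping of parameter name -> value
print(params["C"])            # 0.5

clf.set_params(C=2.0)         # set_params is the symmetric setter
print(clf.get_params()["C"])  # 2.0
```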
(3) predict(X)
Predict class labels for samples in X.
Parameters:
- X : {array-like, sparse matrix}, shape = [n_samples, n_features]
Returns:
- C : array, shape = [n_samples]
(4) predict_log_proba(X)
Log of probability estimates. The returned estimates for all classes are ordered by the label of classes.
Parameters:
- X : array-like, shape = [n_samples, n_features]
Returns:
- T : array-like, shape = [n_samples, n_classes]
(5) predict_proba(X)
Probability estimates. The returned estimates for all classes are ordered by the label of classes.
For a multiclass problem, if multi_class is set to "multinomial" the softmax function is used to find the predicted probability of each class. Otherwise a one-vs-rest approach is used, i.e. the probability of each class is calculated with the logistic function assuming that class to be positive, and these values are then normalized across all classes.
Parameters:
- X : array-like, shape = [n_samples, n_features]
Returns:
- T : array-like, shape = [n_samples, n_classes]
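Two properties worth verifying in code: the columns of the returned array follow classes_, each row sums to 1, and predict_log_proba is simply the log of predict_proba. A sketch on iris (chosen only as a convenient 3-class dataset):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=500).fit(X, y)

proba = clf.predict_proba(X[:5])          # shape (5, 3)
print(clf.classes_)                       # column order of the estimates
print(proba.sum(axis=1))                  # each row sums to 1
log_proba = clf.predict_log_proba(X[:5])
print(np.allclose(log_proba, np.log(proba)))
```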
(6) score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that each label set be correctly predicted for each sample.
Parameters:
- X : array-like, shape = (n_samples, n_features)
- y : array-like, shape = (n_samples,) or (n_samples, n_outputs)
- sample_weight : array-like, shape = [n_samples], optional
Returns:
- score : float
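For a plain classifier, score is just the mean accuracy, which can be checked by hand (the dataset here is synthetic and the sizes are arbitrary):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=1)
clf = LogisticRegression(max_iter=500).fit(X, y)

acc = clf.score(X, y)
manual = np.mean(clf.predict(X) == y)  # fraction of correct predictions
print(np.isclose(acc, manual))
```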