Basic Algorithm Models in sklearn

0. Methods shared by all models: fit(train_x, train_y) and predict(test_x)
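As a minimal sketch of this shared interface, the snippet below trains two different classifiers with exactly the same fit/predict calls. The iris dataset, the 70/30 split and the variable names are assumptions made for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy data (an assumption for this sketch): the iris dataset, split 70/30.
X, y = load_iris(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Every estimator is trained with fit() and queried with predict().
for model in (KNeighborsClassifier(), DecisionTreeClassifier(random_state=0)):
    model.fit(train_x, train_y)
    pred_y = model.predict(test_x)
    print(type(model).__name__, pred_y[:5])
```

Swapping in another estimator only changes the constructor call; the training and prediction lines stay the same.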

1. KNN: a classifier implementing the k-nearest-neighbors vote (usually only n_neighbors, weights, leaf_size and metric are tuned)

sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None, **kwargs)

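Parameters:

n_neighbors: number of neighbors to use, default 5.
weights: one of three values: uniform, distance, or a callable.
uniform: uniform weights. All points in each neighborhood are weighted equally.
distance: points are weighted by the inverse of their distance, so closer neighbors have more influence.
callable: a user-defined function that accepts an array of distances and returns an array of the same shape containing the weights.
algorithm: the algorithm used to compute the nearest neighbors: ball_tree, kd_tree, brute or auto.
ball_tree: use BallTree.
kd_tree: use KDTree.
brute: use a brute-force search.
auto: choose the most appropriate algorithm based on the values passed to fit.
leaf_size: leaf size passed to BallTree or KDTree, default 30. It can affect the speed of construction and querying as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
p: power parameter of the Minkowski distance, default 2. p=1 gives the Manhattan distance, p=2 gives the Euclidean distance, and p → ∞ gives the Chebyshev distance.
metric: the distance metric to use. The default is the Minkowski distance, which with the default p=2 is the Euclidean distance.
n_jobs: number of parallel jobs.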
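The three kinds of weights value can be compared with a short sketch like the one below. The dataset, the split and the helper inverse_square are hypothetical choices for illustration; only the KNeighborsClassifier parameters come from the signature above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=0
)


def inverse_square(distances):
    # Hypothetical custom weighting: takes an array of distances and
    # returns an array of the same shape; closer neighbors weigh more.
    return 1.0 / (distances ** 2 + 1e-9)


knn_scores = {}
for name, weights in [
    ("uniform", "uniform"),
    ("distance", "distance"),
    ("callable", inverse_square),
]:
    # p=2 with the default metric='minkowski' means Euclidean distance.
    knn = KNeighborsClassifier(n_neighbors=5, weights=weights, p=2)
    knn.fit(train_x, train_y)
    knn_scores[name] = knn.score(test_x, test_y)

print(knn_scores)
```

On a dataset this small the three variants tend to score similarly; differences usually show up on noisier data.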

2. Decision tree (usually only max_depth and min_samples_leaf are tuned)

sklearn.tree.DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, class_weight=None, presort=False)

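Parameters:

criterion: the function used to measure the quality of a split: gini (Gini impurity, the default) or entropy (information gain).
splitter: the strategy used to choose the split at each node: best (the default) or random.
max_depth: maximum depth of the tree. With the default None, nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split: minimum number of samples required to split an internal node, default 2.
min_samples_leaf: minimum number of samples required at a leaf node, default 1.
max_features: number of features to consider when looking for the best split. None means all features.
random_state: seed for the random number generator, used to make results reproducible.
max_leaf_nodes: upper limit on the number of leaf nodes. None means no limit.
min_impurity_decrease: a node is split only if the split decreases the impurity by at least this value.
min_impurity_split: threshold for early stopping of tree growth; a node is split only if its impurity is above this threshold. Deprecated in favor of min_impurity_decrease.
class_weight: weights associated with the classes, given as a dict or as balanced. With None, every class has weight 1.
min_weight_fraction_leaf: minimum weighted fraction of the total sample weight required at a leaf node.
presort: whether to presort the data to speed up the search for the best splits.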
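A minimal sketch of the two parameters that are usually tuned, max_depth and min_samples_leaf. The dataset and the particular values (3 and 5) are assumptions picked for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Limit the depth and require at least 5 samples per leaf to curb overfitting.
tree = DecisionTreeClassifier(
    criterion="gini", max_depth=3, min_samples_leaf=5, random_state=0
)
tree.fit(train_x, train_y)

tree_accuracy = tree.score(test_x, test_y)
print("depth:", tree.get_depth())
print("leaves:", tree.get_n_leaves())
print("test accuracy:", tree_accuracy)
```

Leaving max_depth=None lets the tree grow until its leaves are pure, which typically fits the training set perfectly but may generalize worse.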

3. Naive Bayes (the defaults are usually fine)

sklearn.naive_bayes.GaussianNB(priors=None, var_smoothing=1e-09)

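Parameters:

priors: prior probabilities of the classes. If specified, the priors are not adjusted according to the data.
var_smoothing: the portion of the largest variance of all features that is added to the variances for calculation stability.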
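The effect of priors can be checked with a small sketch: without priors the class priors are estimated from the training labels, while with priors they stay fixed. The dataset and the equal priors below are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Default: priors are estimated from the class frequencies in train_y.
nb_default = GaussianNB().fit(train_x, train_y)

# Fixed priors (hypothetical values): they are not adjusted to the data.
nb_fixed = GaussianNB(priors=[1 / 3, 1 / 3, 1 / 3]).fit(train_x, train_y)

print("estimated priors:", nb_default.class_prior_)
print("fixed priors:", nb_fixed.class_prior_)
print("test accuracy:", nb_default.score(test_x, test_y))
```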


4. Linear regression (the defaults are usually fine)

sklearn.linear_model.LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=None)

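Parameters:

fit_intercept: whether to calculate the intercept for this model. If set to False, no intercept is used in the calculation.
normalize: ignored when fit_intercept is False. If True, the regressors X are normalized before regression by subtracting the mean and dividing by the L2 norm. If you want to standardize instead, use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
copy_X: if True, X is copied; otherwise it may be overwritten.
n_jobs: number of parallel jobs.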
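The sketch below fits a noise-free toy target and then shows the StandardScaler route mentioned for normalize. The synthetic data and variable names are assumptions. The normalize argument is deliberately not passed, since recent scikit-learn releases no longer accept it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, noise-free data (an assumption): y = 3*x0 - 2*x1 + 5.
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0

# Plain least squares with an intercept.
reg = LinearRegression(fit_intercept=True).fit(X, y)
print("coefficients:", reg.coef_)
print("intercept:", reg.intercept_)

# Standardize the features first instead of relying on normalize.
scaled_reg = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)
print("same predictions:", np.allclose(reg.predict(X), scaled_reg.predict(X)))
```

Standardizing changes the scale of the learned coefficients but not the predictions of an unregularized linear model.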

5. SVM

sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma='auto_deprecated', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', random_state=None)

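Parameters:

C: penalty parameter C of C-SVC, default 1.0.
A larger C penalizes the slack variables more heavily and pushes them toward 0. Misclassification is punished harder and the model tends to classify every training sample correctly, so accuracy on the training set is high but generalization is weaker. A smaller C reduces the penalty for misclassification and tolerates some errors by treating them as noise, which usually generalizes better.
kernel: the kernel function, default rbf. One of 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'.
degree: degree of the polynomial kernel (poly), default 3. Ignored by all other kernels.
gamma: kernel coefficient for 'rbf', 'poly' and 'sigmoid'. With 'auto', 1/n_features is used. The 'auto_deprecated' default in the signature above behaves the same way; later releases default to 'scale'.
coef0: independent term of the kernel function. Only significant for 'poly' and 'sigmoid'.
shrinking: whether to use the shrinking heuristic, default True.
probability: whether to enable probability estimates, default False.
tol: tolerance for the stopping criterion, default 1e-3.
cache_size: size of the kernel cache in MB, default 200.
class_weight: class weights, passed as a dict. Sets the parameter C of class i to class_weight[i]*C (the C of C-SVC).
verbose: enable verbose output.
max_iter: maximum number of iterations. -1 means no limit.
decision_function_shape: 'ovo' or 'ovr', default 'ovr'.
random_state: seed used when shuffling the data, an int.

The parameters that are mainly tuned are C, kernel, degree, gamma and coef0.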
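To see how the main parameters interact, a simple sketch can loop over a few kernels and C values and record the test accuracy. The dataset, the candidate values and the use of gamma='scale' are assumptions made for this illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=0
)

svm_scores = {}
for kernel in ("linear", "rbf", "poly"):
    for C in (0.1, 1.0, 10.0):
        # degree only matters for 'poly'; coef0 for 'poly' and 'sigmoid'.
        clf = SVC(C=C, kernel=kernel, degree=3, gamma="scale", coef0=0.0)
        clf.fit(train_x, train_y)
        svm_scores[(kernel, C)] = clf.score(test_x, test_y)

for (kernel, C), score in sorted(svm_scores.items()):
    print(kernel, C, round(score, 3))
```

For a real search, cross-validation (for example with sklearn.model_selection.GridSearchCV) is a safer basis for choosing parameters than a single test split.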

6. Logistic regression

sklearn.linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='warn', max_iter=100, multi_class='warn', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None)

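Parameters:

penalty: the norm used in the penalization (the type of regularization), default l2.
dual: dual or primal formulation. The dual formulation is only implemented for the l2 penalty with the liblinear solver. Prefer dual=False when n_samples > n_features.
tol: tolerance for the stopping criterion.
C: inverse of the regularization strength; must be a positive float. Smaller values mean stronger regularization.
fit_intercept: whether a constant (also called bias or intercept) is added to the decision function.
intercept_scaling: only used with the liblinear solver when fit_intercept is True. A synthetic feature with this constant value is appended to X.
class_weight: weights associated with the classes, given as a dict or as balanced.
random_state: seed used when shuffling the data.
solver: the optimization algorithm: newton-cg, lbfgs, liblinear, sag or saga.
max_iter: maximum number of iterations taken for the solver to converge.
multi_class: how multiclass problems are handled: ovr, multinomial or auto.
verbose: verbosity level for the liblinear and lbfgs solvers.
warm_start: if True, the solution of the previous call to fit is reused as initialization.
n_jobs: number of CPU cores used when parallelizing over classes with multi_class='ovr'.
l1_ratio: the Elastic-Net mixing parameter, between 0 and 1. Only used when penalty='elasticnet'.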
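Because C is the inverse of the regularization strength, a smaller C should shrink the learned coefficients. The sketch below checks this with the default l2 penalty. The dataset, the scaling step and the three C values are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Standardizing the features helps the solver converge.
scaler = StandardScaler().fit(train_x)
train_s = scaler.transform(train_x)
test_s = scaler.transform(test_x)

coef_norms = {}
logreg_scores = {}
for C in (0.01, 1.0, 100.0):
    # The penalty is left at its default (l2); max_iter is raised from 100.
    clf = LogisticRegression(C=C, max_iter=1000)
    clf.fit(train_s, train_y)
    coef_norms[C] = np.linalg.norm(clf.coef_)
    logreg_scores[C] = clf.score(test_s, test_y)

print(coef_norms)
print(logreg_scores)
```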

7. Neural networks
