Chapter V: Machine Learning and Neural Networks

Reference: https://www.cnblogs.com/maybe2030/p/5597716.html
Gradient checking
Once you have confirmed that the backpropagation implementation is correct, turn gradient checking off; otherwise training becomes very slow, because gradient checking is much slower than backpropagation.
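As a minimal sketch of the idea (my illustration in Python, not code from the post): approximate each partial derivative with a two-sided difference and compare the result with the gradient that backpropagation produces; `cost` and `bp_grad` here are placeholder names.

```python
import numpy as np

def numerical_gradient(cost, theta, eps=1e-4):
    """Two-sided finite differences: (J(theta+eps) - J(theta-eps)) / (2*eps)."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        t_plus, t_minus = theta.copy(), theta.copy()
        t_plus[i] += eps
        t_minus[i] -= eps
        grad[i] = (cost(t_plus) - cost(t_minus)) / (2 * eps)
    return grad

# Toy check with a known gradient: J(theta) = sum(theta^2), dJ/dtheta = 2*theta.
theta = np.array([1.0, -2.0, 3.0])
num_grad = numerical_gradient(lambda t: np.sum(t ** 2), theta)
bp_grad = 2 * theta  # what "backprop" should return for this J
# The relative difference should be tiny (e.g. < 1e-7) if backprop is correct.
print(np.linalg.norm(num_grad - bp_grad) / np.linalg.norm(num_grad + bp_grad))
```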
Choosing a neural network architecture: the sizes of the input layer and the output layer are determined by the feature dimension and the number of classes. For the hidden layers, if there is more than one, they should all have the same number of units.
Underfitting: high bias.
Overfitting: high variance.
Cross-validation: the data are split into a training set and a cross-validation set (besides the test set).
The diagnosis works by plotting error against the polynomial degree d on the abscissa: when d = 1, both the training error and the cross-validation error are large; when d is large, the training error is small
but the cross-validation error is large.
Choosing λ for regularized linear regression
If λ is too large, θ is driven toward 0 and the hypothesis degenerates to a nearly flat line: underfitting.
If λ is too small (close to 0), the regularization term has no effect: overfitting.
To choose λ for regularization: try a range of λ values; for each one, minimize the regularized cost function to obtain the corresponding θ; then evaluate each θ on the cross-validation set using the unregularized error. Pick the θ with the lowest cross-validation error (say θ(5)), and finally take it to the test set to estimate the generalization error.
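A sketch of this selection loop in Python, using scikit-learn's Ridge as a stand-in for regularized linear regression and synthetic data in place of a real train/CV/test split:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.5, size=300)
X_train, X_cv, X_test = X[:180], X[180:240], X[240:]
y_train, y_cv, y_test = y[:180], y[180:240], y[240:]

lambdas = [0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]
# For each lambda, minimize the regularized cost, then score on the CV set.
cv_err = [mean_squared_error(y_cv, Ridge(alpha=l).fit(X_train, y_train).predict(X_cv))
          for l in lambdas]

best = lambdas[int(np.argmin(cv_err))]            # lambda with lowest CV error
model = Ridge(alpha=best).fit(X_train, y_train)
print(best, mean_squared_error(y_test, model.predict(X_test)))  # final test error
```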


Regularization

In the plot of error versus λ: when λ is small, θ can stay large, so the training set is fit very well (possibly overfit) and the cross-validation error is large. As λ increases, θ shrinks; the hypothesis approaches a constant term and underfits, so the error becomes large again. Only a suitable λ in between reduces the error on new data. The curves show how the training error and the cross-validation error change as λ varies.

Learning curves

A learning curve plots the cost on the training set and on the cross-validation set against the number of training examples. For the training set, the fewer the examples, the better the fit, so the training error grows as the number of examples increases. For the cross-validation set, the larger the training set, the better the performance, so its error decreases.

Sometimes increasing the training set is of no use:

High bias (underfitting)

When the cross-validation error stops falling as the training set grows and instead stays at a high, stable value, having more data has no effect. The right-hand figure shows that with only two parameters the straight line cannot fit the data no matter how large the dataset is (both the training error and the cross-validation error stay large as the sample grows). In this case, collecting more data is useless.

High variance (overfitting)

If λ is very small and θ is very large, the model overfits. As the training set grows, the training error rises slightly but always stays small. The cross-validation error, by contrast, is large; it drops a little as the number of samples increases, but the gap between it and the training error remains wide. Since the cross-validation error keeps decreasing as more samples are added, in the high-variance case increasing the sample size does help.
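A sketch of computing such a learning curve with scikit-learn's learning_curve helper; the setup (a linear model fit to a nonlinear target, so a high-bias case) is my illustrative assumption:

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)  # nonlinear target

sizes, train_sc, cv_sc = learning_curve(
    LinearRegression(), X, y, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 8),
    scoring="neg_mean_squared_error")

train_err = -train_sc.mean(axis=1)  # rises with m
cv_err = -cv_sc.mean(axis=1)        # plateaus high, close to train_err: high bias
for m, tr, cv in zip(sizes, train_err, cv_err):
    print(f"m={m:3d}  train={tr:.3f}  cv={cv:.3f}")
```

When both curves flatten out at a high error close to each other, that is the high-bias picture where more data will not help; a large persistent gap between them is the high-variance picture where more data will.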

Fixes for different kinds of errors

Feature vector: for a text, the entry in the word feature vector is 1 if the word appears in the text and 0 if it does not.
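A minimal sketch of this encoding; the tiny vocabulary is a toy assumption (a real spam filter would use on the order of 10,000 frequent words):

```python
import re

vocabulary = ["buy", "discount", "deal", "andrew", "now", "meeting"]  # toy list

def to_feature_vector(text):
    """1 if the vocabulary word occurs in the text, 0 otherwise."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [1 if w in words else 0 for w in vocabulary]

print(to_feature_vector("Buy now: discount deal!"))  # [1, 1, 1, 0, 1, 0]
```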

For a spam classifier:

There are several ways to improve it.
First implement a quick and simple version of the algorithm as fast as possible, then do error analysis: look at the misclassified examples, see which kind of error is most common, and improve accordingly.

For a classification model, precision and recall are both important metrics.
Precision: of all the people predicted to have cancer, the fraction who really have cancer (true positives / predicted positives).
Recall: of all the people who really have cancer, the fraction predicted to have it (true positives / actual positives).
These definitions matter when very few people actually have cancer. If a classifier always predicts y = 0, its recall is 0, which shows it is not a good classifier. This is the skewed classes problem. An algorithm with both good precision and good recall is a good algorithm.
The F score combines precision and recall into a single number.
If the threshold is 0.99, precision is high but many true cases are missed, so recall is low. If the threshold is 0.3, precision is lower, because many people are judged to have cancer, but recall is high.
Choose the model (threshold) that has both good recall and good precision, for example the one with the highest F score.
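A sketch that computes precision, recall, and the F1 score (2PR / (P + R)) at different thresholds; the scores and labels are made up for illustration:

```python
import numpy as np

def precision_recall_f1(y_true, scores, threshold):
    pred = (scores >= threshold).astype(int)
    tp = int(np.sum((pred == 1) & (y_true == 1)))  # true positives
    fp = int(np.sum((pred == 1) & (y_true == 0)))  # false positives
    fn = int(np.sum((pred == 0) & (y_true == 1)))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])
s = np.array([.05, .1, .2, .3, .35, .4, .6, .45, .7, .95])
for t in (0.3, 0.5, 0.9):   # raising the threshold trades recall for precision
    print(t, precision_recall_f1(y, s, t))
```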

SVM

Choosing C appropriately yields a larger gap between the classes: large margin classification.
With θ0 = 0, the green line is the decision boundary and the blue line is the direction of θ. Projecting a red-cross point onto θ gives a length p; the margin constraint requires p·‖θ‖ ≥ 1, so if p is small, ‖θ‖ must be large. Since the SVM minimizes ‖θ‖, it prefers boundaries where the projections p are large, i.e. a large margin.
SVM kernels
Choosing landmarks
SVM with kernels
Gaussian kernel, linear kernel, Mercer's theorem
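A sketch of the Gaussian kernel similarity feature f = exp(−‖x − l‖² / (2σ²)); in the course each training example serves as a landmark, and the points and σ below are arbitrary:

```python
import numpy as np

def gaussian_kernel(x, landmark, sigma):
    """Similarity is ~1 near the landmark and ~0 far from it."""
    return np.exp(-np.sum((x - landmark) ** 2) / (2 * sigma ** 2))

x = np.array([1.0, 2.0])
l1 = np.array([1.0, 2.0])   # on the landmark  -> similarity 1.0
l2 = np.array([6.0, -4.0])  # far from it      -> similarity ~0
print(gaussian_kernel(x, l1, 1.0), gaussian_kernel(x, l2, 1.0))
```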

KNN

Dimensionality reduction

PCA: find a lower-dimensional surface onto which to project the data, i.e. a line (direction) that minimizes the projection error. Reducing from n dimensions to k dimensions means finding k vectors and minimizing the sum of squared projection distances.
Data preprocessing:
subtract the mean from each feature.
If the features have different scales, also apply feature scaling: (feature − mean) / (max − min), or (feature − mean) / standard deviation.

Use PCA for dimensionality reduction with the smallest k such that 99% of the variance is still retained relative to the original data.
Compression and reconstruction:
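A sketch of the full pipeline in NumPy, following the usual recipe (mean-normalize, take the SVD of the covariance matrix, pick the smallest k retaining 99% of the variance, project, reconstruct); the random low-rank data is a stand-in:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 10))  # data near a 3-D subspace
X += 0.01 * rng.normal(size=(500, 10))                    # plus small noise

mu = X.mean(axis=0)
Xn = X - mu                                # mean normalization
Sigma = Xn.T @ Xn / Xn.shape[0]            # covariance matrix
U, S, _ = np.linalg.svd(Sigma)

retained = np.cumsum(S) / np.sum(S)        # variance retained by first k components
k = int(np.searchsorted(retained, 0.99) + 1)  # smallest k with >= 99% retained

Z = Xn @ U[:, :k]                          # compress: project onto k directions
X_approx = Z @ U[:, :k].T + mu             # reconstruct from the compressed data
print(k, np.mean((X - X_approx) ** 2))     # k should be ~3, reconstruction error tiny
```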
Fit the PCA mapping only on the training set; then apply the same mapping to the cross-validation and test sets.
PCA is not a good way to prevent overfitting: even if 99% of the variance is retained, some valuable information is still thrown away (PCA ignores the labels).
So the good practice is still regularization; PCA is better used to speed up the algorithm.
Use PCA only when you actually need to compress the data, e.g. when memory or disk space is insufficient or to speed up learning; do not apply it blindly.

Anomaly detection

Evaluating the algorithm
The anomaly detection algorithm
Anomaly detection versus supervised learning

By transforming the data, the feature distribution can be made more nearly Gaussian, for example by taking the square root, the logarithm, and so on.
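A sketch of density-based anomaly detection with an independent Gaussian per feature; the data and the threshold ε are made up (in practice ε is chosen on a labeled cross-validation set, e.g. by F1 score):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(1000, 2))  # normal examples

mu = X.mean(axis=0)    # fit the mean of each feature
var = X.var(axis=0)    # fit the variance of each feature

def p(x):
    """p(x) = product over features j of N(x_j; mu_j, var_j)."""
    return np.prod(np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var), axis=-1)

eps = 1e-4
print(p(np.array([5.0, 10.0])) > eps)  # True: typical point, not an anomaly
print(p(np.array([0.0, 25.0])) > eps)  # False: flagged as an anomaly
```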
Recommendation systems and collaborative filtering
The collaborative filtering algorithm
Low-rank matrix factorization
Finding the most similar movies: movies whose learned feature vectors are close to each other are similar.
Mean normalization
Mean normalization handles users who have not rated any movies: their predicted score for each movie falls back to that movie's average rating.
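A sketch of mean normalization on a tiny ratings matrix (rows are movies, columns are users, NaN marks an unrated entry); the numbers are arbitrary:

```python
import numpy as np

Y = np.array([[5.0,    4.0,    np.nan],
              [np.nan, 1.0,    np.nan],
              [4.0,    np.nan, np.nan]])

movie_mean = np.nanmean(Y, axis=1)   # per-movie average over rated entries only
Y_norm = Y - movie_mean[:, None]     # collaborative filtering is trained on this

# For a user with no ratings, the learned theta is ~0, so the prediction
# theta^T x + mean falls back to each movie's average rating.
print(movie_mean)                    # [4.5, 1.0, 4.0]
```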
Stochastic gradient descent: it solves the problem that batch gradient descent must sum over too many examples for every update.
If there are 300 million examples, batch gradient descent has to sum over all of them for every single parameter update, which is far too slow.
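A sketch contrasting the two update rules on synthetic linear-regression data (the sizes and learning rate are arbitrary; the 100,000 examples stand in for the 300 million of the lecture):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 100_000, 5
X = rng.normal(size=(m, n))
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + rng.normal(scale=0.1, size=m)

theta = np.zeros(n)
alpha = 0.01
for epoch in range(2):              # a few passes usually suffice for SGD
    for i in rng.permutation(m):    # shuffle, then update on ONE example at a time
        theta -= alpha * (X[i] @ theta - y[i]) * X[i]
print(theta)                        # close to the true weights

# Batch gradient descent would instead compute X.T @ (X @ theta - y) / m,
# touching all m examples for every single parameter update.
```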



Source: https://blog.csdn.net/poppyl917/article/details/95587269