Common Machine Learning Algorithms

I've been learning machine learning recently, and the summaries I found online are very good, so I'll jot down the main ideas here:

1. Decision Tree

Classification proceeds by asking about features: each node asks a question, and the answer splits the data into two branches, each of which asks a further question. The questions are learned from existing data, so when new data comes in, it can be routed to the appropriate leaf by following the questions down the tree.
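As a rough sketch of the idea (the toy data and the use of scikit-learn are my additions, not from the original post):

```python
# Fit a small decision tree on invented data, then route a new point
# through the learned questions to a leaf.
from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 1], [2, 2]]  # toy feature rows
y = [0, 0, 1, 1, 1, 1]                                # toy class labels

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

print(tree.predict([[1.5, 0.5]]))  # the new point lands in a class-1 leaf
```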

2. Random Forest Algorithm


Following the video explanation: randomly select data from the source data to form several subsets.


The matrix S holds the source data: rows 1 to N are the data points, A, B, C are the features, and the last column C is the class label.


Randomly generate M submatrices from S


Train M decision trees on these M subsets.
Put new data into each of the M trees to get M classification results, then count which class received the most votes and take that class as the final prediction.
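A minimal sketch of this voting scheme, again with scikit-learn and made-up data (both my assumptions):

```python
# n_estimators plays the role of M: each tree is trained on a random
# bootstrap subset of the data, and the trees vote on new inputs.
from sklearn.ensemble import RandomForestClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 1], [2, 2]]
y = [0, 0, 1, 1, 1, 1]

forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# the majority class among the 10 trees' results is the final prediction
print(forest.predict([[1.5, 0.5]]))
```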


3. Logistic Regression

Video explanation

When the prediction target is a probability, the output must lie between 0 and 1. A simple linear model cannot do this: once the input leaves a certain range, the output falls outside the specified interval.


So what is needed here is a model with that kind of S-shaped curve.


So how do we get such a model?
The model needs to satisfy two conditions: its output must be greater than or equal to 0, and less than or equal to 1.
For "greater than or equal to 0", we could use an absolute value or a square; here the exponential function e^z is chosen, which is always greater than 0.
For "less than or equal to 1", we use division: the numerator is the function itself and the denominator is itself plus 1, giving e^z / (e^z + 1), which is always less than 1.


Rearranging a little further gives the logistic regression model: p = e^z / (e^z + 1) = 1 / (1 + e^(-z)), where z is a linear function of the input.


The corresponding coefficients can then be fitted from the source data.


Finally we get the logistic (sigmoid) curve.
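A tiny sketch of the construction just described (the NumPy code is mine):

```python
import numpy as np

def logistic(z):
    # e^z is always > 0, and dividing by (e^z + 1) keeps the value < 1,
    # which are exactly the two conditions above
    return np.exp(z) / (np.exp(z) + 1)

print(logistic(np.array([-6.0, -1.0, 0.0, 1.0, 6.0])))
# all outputs fall strictly between 0 and 1, tracing the S-shaped curve
```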


4. SVM

Video explanation

To separate the two classes we want to find a hyperplane. The optimal hyperplane is the one that maximizes the margin between the two classes, where the margin is the distance between the hyperplane and the point closest to it. In the figure below, Z2 > Z1, so the green hyperplane is the better one.


Express this hyperplane as a linear equation: points of one class above the line give a value greater than or equal to 1, and points of the other class give a value less than or equal to -1.


The distance from a point to the hyperplane is computed with the formula in the figure: |w·x + w0| / ‖w‖.


So we get the expression for the total margin, 2 / ‖w‖. The goal is to maximize this margin, which means minimizing the denominator ‖w‖, and this becomes an optimization problem.


For example: given three points, find the optimal hyperplane. The weight vector is defined as (2,3) - (1,1).


This gives a weight vector of the form (a, 2a). Substitute the two points into the hyperplane equation: plug in (2,3) and set the value to 1, plug in (1,1) and set the value to -1, then solve for a and the intercept w0 to obtain the expression for the hyperplane.


Once a is solved, substituting it into (a, 2a) gives the concrete weight vector, and the points lying on the margin are the support vectors.
Plugging a and w0 into the hyperplane equation gives the support vector machine.
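The worked example can be checked with scikit-learn's SVC (my sketch: a hard margin is approximated with a large C, and only the two points named above are used, since the third point from the figure is not given):

```python
from sklearn.svm import SVC

X = [[2, 3], [1, 1]]   # the two points from the example
y = [1, -1]

svm = SVC(kernel="linear", C=1e6).fit(X, y)

print(svm.coef_)       # weight vector ~ (0.4, 0.8), i.e. (a, 2a) with a = 2/5
print(svm.intercept_)  # intercept w0 ~ -2.2
```

Solving the two constraints by hand, 8a + w0 = 1 and 3a + w0 = -1, gives a = 2/5 and w0 = -11/5, matching the fitted values.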

5. Naive Bayes

Video explanation

An example application in NLP: given a piece of text, return its sentiment classification, i.e. whether the attitude of the text is positive or negative.


To solve this problem, we can look at just some of the words in the text.


The text is then represented by just a set of words and their counts.


The original question is: given a sentence, which class does it belong to? Bayes' rule turns this into a simpler problem that is easier to compute: P(class | sentence) = P(sentence | class) · P(class) / P(sentence).


The question becomes: what is the probability of this sentence appearing in that class? And of course, don't forget the other two probabilities in the formula.
For example: the word "love" appears with probability 0.1 given a positive text, and with probability 0.001 given a negative one.
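A compact sketch of the bag-of-words plus Naive Bayes pipeline (the tiny corpus and the scikit-learn usage are my own illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["I love this movie", "what a great film",
         "I hate this movie", "terrible and boring"]
labels = ["positive", "positive", "negative", "negative"]

# CountVectorizer keeps only words and their counts, as described above;
# MultinomialNB then applies Bayes' rule over those counts
model = make_pipeline(CountVectorizer(), MultinomialNB()).fit(texts, labels)

print(model.predict(["I love it"]))  # -> ['positive']
```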

6. K-Nearest Neighbors

Video explanation

When a new data point is given, look at the k points closest to it: whichever class holds the majority among them is the class the new point belongs to.
For example: to distinguish cats from dogs using the two features claws and sound, the circles and triangles are already classified. Which class does the star belong to?


When k = 3, the points connected by the three lines are the three nearest neighbors. Circles form the majority, so the star belongs to the cat class.
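A sketch of the cat/dog example with k = 3 (the feature values are invented):

```python
from sklearn.neighbors import KNeighborsClassifier

# each row is [claws, sound]; 0 = cat (circles), 1 = dog (triangles)
X = [[5, 1], [4, 2], [5, 2], [1, 8], [2, 9], [1, 9]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# the "star": its three nearest neighbors are all circles, so it is a cat
print(knn.predict([[4, 3]]))  # -> [0]
```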


7. K-Means

Video explanation

We want to divide a set of data into three classes; pink means larger values and yellow means smaller values.
At the very beginning, initialize the centers: here the simplest choice, 3, 2, and 1, is used as the initial value of each class.
For every remaining data point, compute its distance to the three initial values and assign it to the class of the nearest one.


After the points are assigned, compute the mean of each class and use it as the center for the next round.


After several rounds, when the grouping no longer changes, the algorithm can stop.
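The loop just described can be written out directly; here is a one-dimensional sketch using the initial centers 3, 2, 1 from the example (the data values themselves are invented):

```python
import numpy as np

data = np.array([1.0, 1.2, 1.9, 2.1, 2.2, 2.9, 3.1, 3.3])
centers = np.array([3.0, 2.0, 1.0])  # the simple initial values

while True:
    # assign every point to the class of its nearest center
    labels = np.argmin(np.abs(data[:, None] - centers[None, :]), axis=1)
    # the mean of each class becomes the center for the next round
    new_centers = np.array([data[labels == k].mean() for k in range(3)])
    if np.allclose(new_centers, centers):  # grouping stopped changing
        break
    centers = new_centers

print(labels)   # final class of each point
print(centers)  # final centers
```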



8. AdaBoost

Video explanation

AdaBoost is one of the boosting methods.
Boosting combines several classifiers that individually do not perform well into a single classifier that performs considerably better.
In the figure below, the left and right decision trees each perform poorly on their own, but feeding the same data into both and combining the two results increases the confidence of the prediction.


An AdaBoost example: in handwriting recognition, many features can be captured from the drawing board, such as the direction of the starting stroke, the distance between the start and end points, and so on.

During training, each feature receives a weight. For example, the beginnings of the digits 2 and 3 look very similar, so that feature contributes little to the classification and gets a small weight.


This alpha angle, by contrast, is highly discriminative, so its feature gets a large weight. The final prediction combines the results of all these features.
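A small sketch with scikit-learn's AdaBoostClassifier, whose default weak learner is a depth-1 decision stump (the feature values standing in for "start direction" and "start-to-end distance" are invented):

```python
from sklearn.ensemble import AdaBoostClassifier

# invented [start_direction, start_end_distance] pairs for digits 2 and 3
X = [[0.1, 2.0], [0.2, 2.1], [0.2, 1.8], [0.9, 0.5], [0.8, 0.4], [0.7, 0.6]]
y = [2, 2, 2, 3, 3, 3]

# each weak learner gets a weight during training; more discriminative
# learners count for more in the combined prediction
model = AdaBoostClassifier(n_estimators=10).fit(X, y)

print(model.predict([[0.15, 1.9]]))  # -> [2]
```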

9. Neural Networks

Video explanation

Neural networks are suited to problems where one input may fall into at least two categories.
A NN consists of several layers of neurons and the connections between them.
The first layer is the input layer and the last is the output layer.
Both the hidden layers and the output layer have their own classifiers.


The input is fed into the network and activated; the computed scores are passed to the next layer, activating the following layers of neurons, until finally the scores on the output layer's nodes represent the score for each class. In the example in the figure below, the classification result is class 1.
The same input is sent to different nodes, and the reason each produces a different result is that every node has its own weights and bias.
This is forward propagation.
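A bare-bones sketch of forward propagation with NumPy (the layer sizes and all weight and bias values are invented):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0])                              # input layer

W1 = np.array([[0.1, 0.4], [-0.3, 0.8], [0.5, -0.2]])  # hidden weights
b1 = np.array([0.1, 0.0, -0.1])                        # hidden biases
h = sigmoid(W1 @ x + b1)                               # activate hidden layer

W2 = np.array([[0.3, -0.6, 0.2], [-0.1, 0.7, 0.5]])    # output weights
b2 = np.array([0.05, -0.05])
scores = W2 @ h + b2          # one score per class on the output layer

print("class", scores.argmax() + 1)  # the highest-scoring class wins
```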


10. Markov Chains

Video explanation

Markov chains consist of states and transitions.
For example, build a Markov chain from the sentence 'the quick brown fox jumps over the lazy dog'.
The steps: first make each word a state, then compute the transition probabilities between states.


These are the probabilities computed from a single sentence. When you run the statistics over a large amount of text, you get a much larger state-transition matrix, for example all the words that can follow 'the', with their corresponding probabilities.


In everyday life, the suggested words in a keyboard input method work on the same principle; the models are just more advanced.
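A short sketch that builds the transition table for the example sentence (standard-library Python only):

```python
from collections import Counter, defaultdict

words = "the quick brown fox jumps over the lazy dog".split()

# every word is a state; count transitions between consecutive words
transitions = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    transitions[current][nxt] += 1

# normalize counts into transition probabilities
for state, counts in transitions.items():
    total = sum(counts.values())
    print(state, "->", {w: c / total for w, c in counts.items()})
# e.g. the -> {'quick': 0.5, 'lazy': 0.5}
```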
