Machine Learning (Hung-yi Lee): Classification and the Probabilistic Generative Model

Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA license. If you reproduce it, please attach the original source link and this statement.
Original link: https://blog.csdn.net/qq_45056216/article/details/102770721

1. Probability distribution

The earlier lessons covered regression, so we first look at what happens when classification is treated as a regression problem.
During training, class 1 is given the target value 1
and class 2 is given the target value -1.
At test time, a result close to 1 means class 1, and a result close to -1 means class 2.
This looks neat, but what happens when the result is much greater than 1? Such a point should clearly be class 1, yet regression counts its distance from the target 1 as error. To reduce the overall error, the fitted function shifts away from the boundary it had already found, which makes the actual classification less accurate.
So this kind of classification problem cannot be solved well with regression.
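The effect can be reproduced with a small sketch. The 1-D data below are hypothetical (not from the lecture), and `fit_line` and `boundary` are illustrative helper names: a least-squares line is fitted to targets +1 and -1, and the point where its output crosses 0 is taken as the decision boundary.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w * x + b for 1-D inputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


def boundary(w, b):
    """Input value at which the regression output crosses 0."""
    return -b / w


# Hypothetical toy data: class 2 has target -1, class 1 has target +1.
xs = [-3, -2, -1, 1, 2, 3]
ys = [-1, -1, -1, 1, 1, 1]
boundary_clean = boundary(*fit_line(xs, ys))

# Add class-1 points that lie far on the class-1 side.
xs_far = xs + [30, 31, 32]
ys_far = ys + [1, 1, 1]
w_far, b_far = fit_line(xs_far, ys_far)
boundary_far = boundary(w_far, b_far)

print("boundary without far points:", boundary_clean)
print("boundary with far points:   ", boundary_far)
print("output at x = 1:            ", w_far * 1 + b_far)
```

With the far points included, the boundary moves past x = 1, so a training point of class 1 now gets a negative output even though the added points were all correctly labeled.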

Machine learning still follows three steps:
1. Design a model
Following Hung-yi Lee's video lecture, we take binary classification as the example.
2. Define a loss function
3. Find the best function
Two kinds of methods can be used here: perceptron and SVM.
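As a minimal sketch of the three steps, assuming a 1-D linear model g(x) = w * x + b, a loss that simply counts misclassified training points, and the classic perceptron update as the search method. The data and the function names are hypothetical; the original figures may define the model and loss differently.

```python
def g(x, w, b):
    """Linear score; its sign decides the class."""
    return w * x + b


def f(x, w, b):
    """Step 1, the model: 1 for class 1, -1 for class 2."""
    return 1 if g(x, w, b) > 0 else -1


def zero_one_loss(data, w, b):
    """Step 2, the loss: number of training points classified wrongly."""
    return sum(1 for x, y in data if f(x, w, b) != y)


def train_perceptron(data, epochs=10):
    """Step 3, the search: perceptron updates on misclassified points."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            if y * g(x, w, b) <= 0:  # wrong side of (or on) the boundary
                w += y * x
                b += y
    return w, b


# Hypothetical, linearly separable toy data: (x, label).
data = [(-2, -1), (-1, -1), (1, 1), (2, 1)]
loss_before = zero_one_loss(data, 0.0, 0.0)
w, b = train_perceptron(data)
loss_after = zero_one_loss(data, w, b)

print("loss before training:", loss_before)
print("learned w, b:", w, b)
print("loss after training:", loss_after)
```

A counting loss like this is not differentiable, which is one reason methods such as the perceptron or SVM are used instead of plain gradient descent.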

2. Probabilistic generative model

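Following Hung-yi Lee's video lecture, we set up two boxes, box1 and box2. Suppose the probability of drawing from box1 is p(b1) = 2/3, so the probability of drawing from box2 is p(b2) = 1/3. If p(b1|x) > 0.5, x belongs to box1; otherwise it belongs to box2.
By Bayes' rule, the probability that a drawn ball x came from box1 is:
p(b1|x) = p(x|b1) p(b1) / ( p(x|b1) p(b1) + p(x|b2) p(b2) )
This draws on probability theory, which I have not studied, so some of the content is not entirely clear to me.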
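A short worked example of this posterior, reading p(b1) = 2/3 and p(b2) = 1/3 as the probabilities of picking each box. The chances of drawing a blue ball from each box (4/5 and 2/5) are not given in the text and are assumed here purely for illustration; `posterior_box1` is a hypothetical helper name.

```python
from fractions import Fraction


def posterior_box1(prior_b1, prior_b2, lik_b1, lik_b2):
    """Bayes' rule: probability that the observed ball came from box1."""
    evidence = lik_b1 * prior_b1 + lik_b2 * prior_b2
    return lik_b1 * prior_b1 / evidence


p_b1 = Fraction(2, 3)             # prior for box1 (from the text)
p_b2 = Fraction(1, 3)             # prior for box2 (from the text)
p_blue_given_b1 = Fraction(4, 5)  # assumed for illustration
p_blue_given_b2 = Fraction(2, 5)  # assumed for illustration

post = posterior_box1(p_b1, p_b2, p_blue_given_b1, p_blue_given_b2)
print(post)        # 4/5
print(post > Fraction(1, 2))
```

Under these assumed numbers the posterior exceeds 0.5, so a blue ball would be attributed to box1.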
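We assume each class follows a Gaussian distribution, because it is the most common type of distribution; the central limit theorem from probability theory motivates this choice.
f(x) = 1 / ( (2π)^(D/2) |Σ|^(1/2) ) · exp( -(1/2) (x - μ)^T Σ^(-1) (x - μ) )
For background, see:

On the multivariate Gaussian (normal) distribution

Here μ is the mean, Σ is the covariance matrix (which describes how widely the data spread), and D is the dimension of x.
Following Hung-yi Lee's video lecture, suppose we have 79 Pokémon samples.
From them we compute the values of μ and Σ.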
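One way to compute μ and Σ is maximum likelihood: μ is the sample mean and Σ is the average of the outer products of the deviations (dividing by N, not N - 1). The sketch below assumes this estimator and 2-D feature vectors; the four data points and the helper names are hypothetical stand-ins, not the 79 samples from the lecture.

```python
import math


def mle_gaussian(points):
    """Maximum-likelihood mean and covariance for a list of 2-D points."""
    n = len(points)
    mu = [sum(p[d] for p in points) / n for d in range(2)]
    sigma = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        diff = [p[0] - mu[0], p[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                sigma[i][j] += diff[i] * diff[j] / n
    return mu, sigma


def gaussian_pdf(x, mu, sigma):
    """Density of a 2-D Gaussian with mean mu and covariance sigma at x."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [x[0] - mu[0], x[1] - mu[1]]
    quad = sum(diff[i] * inv[i][j] * diff[j]
               for i in range(2) for j in range(2))
    # For D = 2 the normalizer (2*pi)^(D/2) * |Sigma|^(1/2) is 2*pi*sqrt(det).
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))


# Hypothetical samples of one class, each with two features.
points = [(1.0, 2.0), (3.0, 2.0), (2.0, 1.0), (2.0, 3.0)]
mu, sigma = mle_gaussian(points)
density_at_mean = gaussian_pdf(mu, mu, sigma)

print("mu =", mu)
print("sigma =", sigma)
print("density at the mean =", density_at_mean)
```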

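3. Solving the classification problem

(Hung-yi Lee's example)
Now for the classification problem itself.
We want a binary classifier that separates Water-type Pokémon from Normal-type Pokémon. We compute a Gaussian distribution for each class.
We can then classify x with the posterior probability formula introduced earlier. The priors p(C1) for Water and p(C2) for non-Water are easy to compute from the data, while p(x|C1) and p(x|C2) are obtained from the classes' probability density functions (by integration).
If P(C1|x) > 0.5, x is classified as Water type.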
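Written out, the decision rule might look as follows. This is a sketch in common notation: C_1 is the Water class and C_2 the other class, N_1 and N_2 are the numbers of training samples in each class (an assumption about how the priors are computed from the data), and f is the Gaussian density fitted to each class.

```latex
P(C_1 \mid x) = \frac{p(x \mid C_1)\,P(C_1)}{p(x \mid C_1)\,P(C_1) + p(x \mid C_2)\,P(C_2)},
\qquad
P(C_i) = \frac{N_i}{N_1 + N_2},
\qquad
p(x \mid C_i) = f_{\mu_i,\,\Sigma_i}(x)
```

Since the two posteriors sum to 1, testing P(C_1 | x) > 0.5 is the same as testing P(C_1 | x) > P(C_2 | x).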

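However, the resulting accuracy is only 54%.
A likely reason is that giving each class its own covariance matrix introduces too many parameters. So we let both classes share a single covariance Σ, which reduces the number of covariance matrices.
With the shared covariance, accuracy reaches 73%.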
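A common way to build the shared Σ, assumed here since the text does not spell it out, is to average the per-class covariances weighted by the number of samples in each class. The matrices and counts below are hypothetical, and `shared_covariance` is an illustrative helper name.

```python
def shared_covariance(sigma1, n1, sigma2, n2):
    """Average of two covariance matrices, weighted by class sample counts."""
    total = n1 + n2
    rows, cols = len(sigma1), len(sigma1[0])
    return [[(n1 * sigma1[i][j] + n2 * sigma2[i][j]) / total
             for j in range(cols)]
            for i in range(rows)]


# Hypothetical per-class covariances and sample counts.
sigma_water = [[4.0, 1.0], [1.0, 2.0]]
sigma_normal = [[2.0, 0.0], [0.0, 6.0]]
n_water, n_normal = 3, 1

shared = shared_covariance(sigma_water, n_water, sigma_normal, n_normal)
print(shared)  # [[3.5, 0.75], [0.75, 3.0]]
```

Each class keeps its own mean; only the covariance is tied, so the model has one covariance matrix to estimate instead of two.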
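All figures and formulas here are taken from Hung-yi Lee's machine learning course.
These are my notes from studying Hung-yi Lee's video lectures.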
