Understanding the basic machine learning concepts in papers (4): bootstrap

While reading the TLD paper, once I had some understanding of semi-supervised algorithms, I could follow a little of the P-N learning derivation at the beginning of Part 4. However, the second part has a section titled "Relation to supervised bootstrap", and the term bootstrap comes up often. I have never studied pattern recognition, so the concept was very vague to me. I looked it up but found very little material; in the end, with the help of Section 6.2 of "Machine Learning: A Probabilistic Perspective", I finally understood it a bit. If you know of good material, please share it.

Bootstrapping, sometimes translated by its meaning as the "self-help method", is a statistical resampling method. The name comes from the English phrase "to pull oneself up by one's bootstraps", which means to accomplish something seemingly impossible without outside help. In 1977, Efron, a professor of statistics at Stanford University in the United States, proposed this new resampling-based statistical method, the bootstrap, which offered a good approach to the problem of evaluating estimates from small samples.
1. The basic idea of the bootstrap
If the population distribution is unknown, the best available guess for it is the distribution given by the observed data. The bootstrap method has two main points: ① treat the observed values as the population; ② draw samples from this assumed population, that is, resample. A sample of the same size as the original data set, obtained by resampling the original data, is called a resample or bootstrap sample. If the statistic computed from the original data set is called the observed statistic, then the statistic computed from a bootstrap sample is called the bootstrap statistic. The key idea of the bootstrap is that the relationship between the bootstrap statistic and the observed statistic mirrors the relationship between the observed statistic and the true value, which can be expressed as:
bootstrap statistic :: observed statistic <=> observed statistic :: true value
Here "::" denotes the relationship between the two quantities and "<=>" denotes equivalence. In other words, by studying the bootstrap statistics, one can learn how far the observed statistic deviates from the true value.
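To make the analogy concrete, here is a minimal Python sketch that uses the spread of the bootstrap statistics around the observed statistic as a stand-in for the spread of the observed statistic around the true value. The data values, the choice of the median as the statistic, the helper name `bootstrap_statistics`, and B = 2000 are all assumptions made for illustration; none of them come from the TLD paper or the cited references.

```python
import random
import statistics


def bootstrap_statistics(data, statistic, B=2000, seed=0):
    """Return B bootstrap statistics, one per resample of len(data) draws."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # random.choices samples WITH replacement, which is what the bootstrap needs
    return [statistic(rng.choices(data, k=n)) for _ in range(B)]


# Hypothetical small sample (illustrative values only)
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9]

observed = statistics.median(data)                 # observed statistic
boot = bootstrap_statistics(data, statistics.median)  # bootstrap statistics

# How the bootstrap statistics deviate from the observed statistic is used
# as an estimate of how the observed statistic deviates from the true value.
bias_estimate = statistics.mean(boot) - observed
std_error = statistics.stdev(boot)

print("observed median:", observed)
print("estimated bias:", bias_estimate)
print("estimated standard error:", std_error)
```

The median is used here only because its sampling variability has no simple closed form for small samples, which is the kind of situation where the bootstrap is convenient; any other statistic could be passed in instead.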
Resampling is done by sampling with replacement. Given n observations, a bootstrap sample can be obtained as follows:
① Write each observation on a paper slip;
② Put all the slips in a box;
③ Mix well, draw one slip, and record the observation written on it;
④ Put the slip back into the box;
⑤ Repeat steps ③ and ④ until n observations have been recorded; these form one bootstrap sample. Repeat the whole sampling process B times to obtain B bootstrap samples. (This description is quoted from a paper by Liu Wenzhong, which I find easier to understand.)
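The box-and-slips procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `draw_bootstrap_sample` and `draw_bootstrap_samples`, the example observations, and the fixed seed are hypothetical and are not taken from the quoted paper.

```python
import random


def draw_bootstrap_sample(observations, rng):
    """Draw one bootstrap sample: n draws with replacement from n observations."""
    n = len(observations)
    sample = []
    for _ in range(n):
        # Picking a random index and leaving the list unchanged corresponds to
        # drawing a slip, recording it, and putting it back in the box.
        index = rng.randrange(n)
        sample.append(observations[index])
    return sample


def draw_bootstrap_samples(observations, B, seed=0):
    """Repeat the sampling process B times to obtain B bootstrap samples."""
    rng = random.Random(seed)
    return [draw_bootstrap_sample(observations, rng) for _ in range(B)]


# Hypothetical observations (illustrative values only)
observations = [2.1, 3.4, 1.9, 5.0, 4.2]
samples = draw_bootstrap_samples(observations, B=3)
for sample in samples:
    print(sample)
```

Because every draw is made with replacement, a given observation may appear several times in one bootstrap sample and not at all in another; each sample still has exactly n entries.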

2. Mathematical expression of Bootstrap



Here, the small triangle above the equals sign denotes a definition.
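In standard notation (an assumption on my part; the cited paper's own symbols may differ), the quantities from Section 1 can be sketched as follows, where \triangleq is the equals sign with a triangle above it, s(·) is whatever statistic is being computed, and B is the number of bootstrap samples:

```latex
% Observed sample and its empirical distribution (the "assumed population")
\[
x = (x_1, \dots, x_n), \qquad
\hat{F}_n(t) \triangleq \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le t\}
\]

% Observed statistic, and the bootstrap statistic on the b-th resample
\[
\hat{\theta} \triangleq s(x), \qquad
\hat{\theta}^{*}_{b} \triangleq s(x^{*}_{b}), \qquad
x^{*}_{b} = (x^{*}_{b,1}, \dots, x^{*}_{b,n}), \quad
x^{*}_{b,i} \overset{\text{iid}}{\sim} \hat{F}_n, \quad b = 1, \dots, B
\]

% Bootstrap estimate of the standard error of the observed statistic
\[
\bar{\theta}^{*} \triangleq \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}^{*}_{b}, \qquad
\widehat{\mathrm{se}}_{B} \triangleq
\sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \bigl(\hat{\theta}^{*}_{b} - \bar{\theta}^{*}\bigr)^{2}}
\]
```

Drawing each x*_{b,i} independently from the empirical distribution is the same thing as sampling the original observations with replacement.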


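The screenshot above was taken from a paper by Liu Wei on CNKI. After the explanation above, you should now know what bootstrap is. There is no need to find it intimidating: it is simply a method that resamples from the sample itself in order to estimate the true distribution. When you come across this term in machine learning or other algorithms in the future, you can make the appropriate connection, which may help in understanding those algorithms. If anything here is wrong, please point it out, and I apologize if quoting from the papers has caused any offense.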
