Watermelon Book Study Notes: Introduction

Preface

This series of articles records the author's study of the Watermelon Book (西瓜书, Zhou Zhihua's Machine Learning) during the second semester of junior year.
This installment records my learning experience with the introduction chapter and some thoughts on issues it raises.


1. Knowledge review

After previewing, attending class, and reviewing, I found the introduction chapter to be an extremely friendly entry manual. In it, the author of the Watermelon Book systematically introduces common machine learning terms such as sample, label, example, training set, and test set, and gives a simple discussion using watermelons as the running example. Since these concepts are not hard to grasp, I will not go over them in detail; the following focuses on explaining and illustrating the concept of the hypothesis space in the book.

Hypothesis (假设): this concept first appears in the fourth paragraph of page 2 of the book, where it is defined as:

“The learned model corresponds to some underlying regularity of the data, and is therefore also called a hypothesis.”

The hypothesis space is the space containing all hypotheses, and the learning process can be viewed as a search through this space, whose goal is to find hypotheses consistent with the training set (P5). Moreover, in a real learning process several hypotheses may be consistent with the training set; the set of all such hypotheses is called the "version space".

From the summary above, I take the abstract concept of a "hypothesis" to be a conjecture, made before learning, about what the model might be; the hypotheses contained in the version space are the special cases that are actually consistent with the data and can be applied as models, and the final choice among them should be made according to inductive preference.

The above is the broad reading of "hypothesis". In the narrow sense, take the watermelons in the book as an example: suppose there are four samples, as follows (Table 1.1, P4):

No.   Color (色泽)   Root (根蒂)       Knock (敲声)   Good melon (好瓜)
1     green          curled            muffled        yes
2     dark           curled            muffled        yes
3     green          stiff             crisp          no
4     dark           slightly curled   dull           no

The feature vector of each sample has dimension d = 3, and each feature takes three possible values. From this, the size of the hypothesis space can be calculated as

(3 + 1) × (3 + 1) × (3 + 1) = 64

Here the empty hypothesis ∅ is not counted, and each feature additionally admits the wildcard "*" (hence the +1 per feature). More generally, for d features with n1, …, nd possible values, the hypothesis space contains (n1 + 1) × … × (nd + 1) hypotheses, plus one more if ∅ is counted.

Each hypothesis in the hypothesis space can be expressed as:

(color = green; root = curled; knock = crisp)
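
To make the count and the version-space idea concrete, here is a minimal Python sketch (my own illustration, not code from the book) that enumerates the 64 hypotheses of this toy problem and then filters out the version space, i.e. the hypotheses consistent with all four training samples. The English value names are my translations of the book's attribute values; 'light' (浅白) is the book's third colour value, which happens not to appear in the four samples.

    from itertools import product

    # Possible values of each attribute; 'light' (浅白) comes from the
    # book and does not appear among the four training samples.
    COLORS = ['green', 'dark', 'light']
    ROOTS = ['curled', 'stiff', 'slightly curled']
    KNOCKS = ['muffled', 'crisp', 'dull']
    WILD = '*'

    # Hypothesis space: every (color, root, knock) triple in which each
    # slot is a concrete value or the wildcard -> (3+1)^3 = 64 hypotheses.
    hypothesis_space = list(product(COLORS + [WILD],
                                    ROOTS + [WILD],
                                    KNOCKS + [WILD]))
    print(len(hypothesis_space))  # 64

    # Training set from Table 1.1 (True = good melon).
    train = [
        (('green', 'curled', 'muffled'), True),
        (('dark', 'curled', 'muffled'), True),
        (('green', 'stiff', 'crisp'), False),
        (('dark', 'slightly curled', 'dull'), False),
    ]

    def covers(h, x):
        """h covers sample x if every slot matches x or is the wildcard."""
        return all(hv == WILD or hv == xv for hv, xv in zip(h, x))

    # Version space: hypotheses whose predictions agree with every label.
    version_space = [h for h in hypothesis_space
                     if all(covers(h, x) == y for x, y in train)]
    print(version_space)

Running this prints 64 and a version space of three hypotheses, (*, curled, *), (*, *, muffled) and (*, curled, muffled), which matches the version space the book illustrates in Figure 1.2.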

It also follows from the above that the size of the hypothesis space is determined by the feature dimension and the number of values each feature can take, and has nothing to do with how many samples there are.

I once made a mistake here, confusing the concept of the hypothesis space with the set of predictions over the sample space, so I record it here as a warning to myself.

2. Thoughts on the after-class exercises

For the after-class exercises, the teacher assigned 1.1 and 1.3 in class. Exercise 1.1 is very simple, so I will not go into it. Exercise 1.3, however, prompted some thinking.

1.3

The problem statement is as follows:

If the data contains noise, there may be no hypothesis in the hypothesis space that is consistent with all the training samples. In this situation, try to design an inductive bias for hypothesis selection.

I searched online for some existing answers, which run roughly as follows:

One answer: choose the hypothesis that is consistent with the greatest number of training samples. Alternatively, compute an accuracy for each hypothesis, accuracy = (number of samples that satisfy the hypothesis and are good melons) / (number of samples that satisfy the hypothesis), and choose the hypothesis with the highest accuracy.
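
As a quick sketch of the accuracy rule in this answer (reusing covers, hypothesis_space and train from the snippet above; the helper name accuracy is mine):

    def accuracy(h, data):
        """Among the samples that h covers, the fraction that are good melons."""
        covered = [y for x, y in data if covers(h, x)]
        return sum(covered) / len(covered) if covered else 0.0

    # Choose the hypothesis with the highest accuracy on the training data.
    best = max(hypothesis_space, key=lambda h: accuracy(h, train))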

Another answer: it is generally believed that the closer two data points' attributes are, the more likely they belong to the same class. If samples with identical attributes appear in two different classes, a sample can be assigned to the class of the data closest to it. One can also consider removing all samples that share identical attributes but carry different class labels; the remaining data is then conflict-free, though some information may be lost.
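
The last suggestion, dropping every group of samples that share identical attributes but carry different labels, could be sketched like this (again my own illustration):

    def drop_conflicts(data):
        """Remove every sample whose attribute vector occurs with more
        than one label; the remaining data is conflict-free."""
        labels = {}
        for x, y in data:
            labels.setdefault(x, set()).add(y)
        return [(x, y) for x, y in data if len(labels[x]) == 1]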

In my view, the problem has two key points:

  1. The data contains noise
  2. There may be no hypothesis consistent with all the training samples

First, for the first point: before selecting a hypothesis the data needs to be cleaned, that is, outliers need to be detected and then corrected or completed, so that as little noise as possible remains to interfere with hypothesis selection.

For the second point, a conceptual conversion can be made:
no hypothesis is consistent with all the training samples <=> the version space is the empty set

Therefore, after the noise has been handled, if the version space is not empty, a hypothesis is selected according to Occam's razor; if it is still empty, the hypothesis consistent with the largest number of training samples is selected.
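
Putting the two cases together, here is one possible sketch of this selection rule, again reusing the earlier definitions; counting wildcards as a proxy for "simpler" is my own assumption about how to apply Occam's razor to these conjunctive hypotheses:

    def consistency_count(h, data):
        """Number of samples on which h's prediction matches the label."""
        return sum(covers(h, x) == y for x, y in data)

    def occam_score(h):
        """Crude simplicity measure: more wildcards = more general hypothesis."""
        return sum(slot == WILD for slot in h)

    # Consistency with the training data dominates; Occam's razor breaks ties.
    best = max(hypothesis_space,
               key=lambda h: (consistency_count(h, train), occam_score(h)))

If the version space is non-empty, every hypothesis in it reaches the maximal count len(train), so the rule reduces to an Occam's-razor choice inside the version space; only when the version space is empty does it fall back to the hypothesis consistent with the most samples.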


Summary

The above are some of my thoughts on the introduction chapter. If there are any mistakes, I would ask readers to point them out. Thank you for reading.

2022.3.2
