Concept learning self-study notes

Concept learning:

Grasping the following points will help us better understand concept learning:

  • The partial-order relation from discrete mathematics (see the definition after this list) is the key to understanding the Find-S algorithm and the candidate elimination algorithm.

  • Concept learning can be understood as a search problem over the hypothesis space.

  • Concept learning performs poorly when the training set contains noisy data.
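
For reference, the partial order in question is the "more general than or equal to" relation between hypotheses; a minimal statement of it (the standard textbook definition, written here in LaTeX) is:

    % h_j is more general than or equal to h_k when every instance that
    % satisfies h_k also satisfies h_j.
    h_j \ge_g h_k \;\iff\; \forall x \in X:\; \big( h_k(x) = 1 \Rightarrow h_j(x) = 1 \big)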

Terminology and notation

  • Target concept c: a Boolean function c: X→{0,1}

  • Target concept value: c(x)

  • Positive example: c(x)=1

  • Negative example: c(x)=0

  • Training example: <x,c(x)>

  • Set of training examples: D

  • All possible hypotheses: H

  • A single hypothesis h: a Boolean function h: X→{0,1}

Find-S: finding the maximally specific hypothesis

Hypotheses here are conjunctions of attribute constraints.

Brief description: start with the most specific hypothesis in H, and generalize it whenever it fails to cover a positive example.

The most specific hypothesis: <Ø,Ø,Ø,Ø,Ø,Ø>

Algorithm description (training process):

    Initialize h to the most specific hypothesis in H
    For each positive training instance x
        For each attribute constraint ai ∈ h
            If the constraint ai is satisfied by x
            Then do nothing
            Else replace ai in h by the next more general constraint
                 that is satisfied by x
    Output hypothesis h

Starting from the most specific hypothesis, Find-S guarantees that its output is the most specific hypothesis in H consistent with the positive training examples.
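
To make the procedure concrete, here is a minimal Python sketch of Find-S for conjunctive hypotheses over discrete attributes. The attribute values and training data are invented for illustration; None plays the role of Ø and the string "?" means "any value is acceptable".

    # Minimal sketch of Find-S for conjunctive hypotheses.
    # None stands for Ø (no value acceptable); "?" accepts any value.
    # The example data below are hypothetical.

    def find_s(positive_examples):
        """Return the most specific conjunctive hypothesis covering all positive examples."""
        n = len(positive_examples[0])
        h = [None] * n                      # most specific hypothesis <Ø,...,Ø>
        for x in positive_examples:
            for i, (hi, xi) in enumerate(zip(h, x)):
                if hi is None:              # Ø: adopt the attribute value of x
                    h[i] = xi
                elif hi != xi:              # constraint violated: generalize to "?"
                    h[i] = "?"
        return h

    # Hypothetical usage with two positive examples over four attributes:
    positives = [
        ("Sunny", "Warm", "Normal", "Strong"),
        ("Sunny", "Warm", "High",   "Strong"),
    ]
    print(find_s(positives))   # ['Sunny', 'Warm', '?', 'Strong']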

Candidate elimination algorithm

  • The output of the candidate elimination algorithm is the set of all hypotheses consistent with the training examples; the hypothesis found by Find-S is only one of them.
  • Thanks to the partial ordering over hypotheses, the candidate elimination algorithm can represent the set of hypotheses consistent with the training set without explicitly enumerating all of its members.
  • Like Find-S, however, it performs poorly when the data are noisy.

Version space

  • General boundary G

  • Specific boundary S

    Initialize: G <- {<?,?,?,?,?,?>}
                S <- {<Ø,Ø,Ø,Ø,Ø,Ø>}
    For each training example d = <x, c(x)>:
    If d is a positive example:
        For G: remove from G every hypothesis inconsistent with d (i.e., the concept value the hypothesis assigns disagrees with the example's true concept value)
        For S: remove from S every hypothesis inconsistent with d; if a hypothesis h is consistent with d and some hypothesis in G is more general than h, add h to S
    If d is a negative example:
        For S: remove from S every hypothesis inconsistent with d (i.e., the concept value the hypothesis assigns disagrees with the example's true concept value)
        For G: remove from G every hypothesis inconsistent with d; if a hypothesis h is consistent with d and some hypothesis in S is more specific than h, add h to G
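
A compact Python sketch of these update rules for conjunctive hypotheses is given below. The attribute domains, data, and helper names are invented for illustration, and the sketch omits the pruning of redundant hypotheses within S and G that the full textbook algorithm performs.

    # Sketch of the candidate-elimination updates for conjunctive hypotheses.
    # "?" matches any value; "0" stands for Ø (matches nothing).
    # Attribute domains and data are hypothetical.

    def covers(h, x):
        """True if hypothesis h classifies instance x as positive."""
        return all(hi == "?" or hi == xi for hi, xi in zip(h, x))

    def more_general_or_equal(h1, h2):
        """True if h1 is more general than or equal to h2."""
        return all(a == "?" or a == b for a, b in zip(h1, h2))

    def min_generalize(h, x):
        """Minimal generalization of h that covers positive instance x."""
        return tuple(xi if hi == "0" else (hi if hi == xi else "?")
                     for hi, xi in zip(h, x))

    def min_specializations(g, x, domains):
        """Minimal specializations of g that exclude negative instance x."""
        out = []
        for i, gi in enumerate(g):
            if gi == "?":
                out.extend(g[:i] + (v,) + g[i + 1:]
                           for v in domains[i] if v != x[i])
        return out

    def candidate_elimination(examples, domains):
        n = len(domains)
        G = [tuple("?" for _ in range(n))]   # most general boundary
        S = [tuple("0" for _ in range(n))]   # most specific boundary
        for x, positive in examples:
            if positive:
                G = [g for g in G if covers(g, x)]
                S = [s if covers(s, x) else min_generalize(s, x) for s in S]
                S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
            else:
                S = [s for s in S if not covers(s, x)]
                new_G = []
                for g in G:
                    if not covers(g, x):
                        new_G.append(g)
                    else:
                        new_G.extend(h for h in min_specializations(g, x, domains)
                                     if any(more_general_or_equal(h, s) for s in S))
                G = list(dict.fromkeys(new_G))   # drop duplicates
        return S, G

    # Hypothetical usage with two attributes:
    domains = [("Sunny", "Rainy"), ("Warm", "Cold")]
    data = [(("Sunny", "Warm"), True), (("Rainy", "Cold"), False)]
    S, G = candidate_elimination(data, domains)
    print(S, G)   # [('Sunny', 'Warm')]  [('Sunny', '?'), ('?', 'Warm')]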
    

Some notes and explanations

What if the training data contain errors?

  • The algorithm may remove the correct target concept from the version space.
  • Given enough training data, the S and G boundaries converge to an empty version space.

Unbiased learner

To guarantee that the target concept is in the hypothesis space, we need a hypothesis space that can express every teachable concept, in other words, every possible subset of the instance set X. The set of all subsets of X is called the power set of X.

  • Such a hypothesis space is no longer restricted to conjunctions, and we need not worry about being unable to express the target concept. However, a concept learner using it cannot generalize beyond the training examples at all!
  • S becomes the disjunction of the positive examples, and G becomes the negation of the disjunction of the negative examples; a size comparison is sketched below.
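
As a rough illustration of why the conjunctive hypothesis space cannot express every concept, the following sketch compares the number of possible target concepts with the number of conjunctive hypotheses, assuming (hypothetically) six attributes with three values each, matching the <Ø,Ø,Ø,Ø,Ø,Ø> hypotheses above:

    # Hypothetical sizes: 6 attributes, 3 values each (an assumption for
    # illustration, not taken from the notes above).
    n_attrs, n_vals = 6, 3
    n_instances = n_vals ** n_attrs          # 729 distinct instances in X
    n_concepts = 2 ** n_instances            # |power set of X| = 2**729 possible target concepts
    # Conjunctive hypotheses: each attribute is one of its 3 values or "?",
    # plus a single empty hypothesis (any Ø makes the hypothesis cover nothing).
    n_conjunctive = (n_vals + 1) ** n_attrs + 1   # 4097 semantically distinct hypotheses
    print(n_instances, n_conjunctive)
    print(len(str(n_concepts)), "decimal digits in the number of possible concepts")

The conjunctive space is vanishingly small compared with the power set; that restriction is exactly the bias that makes generalization possible.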

Uselessness of unbiased learning

  • Given the discussion above, it is not hard to see why unbiased learning is useless.

  • It also illustrates a basic property of inductive inference: a learner that makes no prior assumption about the form of the target concept cannot classify unseen examples at all.

  • Since inductive learning requires some form of prior assumption, known as inductive bias, we can use the inductive bias to characterize different learning methods.
