Introduction to Machine Learning and Model Evaluation (1)

Introduction

  1. Occam's Razor: if multiple hypotheses are consistent with the observations, choose the simplest one.
  2. No free lunch (NFL) theorem: averaged over all possible true objective functions, all algorithms have the same total error outside the training set.
  3. The core idea of NFL: it is meaningless to ask "which learning algorithm is best" without reference to a specific problem.
  4. Hypothesis space: the space composed of all hypotheses for the problem. We can think of learning as a search through the hypothesis space, where the goal is to find a hypothesis that "fits" the training set.
    Note: suppose the data set has n attributes and the i-th attribute can take t_i possible values; adding the wildcard value (*) for each attribute gives ∏_i (t_i + 1) candidate hypotheses. Including the empty set (the hypothesis that no positive examples exist), the hypothesis space contains ∏_i (t_i + 1) + 1 hypotheses.
  5. Version space: in practice we often face a very large hypothesis space. The set of all hypotheses consistent with the training set is called the version space.
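The counting in the note above can be sketched in a few lines (a Python illustration; the attribute value counts are made up for the example):

```python
from math import prod

# Size of a conjunctive hypothesis space: the i-th attribute takes one of
# t_i concrete values or the wildcard '*', giving (t_i + 1) choices, and one
# extra hypothesis (the empty set) covers "no positive examples exist".
def hypothesis_space_size(t: list[int]) -> int:
    return prod(ti + 1 for ti in t) + 1

# Three attributes with 3 possible values each:
print(hypothesis_space_size([3, 3, 3]))  # (3+1)*(3+1)*(3+1) + 1 = 65
```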

Model evaluation and selection

  1. Training error: The error of the learner on the training set, also known as empirical error.
  2. Generalization error: the error of the learner on new samples.
  3. Complexity can be divided into two levels: O(1), O(log n), O(n^a), etc., which we call polynomial-level complexity; and O(a^n), O(n!), etc., which are non-polynomial level and whose cost is often more than computers can bear.
  4. P: problems that can be solved within polynomial time complexity (O(n^k)) (problems whose answers a computer can compute relatively easily).
  5. NP: problems whose solutions can be verified in polynomial time (once an answer is known, a computer can easily check whether it is correct).
    Example: the large-integer factorization problem. If someone tells you that 9,938,550 can be written as the product of two numbers, you cannot easily tell whether that is true; but if you are told the two numbers are 1123 and 8850, it is easy to verify with the simplest calculator.
  6. NP-hard: problems to which every NP problem can be reduced in polynomial time; the problem itself is not necessarily an NP problem.
  7. NPC (NP-complete problem): a problem that is both an NP problem and NP-hard. So far no polynomial-time algorithm is known for any NPC problem, so whether P = NP remains unproven.
  8. Model evaluation methods:
    hold-out method: directly divide the data set into two mutually exclusive sets (one for training, one for testing); repeat the random split several times and average the results.
    cross-validation: divide the data set into k mutually exclusive subsets of similar size (k-fold cross-validation), each subset keeping the data distribution as consistent as possible; train on k − 1 subsets and test on the remaining one, rotating through all k.
    bootstrapping: sample m times with replacement from the sample set D (of size m) to obtain D'; on average about 36.8% (≈ 1/e) of the samples never appear in D' (the training set).
    Because bootstrapping changes the distribution of the initial data set, the hold-out and cross-validation methods are more commonly used when the amount of data is sufficient.
  9. Error rate: The ratio of the number of misclassified samples to the total number of samples.
  10. Accuracy: The proportion of correctly classified samples to the total number of samples.
  11. ROC curve: used to study generalization performance; the horizontal axis is the false positive rate (FPR) and the vertical axis is the true positive rate (TPR). AUC is the area under the ROC curve and is used to compare the quality of learners.
  12. Bias: measures how far the learning algorithm's expected prediction deviates from the true result, i.e. it characterizes the fitting ability of the algorithm itself.
  13. Variance: measures the change in learning performance caused by variations in training sets of the same size, i.e. it characterizes the impact of data perturbations.
  14. Noise: expresses the lower bound of the expected generalization error that any learning algorithm can achieve on the current task, i.e. it characterizes the difficulty of the learning problem itself.
  15. The generalization performance is determined by the ability of the learning algorithm, the sufficiency of the data and the difficulty of the learning task itself.
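The "easy to verify" property behind the NP definition in item 5 can be shown directly: checking a claimed factorization takes one multiplication, while finding the factors is the hard part. A minimal sketch:

```python
# Verifying a claimed factorization is polynomial-time: one multiplication
# and one comparison suffice, regardless of how hard the factors were to find.
def verify_factorization(n: int, p: int, q: int) -> bool:
    return p * q == n

p, q = 1123, 8850          # the two factors from the example above
n = p * q                  # the large integer to be factored
print(verify_factorization(n, p, q))        # True
print(verify_factorization(n + 1, p, q))    # False
```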
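The 36.8% figure quoted for bootstrapping in item 8 can be checked empirically; a minimal simulation using only the standard library (the data set size here is arbitrary):

```python
import random

# Draw m samples with replacement from a data set of size m (bootstrap).
# A given item is missed with probability (1 - 1/m)^m, which approaches
# 1/e ≈ 0.368 as m grows.
random.seed(0)
m = 100_000
sampled = {random.randrange(m) for _ in range(m)}  # indices that appear in D'
out_of_bag = 1 - len(sampled) / m                  # fraction never drawn
print(f"{out_of_bag:.3f}")  # close to 1/e ≈ 0.368
```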

Linear model

  1. Logistic regression: also known as logit (log-odds) regression; the prediction of a linear regression model is used to approximate the log odds of the true label.
  2. Linear Discriminant Analysis (LDA): Given a set of training examples, try to project the samples on a straight line so that the projection points of similar samples are as close as possible, and the projection points of heterogeneous samples are as far away as possible.
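A numeric sketch of item 1 (the weights and input here are hypothetical, purely for illustration): the linear predictor z approximates the log odds ln(y/(1−y)), and the sigmoid maps it back to a probability.

```python
import math

def sigmoid(z: float) -> float:
    """Inverse of the log-odds (logit) function."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights, bias, and input, for illustration only.
w, b = [0.8, -0.3], 0.5
x = [1.0, 2.0]

z = sum(wi * xi for wi, xi in zip(w, x)) + b  # linear regression prediction
p = sigmoid(z)                                # probability of the positive class
log_odds = math.log(p / (1 - p))              # ln(y / (1 - y)) recovers z

print(round(z, 3), round(p, 3), round(log_odds, 3))  # 0.7 0.668 0.7
```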

Origin blog.csdn.net/qq_41174940/article/details/105610469