20201217 Deep Learning Study Notes (1) Basics of Machine Learning

This series of notes includes but is not limited to the content of the "Deep Learning" flower book.

Overview:

  • Hyperparameters: most learning algorithms have hyperparameters, which must be set outside the learning algorithm itself, using additional data held out from training (called the validation set).
  • The essence of machine learning: it is a form of applied statistics, drawing on the two main statistical approaches: frequentist estimation and Bayesian inference.
  • Classification of machine learning algorithms: according to whether the data has labels, algorithms can be divided into supervised, unsupervised, and semi-supervised learning (reinforcement learning is a further category in which the algorithm interacts with an environment). Supervised learning can be divided into classification and regression according to the type of the label.
  • How deep learning algorithms are solved: most are optimized via stochastic gradient descent.
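To make the last point concrete, here is a minimal sketch of stochastic gradient descent on a 1-D linear model y ≈ w·x with squared error. The data, learning rate, and epoch count are illustrative assumptions, not values from the notes.

```python
import random

def sgd_fit(data, lr=0.01, epochs=100, seed=0):
    # Minimal SGD sketch: one gradient step per sample, samples visited
    # in random order each epoch (the "stochastic" part).
    data = list(data)            # copy so we don't shuffle the caller's list
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            grad = 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
            w -= lr * grad               # step against the gradient
    return w

# Noise-free data generated from y = 3x; SGD should recover w close to 3.
data = [(x, 3.0 * x) for x in [-2.0, -1.0, 0.5, 1.0, 2.0]]
w = sgd_fit(data)
```

On this noise-free convex problem the iterates contract toward the optimum, so `w` ends up very close to 3.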

1. Learning algorithm

  • Definition of a machine learning algorithm, from Mitchell (1997): a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
  • Task T: how the machine learning system should process samples. How should we understand "learning"? The learning process itself is not the task; learning is the means of acquiring the ability to perform a task. How should we understand "task"? For example, if our goal is to enable a robot to walk, then walking is the task. How should we understand "sample"? A sample is a collection of quantified features collected from some object or event that we want the machine learning system to process.
  • Common machine learning tasks (not strictly classified, just introduce what tasks can be done):
    • Classification: learn a function that specifies which of K classes a given input belongs to.
    • Classification with missing inputs: when some input values may be missing, learn a set of functions rather than a single one, each mapping a different subset of inputs to one of the K classes. This is common in medical diagnosis.
    • Regression: predict a numerical value for a given input. Apart from the form of the returned result, regression is very similar to classification.
    • Transcription: Observe some relatively unstructured data and transcribe the information into discrete text. For example, OCR recognition, speech recognition.
    • Machine translation: The input is a sequence of symbols in one language, and the output is a sequence of symbols in another language.
    • Structured output: The output is a vector or other data structure containing multiple values, and there are important relationships between the different elements that make up the output. This category is very large, including the above-mentioned transcription and machine translation, pixel-level segmentation of images, adding descriptions to images, etc.
    • Anomaly detection: screening a group of events or objects, and marking abnormal or atypical individuals. For example, credit card fraud detection, industrial machine anomaly detection, etc.
    • Synthesis and sampling: the output is new samples similar to the training data. This is mostly used in media applications, such as automatically generating textures for large objects or landscapes in video games rather than having an artist label each pixel manually; another example is speech synthesis, a type of structured output task in which each input corresponds to multiple correct outputs.
    • Imputation of missing values: given a new sample with some entries missing, the algorithm must predict the values of the missing entries.
    • Denoising: the input is a corrupted sample produced by some corruption process applied to a clean sample. Given a new corrupted sample, the algorithm must predict the clean sample.
    • Density estimation or probability mass function estimation: the function to learn is the probability density function (if the samples are continuous) or the probability mass function (if the samples are discrete) over the sample space. The algorithm must learn the structure of the observed data, i.e., where samples cluster densely and where they are unlikely to occur.
  • Performance measure P: a quantitative measure of the algorithm's ability on task T, such as accuracy or error rate for classification.
  • Experience E: the data the algorithm is allowed to observe during the learning process.
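Mitchell's (T, P, E) framing above can be illustrated with a toy classifier: the task T is classification, the experience E is a labeled training set, and the performance measure P is accuracy. The nearest-centroid rule used here is an illustrative choice for the sketch, not an algorithm from the notes.

```python
def fit_centroids(samples):
    # E: labeled 1-D samples as (x, label) pairs; learn one centroid per class.
    sums, counts = {}, {}
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    # T: classification — assign x to the class with the nearest centroid.
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def accuracy(centroids, samples):
    # P: fraction of samples classified correctly.
    hits = sum(predict(centroids, x) == label for x, label in samples)
    return hits / len(samples)

train = [(0.1, "a"), (0.3, "a"), (2.8, "b"), (3.2, "b")]
test = [(0.2, "a"), (3.0, "b")]
centroids = fit_centroids(train)
acc = accuracy(centroids, test)  # toy data is perfectly separable, so 1.0
```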

2. Capacity, over-fitting and under-fitting

 

3. Hyperparameters and validation set

  • Training set, validation set, and test set: the training set is used to learn parameters; the validation set is used to estimate the generalization error during or after training, and to select and update hyperparameters; the test set is used to estimate the learner's generalization error after the learning process is complete. The test set must not participate in model selection in any form, including the choice of hyperparameters.
  • Role of hyperparameters: they control the behavior of the algorithm. Hyperparameter values are not learned by the learning algorithm itself; in other words, they are not learned from the training set.
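The three-way split described above can be sketched in a few lines. The 60/20/20 ratios and the fixed seed are illustrative assumptions, not values from the notes.

```python
import random

def split_dataset(samples, val_frac=0.2, test_frac=0.2, seed=0):
    # Shuffle a copy, then carve off test and validation portions.
    samples = samples[:]                   # don't mutate the caller's list
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = samples[:n_test]                # held out; never used for model choice
    val = samples[n_test:n_test + n_val]   # used to tune hyperparameters
    train = samples[n_test + n_val:]       # used to learn parameters
    return train, val, test

train, val, test = split_dataset(list(range(100)))
```

The three subsets are disjoint and together cover the original data, which is the property the split must preserve.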

4. Cross validation

  • Purpose of cross-validation: when the dataset is small and the test set is therefore not large enough, the average test error estimate is statistically unreliable, making it hard to judge the merits of an algorithm. Cross-validation can then be used to compute a more reasonable average test error, at the cost of increased computation.
  • The process of cross-validation: repeatedly split the original data into different training, validation, and test subsets, and average the test error over the repeated runs. The most common form is K-fold cross-validation.
  • The problem with cross-validation: there is no unbiased estimator of the variance of the average error, so approximations are typically used.
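The K-fold procedure above can be sketched as follows: partition the data into K folds, train on K-1 of them, evaluate on the held-out fold, and average the K errors. The `train_fn` and `error_fn` callables are hypothetical stand-ins for a real learner; the toy "model" here is just the training mean scored by mean squared error.

```python
def k_fold_cv(samples, k, train_fn, error_fn):
    # Average test error over K train/held-out splits.
    n = len(samples)
    errors = []
    for i in range(k):
        lo, hi = i * n // k, (i + 1) * n // k
        held_out = samples[lo:hi]            # fold i is the evaluation fold
        rest = samples[:lo] + samples[hi:]   # remaining K-1 folds for training
        model = train_fn(rest)
        errors.append(error_fn(model, held_out))
    return sum(errors) / k                   # average test error estimate

# Toy learner: the "model" is the training mean, scored by MSE.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mean_fn = lambda xs: sum(xs) / len(xs)
mse_fn = lambda m, xs: sum((x - m) ** 2 for x in xs) / len(xs)
avg_err = k_fold_cv(data, k=3, train_fn=mean_fn, error_fn=mse_fn)
```

With k=3 the folds are [1,2], [3,4], and [5,6]; the per-fold MSEs are 9.25, 0.25, and 9.25, so the averaged estimate is 6.25.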


Origin: blog.csdn.net/weixin_38192254/article/details/111317003