01 Machine Learning Algorithms: Overall Knowledge Map and Learning Roadmap

I have been working in machine learning for over a year. In that time I have done machine learning projects large and small and taken part in data competitions. I started out muddled and have gradually gone deeper, building up a fair amount of experience and understanding along the way. This article lays out the overall body of machine learning knowledge so that more people can see how to study machine learning and what to study. The details follow.

First, the contents of this article:

  1. Machine learning terminology
  2. Common types of algorithms
  3. Evaluation methods and metrics
  4. Performance optimization and hyperparameter tuning

1. Machine learning terminology

We begin with machine learning terminology. These terms are basic common knowledge in the field, and memorizing and understanding them is necessary for learning and understanding machine learning algorithms properly. The common terms are: features, labels, generalization, supervised and unsupervised learning, overfitting, underfitting, robustness, classification, regression, clustering, dimensionality reduction and ensemble learning.

Only machine learning, features, labels, and supervised and unsupervised learning are introduced here; the other terms are introduced one by one later.

(1) Machine learning
Machine learning means using an algorithm to find patterns or rules in historical data in order to predict unknown things. This is the plain-language notion of machine learning.

(2) Features
Features are the attributes closely tied to the thing itself. In plain terms, the features are a set of independent-variable data.

(3) Labels
Labels are the target values that correspond to a set of feature attributes. In plain terms, the labels are a set of dependent-variable data.

[Bonus 1: how can you quickly and thoroughly understand what features and labels mean and how they differ?]
Bonus 1: take the relationship y = x1 + x2 + x3 as an example. y is the label and x1, x2, x3 are the features. Laid out as a two-dimensional table to aid understanding:

          x1 (feature)  x2 (feature)  x3 (feature)  y (label)
sample 1  2             1             5             8
sample 2  5             6             3             14
...       ...           ...           ...           ...
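
The same features and labels can be written as plain Python lists. This is only a minimal sketch: it reads the two sample rows as (2, 1, 5) and (5, 6, 3), and the variable names X and y are the usual convention rather than anything fixed by this article.

```python
# Feature matrix X: one row per sample, one column per feature (x1, x2, x3).
X = [
    [2, 1, 5],  # sample 1
    [5, 6, 3],  # sample 2
]

# Labels y: one value per sample, generated here by y = x1 + x2 + x3.
y = [sum(row) for row in X]

print(y)  # [8, 14]
```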

(4) Supervised and unsupervised learning
Supervised learning means that during training the algorithm uses not only the features of the data but also the labels, which guide what is learned from the features.

Unsupervised learning means that during training the algorithm uses only the features of the data; the labels are uncertain or unknown.

[Bonus 2: how can you quickly understand the difference between supervised and unsupervised learning?]
Bonus 2: supervised learning = features + labels; unsupervised learning = features only.

2. Common types of algorithms

The common types of machine learning algorithms are classification, regression, clustering and dimensionality reduction.

(1) Classification
A classification algorithm is trained on problems whose labels are discrete, and it predicts which class a sample belongs to. Common classification algorithms are k-nearest neighbors, naive Bayes, logistic regression, support vector machines and decision trees (ensemble-learning classifiers are left out of this list for now).
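
To make classification concrete, here is a rough sketch of k-nearest neighbors using only the Python standard library. The function name knn_predict, the toy data and the choice k=3 are assumptions made for illustration; in practice a library implementation would normally be used.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort the training samples by Euclidean distance to the query point.
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Collect the labels of the k closest samples and return the most common one.
    top_labels = [label for _, label in neighbors[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Toy data: two well-separated groups with discrete labels "a" and "b".
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))  # a
print(knn_predict(train_X, train_y, (5.1, 5.0)))  # b
```

Note that the labels are discrete categories, which is what makes this a classification problem.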

(2) Regression
A regression algorithm is trained on problems whose labels are continuous, and it predicts values by fitting the data. Common regression algorithms are linear regression, ridge regression and lasso regression.
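
As an illustration of regression, the sketch below fits a straight line to one feature with ordinary least squares. The helper name fit_line and the toy data (generated from y = 2x + 1) are assumptions; ridge and lasso add a penalty term to this same idea.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data with continuous labels, generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```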

(3) Clustering
Clustering divides things into groups according to the similarity of their features. The most common clustering algorithm is k-means.
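
A minimal k-means sketch follows, assuming the initial centroids are supplied by the caller (real implementations usually pick them randomly or with a smarter seeding scheme). The function name kmeans and the toy points are illustrative only. No labels are used, which is what makes this unsupervised.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate an assignment step and a centroid update step a fixed number of times."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
# Assumption: start from one point in each visually obvious group.
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])

print(clusters[0])  # [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
print(clusters[1])  # [(8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
```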

(4) Dimensionality reduction
Dimensionality reduction is the process of mapping high-dimensional data to a lower dimension, so that a smaller number of features stands in for the information in the full feature set. A common method is principal component analysis (PCA).
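
The idea behind PCA can be sketched without any libraries: center the data, build the covariance matrix, find its leading eigenvector by power iteration, and project onto it. This is a simplified illustration that extracts only the first component; the function name and toy data are assumptions, and it assumes the starting vector is not orthogonal to the leading component.

```python
import math


def first_principal_component(data, iterations=100):
    """Return the leading principal direction and the 1-D projection of the data onto it."""
    n, d = len(data), len(data[0])
    # Center each feature at zero mean.
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by cov and renormalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project every centered sample onto the direction v: 2 features become 1.
    projected = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, projected


# Toy data: two strongly correlated features.
data = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 3.8)]
direction, projected = first_principal_component(data)
print(direction)
print(projected)
```

Here the two correlated features are replaced by a single derived feature that keeps most of their variance.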

3. Evaluation methods and metrics

Performance evaluation takes place after the algorithm has been trained: to verify how reliable the algorithm is, evaluation methods and metrics are needed to measure how well it performs.

(1) Evaluation methods
Common evaluation methods are holdout validation and k-fold cross-validation.
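
Both methods come down to how sample indices are split into training and test sets. The sketch below shows one possible way to do this; the function names are hypothetical, and shuffling (which is usually applied first) is left out to keep the output deterministic.

```python
def holdout_split(n_samples, test_ratio=0.25):
    """Hold out the last `test_ratio` share of the samples as a single test set."""
    n_test = int(n_samples * test_ratio)
    indices = list(range(n_samples))
    return indices[: n_samples - n_test], indices[n_samples - n_test:]


def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) k times; each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


print(holdout_split(8))  # ([0, 1, 2, 3, 4, 5], [6, 7])

for train_idx, test_idx in k_fold_splits(6, k=3):
    print(train_idx, test_idx)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

With k-fold cross-validation the model is trained and scored k times, and the k scores are typically averaged.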

(2) Evaluation metrics
Classification metrics include accuracy, the confusion matrix, precision, recall, F1 score, the PR curve, the ROC curve and the AUC value.
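
Several of these metrics follow directly from the four cells of a binary confusion matrix. The sketch below computes them by hand for illustration; the function name binary_metrics and the toy predictions are assumptions, and returning 0.0 when a denominator is zero is just one possible convention.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and the metrics derived from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Precision: of everything predicted positive, how much really is positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of everything truly positive, how much was found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics["confusion"])  # {'tp': 3, 'fp': 1, 'fn': 1, 'tn': 3}
print(metrics["accuracy"], metrics["precision"], metrics["recall"], metrics["f1"])  # 0.75 0.75 0.75 0.75
```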

Regression metrics include mean absolute error, mean squared error and root mean squared error.
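
The error-based regression metrics can be computed straight from their definitions. This sketch computes the mean absolute error (MAE), the mean squared error (MSE) and its square root, the root mean squared error (RMSE); the function name and toy values are illustrative assumptions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for paired true and predicted values."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n   # average size of the error
    mse = sum(e * e for e in errors) / n    # average squared error
    rmse = math.sqrt(mse)                   # back in the units of the label
    return mae, mse, rmse


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]
mae, mse, rmse = regression_metrics(y_true, y_pred)
print(mae, mse)  # 1.0 1.5
print(rmse)
```

Because errors are squared before averaging, MSE and RMSE penalize the single large error (11.0 against 9.0) more heavily than MAE does.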

4. Performance optimization and hyperparameter tuning

Performance optimization means improving the algorithm's model further to obtain better predictions. Common approaches are gradient descent, hyperparameter tuning, ensemble learning, regularization penalty terms and effective feature selection.
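
Two of these ideas, gradient descent and a regularization penalty, can be shown together in a small sketch. It minimizes mean squared error plus an L2 penalty l2 * w^2 for a one-feature linear model. The function name, learning rate, step count and penalty strength are all assumed values chosen for this toy data, not recommendations.

```python
def gradient_descent_l2(xs, ys, lr=0.05, l2=0.0, steps=2000):
    """Fit y = w * x + b by gradient descent on MSE + l2 * w**2."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the loss with respect to w and b; only w is penalized.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs)) + 2 * l2 * w
        grad_b = 2 / n * sum(errors)
        # Step against the gradient, scaled by the learning rate.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w_plain, b_plain = gradient_descent_l2(xs, ys)        # no penalty
w_reg, b_reg = gradient_descent_l2(xs, ys, l2=1.0)    # penalty shrinks the weight
print(round(w_plain, 3), round(b_plain, 3))  # 2.0 1.0
print(round(w_reg, 3), round(b_reg, 3))      # 1.111 3.222
```

The learning rate lr and the penalty strength l2 are hyperparameters: they are not learned from the data, so tuning them means trying several values and comparing the results with the evaluation methods from section 3.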

That concludes this overview of machine learning knowledge. Feel free to leave a comment to exchange ideas. Learn and grow a little every day!


Origin blog.csdn.net/qq_41731978/article/details/104243884