Machine Learning 04: Introduction to Classification of Machine Learning Algorithms and Development Process

1. Algorithm classification

Before introducing machine learning algorithms, let's clarify two concepts: discrete data and continuous data.

Discrete data:

Count data, such as population, class size, or the number of cars in a given area. All of these values are integers: they cannot be subdivided, and their precision cannot be increased any further.

Continuous data:

The variable can take any value within a certain range, that is, its values are continuous, such as length, time, or mass. Such values are usually not integers and contain a fractional part.

With these concepts in place, let's look at how machine learning algorithms are classified. They can be roughly divided into two types: supervised learning and unsupervised learning.
The difference is that the data for supervised learning contains both feature values and target values, while the data for unsupervised learning contains only feature values.


Supervised learning algorithms include:
classification: k-nearest neighbors, Bayesian classification, decision trees and random forests, logistic regression, neural networks
regression: linear regression, ridge regression...
For classification, the target values are discrete data; for regression, the target values are continuous data. For example, in classification we might predict which kind of animal a picture shows. Using the feature processing mentioned earlier, these animal categories can be encoded as 1, 2, 3... In regression, the target value is a continuous quantity.
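
As a minimal sketch of the encoding idea above, the snippet below maps animal categories to the integers 1, 2, 3... using only the Python standard library. The label list and the price list are made-up illustrative data, not taken from the original article.

```python
# Hypothetical animal labels for a handful of pictures (illustrative data only).
labels = ["cat", "dog", "bird", "dog", "cat"]

# Map each distinct category to an integer code, starting at 1 as in the text.
categories = sorted(set(labels))
label_to_code = {name: code for code, name in enumerate(categories, start=1)}

# Discrete target values, suitable for a classification problem.
encoded = [label_to_code[name] for name in labels]

# By contrast, a regression target is a continuous quantity (made-up prices).
house_prices = [310.5, 275.0, 420.75]

print(label_to_code)
print(encoded)
```

The integer codes are only names for the categories: their order (here alphabetical) carries no meaning.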


Unsupervised learning includes:
clustering: k-means, etc.
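
To make the idea of clustering without labels concrete, here is a small sketch of k-means on one-dimensional data, written with the standard library only. The function name `kmeans_1d`, the sample points, and the initial centers are all assumptions chosen for illustration. In practice a library implementation would normally be used instead.

```python
def kmeans_1d(points, centers, iterations=10):
    """Tiny k-means for 1-D data; `centers` holds the initial guesses."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Made-up points with two obvious groups; note that no labels are supplied.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

The algorithm only ever sees feature values. The grouping comes out of the data itself, which is what separates it from the supervised methods listed above.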

The input data of supervised learning has features and labels, that is, there is a standard answer.
The input data of unsupervised learning has features but no labels, so there is no standard answer.

Classification is a core problem of supervised learning. When the output variable takes a finite number of discrete values, the prediction problem is a classification problem. The most basic case is binary classification: answering yes or no, and choosing one of two categories as the prediction result.
Classification "sorts" data into groups according to its features, so it is widely used in many fields.
In banking, a customer classification model can group customers by their level of loan risk.
In image processing, classification can be used to detect whether an image contains a face, which animal it shows, and so on.
In handwriting recognition, classification can be used to recognize handwritten digits.
In text classification, the text could be news reports, web pages, emails, or academic papers.
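
As one illustration of a classifier, the sketch below implements k-nearest neighbors by hand with the standard library. The customer features (loan amount, years of credit history), the risk labels, and the function name `knn_predict` are hypothetical and exist only to mirror the banking example above.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Sort training samples by Euclidean distance to the query point.
    by_distance = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Count the labels of the k closest samples and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical customers: (loan amount, years of credit history), made-up values.
train_X = [(1.0, 8.0), (1.5, 9.0), (2.0, 7.5), (8.0, 1.0), (9.0, 0.5), (7.5, 2.0)]
train_y = ["low risk", "low risk", "low risk", "high risk", "high risk", "high risk"]

prediction = knn_predict(train_X, train_y, query=(8.5, 1.5))
print(prediction)
```

The output is always one of the finite set of labels seen in training, which is what makes this a classification problem.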

Regression is another important problem in supervised learning. Regression predicts the relationship between input variables and an output variable, where the output is a continuous value. It is also widely used in many fields, for example predicting housing prices from the historical price data of a given area, forecasting financial information, or predicting daily stock trends.
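
A minimal sketch of regression, assuming a single input feature: ordinary least squares fits a straight line to the data and then predicts a continuous value for a new input. The function name `fit_line` and the area/price numbers are invented for illustration and are deliberately exactly linear.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: floor area vs. price, constructed so that price = 2 * area + 10.
areas = [50.0, 60.0, 80.0, 100.0]
prices = [110.0, 130.0, 170.0, 210.0]

slope, intercept = fit_line(areas, prices)
predicted = slope * 90.0 + intercept  # continuous prediction for a new area
print(slope, intercept, predicted)
```

Unlike the classifier, the prediction here can be any real number, not one of a fixed set of categories.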

Let's look at a few examples and decide whether each is a classification problem or a regression problem:
1. What will tomorrow's temperature be? (Regression)
2. Will tomorrow be cloudy, sunny, or rainy? (Classification)

2. Development process

1. Obtain the data and clarify what it will be used for
2. Basic data processing: use pandas (pd) to process the data (missing values, merging tables...)
3. Feature engineering
4. Find a suitable algorithm for prediction/analysis

So what is a model? There is no need to go deep here; it can be understood as
model = algorithm + data
5. Model evaluation: determine whether the model performs well or poorly
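
The five steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: the rows are made-up (height, weight) measurements with two labels, missing values are filled with the column mean, min-max scaling stands in for feature engineering, and a 1-nearest-neighbor classifier stands in for the chosen algorithm. The article itself uses pandas for step 2; plain Python is used here so the sketch has no dependencies.

```python
import math

# 1. Obtain data: hypothetical ((height_cm, weight_kg), label) rows; None = missing.
rows = [
    ((150.0, 50.0), "A"), ((155.0, None), "A"), ((152.0, 53.0), "A"),
    ((180.0, 80.0), "B"), ((185.0, 85.0), "B"), ((None, 82.0), "B"),
    ((151.0, 52.0), "A"), ((183.0, 81.0), "B"),
]
train, test = rows[:6], rows[6:]  # hold out the last two rows for evaluation

# 2. Basic processing: fill missing values with the training-set column mean.
means = []
for col in range(2):
    present = [x[col] for x, _ in train if x[col] is not None]
    means.append(sum(present) / len(present))

def fill(x):
    return tuple(v if v is not None else means[i] for i, v in enumerate(x))

train_filled = [(fill(x), y) for x, y in train]

# 3. Feature engineering: min-max scale each column using training statistics.
lows = [min(x[i] for x, _ in train_filled) for i in range(2)]
highs = [max(x[i] for x, _ in train_filled) for i in range(2)]

def scale(x):
    return tuple((v - lo) / (hi - lo) for v, lo, hi in zip(x, lows, highs))

train_ready = [(scale(x), y) for x, y in train_filled]

# 4. Algorithm + data = model: a 1-nearest-neighbor classifier over train_ready.
def predict(x):
    features = scale(fill(x))
    nearest = min(train_ready, key=lambda pair: math.dist(pair[0], features))
    return nearest[1]

# 5. Model evaluation: fraction of held-out rows predicted correctly.
correct = sum(predict(x) == y for x, y in test)
accuracy = correct / len(test)
print(accuracy)
```

The fill values and scaling ranges are computed from the training rows only and then reused on the test rows, so the evaluation in step 5 reflects data the model has not seen.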

Origin blog.csdn.net/Edward_Legend/article/details/121289109