Machine Vision -- The Basic Flow of Image Processing Tasks and the Loss Functions of a Linear Classifier

Foreword

I am a freshman majoring in artificial intelligence technology and applications. After working through some machine learning material, I set myself a development goal: machine vision. This blog collects my first study notes in the field, and I will continue to update this column in the future. Below I introduce what I organized during the learning process, for future reference: mainly the basic flow of image processing in a machine vision system, and the linear classifier that every beginner must learn.

Introduction to Machine Vision

Machine vision is an artificial intelligence technology that uses computer vision and pattern recognition techniques to simulate the human visual system to analyze and process digital images and videos. Machine vision technology can help computer systems understand and perceive the surrounding environment, thereby realizing functions such as automatic control, intelligent identification, and automatic detection.

Machine vision technology can be applied in various fields, such as industrial automation, medical image analysis, security monitoring, intelligent transportation, drones, etc. Through machine vision technology, computers can automatically identify and analyze features such as objects, shapes, colors, and movements in images, thereby realizing automated control and intelligent decision-making.

The development of machine vision technology enables computers to more realistically simulate the human visual system, realize more intelligent image processing and analysis, and bring more convenience and innovation to humans.

Basic process in image processing

  • To perform an image processing task, the first step is always to input the image to the computer.
  • After inputting the image, the next step is image representation: encoding the image in a form the computer can work with.
    There are generally three modes: pixel representation, global feature representation (such as GIST), and local feature representation (such as SIFT features combined with a bag-of-words model).
  • Once the image is represented, we must choose a classification model; different image processing tasks call for different models to achieve the best results.
  • After the model classifies the image it produces predicted values, which we compare with the true values using a loss function to compute a loss value.
  • Given the loss value, an optimization method updates the parameters of the classification model so that the next loss value is smaller and the model becomes more accurate.

This process is also called training.
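The five steps above can be sketched end to end in a few lines. This is only a minimal illustration on made-up toy data; the linear scorer, cross-entropy loss, and plain gradient-descent update are illustrative choices, not prescribed by any particular task.

```python
import numpy as np

# Minimal sketch of the training loop: input -> representation -> model ->
# loss -> optimization. The toy data, model, and loss are our assumptions.
rng = np.random.default_rng(0)

# Steps 1-2: "images" input and represented as pixel feature vectors (N x D).
X = rng.normal(size=(100, 16))           # 100 tiny images of 16 pixels each
y = (X[:, 0] > 0).astype(float)          # toy binary ground-truth labels

# Step 3: choose a classification model -- here a linear scorer s = Xw + b.
w = np.zeros(16)
b = 0.0

lr = 0.1
for step in range(200):
    scores = X @ w + b                           # predicted scores
    probs = 1.0 / (1.0 + np.exp(-scores))        # predicted probabilities
    # Step 4: compare predictions with the true labels via a loss function.
    loss = -np.mean(y * np.log(probs) + (1 - y) * np.log(1 - probs))
    # Step 5: optimization -- a gradient step shrinks the next loss value.
    grad_s = (probs - y) / len(y)
    w -= lr * (X.T @ grad_s)
    b -= lr * grad_s.sum()

accuracy = np.mean(((X @ w + b) > 0).astype(float) == y)
```

Each pass through the loop repeats steps 4 and 5, which is why the loss keeps falling as training proceeds.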

Classifier

There are many classifiers in machine learning, including:

  • nearest neighbor classifier
  • Bayesian classifier
  • linear classifier
  • support vector machine classifier
  • neural network classifier
  • Random Forest Classifier
  • Adaboost

In machine vision tasks, linear classifiers and neural network classifiers are the most widely used; the sections below focus on linear classifiers.

Some common loss functions

The loss function builds a bridge between model performance and model parameters, guiding the model for parameter optimization.
Commonly used loss functions are:

  • 0-1 loss
  • Multiclass Support Vector Machine Loss
  • cross entropy loss
  • L1 loss
  • L2 loss

The definitions of these loss functions are analyzed in more detail below.
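As a quick preview, most of the listed losses can be written directly in NumPy. This is a hedged sketch for single examples or small arrays; the function names are ours.

```python
import numpy as np

# Direct NumPy forms of the losses listed above (function names are ours).

def zero_one_loss(y, y_hat):
    # 0-1 loss: 1 if the prediction is wrong, 0 if it is right.
    return float(y != y_hat)

def l1_loss(y, y_hat):
    # L1 loss: mean absolute error between prediction and ground truth.
    return float(np.mean(np.abs(y - y_hat)))

def l2_loss(y, y_hat):
    # L2 loss: mean squared error between prediction and ground truth.
    return float(np.mean((y - y_hat) ** 2))

def multiclass_svm_loss(scores, correct, margin=1.0):
    # Multiclass SVM (hinge) loss for one example's class-score vector:
    # every wrong class is penalized if its score comes within `margin`
    # of the correct class's score.
    margins = np.maximum(0.0, scores - scores[correct] + margin)
    margins[correct] = 0.0
    return float(margins.sum())
```

Cross-entropy loss is covered separately at the end of this post.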

Optimization

By analyzing the loss value output by the loss function, the optimization algorithm updates the parameter values in the model through a series of calculations, thereby optimizing the performance of the whole model.

First-order methods

  • gradient descent
  • stochastic gradient descent
  • Mini-batch stochastic gradient descent
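The three first-order variants differ only in how much data each update step uses. Here is a sketch on a toy least-squares problem; the data, batch size, and learning rate are illustrative assumptions.

```python
import numpy as np

# The three first-order variants differ only in the batch each step uses.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w                            # noiseless toy regression targets

def grad(w, Xb, yb):
    # Gradient of the mean squared error on the (mini-)batch (Xb, yb).
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(4)
lr = 0.1
for step in range(100):
    # Gradient descent would use the full data:  w -= lr * grad(w, X, y)
    # Stochastic gradient descent would use one random example per step.
    # Mini-batch SGD (used here) takes a small random batch per step:
    idx = rng.choice(len(y), size=16, replace=False)
    w -= lr * grad(w, X[idx], y[idx])
```

Mini-batch SGD is the usual compromise: cheaper per step than full gradient descent, less noisy than single-example SGD.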

Second-order methods

  • Newton's method
  • BFGS
  • L-BFGS

Training process

  • Division of the data set (generally split into a training set and a test set at an 8:2 ratio)
  • Data preprocessing (in the vision field this usually means image processing, such as edge extraction, filtering, etc.)
  • Data augmentation (generally flipping the image, random cropping, adding random noise, etc.)
  • Handling underfitting and overfitting; for overfitting, common remedies are:
    • Reduce the complexity of the algorithm
    • Use a weight regularization term
    • Use dropout regularization
  • Hyperparameter tuning
  • Model ensembling
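The first two data-handling steps above can be sketched with NumPy alone. The tiny image shapes, the 8:2 split, and the flip/noise augmentations are illustrative assumptions.

```python
import numpy as np

# Sketch of the 8:2 train/test split and simple augmentations; the toy
# data shapes and noise scale are our illustrative assumptions.
rng = np.random.default_rng(0)
images = rng.random(size=(10, 8, 8))      # 10 toy 8x8 grayscale images
labels = rng.integers(0, 2, size=10)

# Shuffle, then split 8:2 into a training set and a test set.
order = rng.permutation(len(images))
split = int(0.8 * len(images))
train_idx, test_idx = order[:split], order[split:]
X_train, X_test = images[train_idx], images[test_idx]
y_train, y_test = labels[train_idx], labels[test_idx]

# Data augmentation: horizontal flips and additive random noise.
flipped = X_train[:, :, ::-1]
noisy = X_train + rng.normal(scale=0.01, size=X_train.shape)
X_aug = np.concatenate([X_train, flipped, noisy])
y_aug = np.concatenate([y_train, y_train, y_train])
```

Note that augmentation is applied only to the training set; the test set stays untouched so the evaluation remains honest.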

Overfitting and Underfitting

Overfitting refers to the situation where the model performs well on the training data but poorly on the test data. Overfitting arises when the model is too complex: it fits the noise and details in the training data, so the model generalizes poorly to new data.

Underfitting refers to the situation where the model performs poorly on both the training data and the test data. Underfitting arises when the model is too simple: it cannot fit the complex relationships and features in the data, so the model fails to learn and predict well.

There are different ways to deal with overfitting and underfitting. For overfitting, methods that reduce the complexity of the model can be used, such as regularization, early stopping, and data augmentation; for underfitting, methods that increase the complexity of the model can be used, such as adding features or increasing the number of layers or the width of the model.

In machine learning, both overfitting and underfitting will affect the generalization ability and accuracy of the model. It is necessary to select and optimize the model according to the specific situation to obtain better performance and effect.

Evaluation Metrics for Image Classification Tasks

  • Accuracy = number of correctly classified samples / total number of samples
  • Error rate = 1 - accuracy
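These two metrics are computed directly from predicted and true labels; the sample arrays below are made up for illustration.

```python
import numpy as np

# Computing the two metrics above from predicted and true labels.
y_true = np.array([0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 2, 2, 2])            # one of five samples is wrong

accuracy = float(np.mean(y_pred == y_true))   # correct / total
error_rate = 1.0 - accuracy
```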

Linear classifier

A linear classifier is a commonly used machine learning model for classifying data into different categories. Its basic idea is to use a linear function to divide the data into two or more categories. Specifically, a linear classifier computes a score from the feature values and weight parameters of the input data, then compares the score with a threshold to determine the class to which the data belongs.

A linear classifier is a linear map that maps the input image features to class scores.

Linear classifiers have several advantages: the model is simple, easy to understand and implement, and computationally efficient, which makes it suitable for classification problems with large-scale data and high-dimensional features. Their disadvantage is that they can only handle linearly separable data and perform poorly on nonlinear classification problems. Therefore, in practice the model must be selected and tuned for the specific situation to obtain good classification results.

Classification rules for linear classifiers

The classification rule of a linear classifier is based on a linear function. Specifically, the classifier takes the inner product of the input data's feature vector with a weight vector, then adds a bias term to obtain a score. If the score is greater than a preset threshold, the data is assigned to one class; otherwise it is assigned to the other.

Taking the binary classification as an example, suppose there is a linear classifier whose weight vector is w, the bias item is b, and the feature vector of the input data is x. The classification rules can be expressed as:

f(x) = sign(w·x + b)

Here sign denotes the sign function: data whose score is greater than 0 is assigned to one class, and data whose score is less than 0 to the other. In practice, the weight vector and bias term of the model are learned from training data to obtain an optimal classifier.
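The rule f(x) = sign(w·x + b) can be written out directly. The example weights, bias, and inputs below are made up for illustration.

```python
import numpy as np

# The binary classification rule f(x) = sign(w.x + b), written directly.
def linear_classify(x, w, b):
    # Inner product of the feature vector with the weight vector, plus
    # the bias term; the sign of the resulting score decides the class.
    return 1 if np.dot(w, x) + b > 0 else -1

# Made-up parameters: a weight vector and a bias term.
w = np.array([2.0, -1.0])
b = 0.5
```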

It should be noted that linear classifiers can only handle linearly separable data, that is, two types of data can be separated by a hyperplane. For nonlinearly separable data, it is necessary to use some nonlinear classifiers, such as support vector machines, neural networks, etc., or use some feature transformation methods to map the data to a high-dimensional space for classification.

The weight vector of a linear classifier

The w in the formula f(x) = sign(w·x + b) above is the weight vector.

  • The weights can be viewed as an image template.
  • The better the input image matches the template, the higher the score output by the classifier.
    In other words, the larger the inner product of x and w, the more confident the prediction.

Loss function

The loss function builds a bridge between model performance and model parameters, guiding the parameter optimization of the model.

  • The loss function is a function that measures the inconsistency between the predicted values of a given classifier and the true values; its output is usually a non-negative real value.
  • This non-negative output serves as a feedback signal for adjusting the classifier's parameters, reducing the loss on the current examples and thereby improving the classification performance.

Different machine learning tasks and models use different loss functions. For example, in classification tasks, commonly used loss functions include cross-entropy loss function, logarithmic loss function, etc.; in regression tasks, commonly used loss functions include mean square error loss function, mean absolute error loss function, etc. The selection and definition of these loss functions need to be selected and optimized according to specific tasks and models.

Take the cross-entropy loss function as an example: it is a commonly used classification loss that measures the difference between the model's predictions and the actual labels. For binary classification it is defined as follows:

L(y, f(x)) = -∑[ y·log f(x) + (1-y)·log(1-f(x)) ]

Here, y represents the actual label, f(x) the probability predicted by the model, and log the natural logarithm. The smaller the cross-entropy loss, the closer the model's predictions are to the actual labels and the better the model performs. During training, we usually minimize the cross-entropy loss with optimization algorithms such as gradient descent.
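The definition above translates almost line for line into code. Averaging over examples and clipping the probabilities for numerical safety are our additions; the sample labels and predictions are made up.

```python
import numpy as np

# Binary cross-entropy as defined above; the averaging over examples and
# the clipping for numerical safety are our additions.
def binary_cross_entropy(y, p, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)        # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

y = np.array([1.0, 0.0, 1.0])             # actual labels
p = np.array([0.9, 0.1, 0.8])             # model-predicted probabilities
loss = binary_cross_entropy(y, p)
```

The closer the predicted probabilities are to the labels, the smaller the loss, which is exactly the quantity gradient descent then minimizes.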


Origin blog.csdn.net/fuhao6363/article/details/130152606