Basic concepts of artificial intelligence 1: model, fitting, maximum likelihood estimation, likelihood function, linear regression, sigmoid function, logistic regression

1. Model, fitting and overfitting

An artificial intelligence model is an algorithm or mathematical model that processes and analyzes data, and whose performance and prediction accuracy are improved through training. Common models include neural networks, Naive Bayes, decision trees, and more.

Fitting means adjusting the parameters of a model or function so that it matches the data or samples as closely as possible, allowing it to describe or predict the trends and patterns in the data.

Overfitting is when a model performs well on the training set but poorly on the test set. This may be because the model is too complex, or the training set is too small or not representative enough. Some common solutions include increasing the training set data, using regularization techniques, etc.
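The gap between training and test performance can be illustrated with a small experiment. The following is a minimal sketch, not a benchmark: the ground-truth curve `sin(pi * x)`, the noise level, the sample sizes, and the polynomial degrees are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

def make_data(n):
    """Noisy samples of a hypothetical ground-truth curve y = sin(pi * x)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(12)   # deliberately small training set
x_test, y_test = make_data(200)    # larger held-out test set

def mse(coeffs, x, y):
    """Mean squared error of the polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree={degree}: train MSE={results[degree][0]:.4f}, "
          f"test MSE={results[degree][1]:.4f}")
```

The training error can only go down as the degree grows, because each higher-degree polynomial contains the lower-degree ones as special cases. The test error usually behaves differently: a model that is too flexible for 12 noisy points tends to do worse on the held-out data, which is the overfitting pattern described above.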

2. Maximum Likelihood Estimation (MLE) and Likelihood function

Maximum Likelihood Estimation (MLE) is a commonly used parameter estimation method in statistics and machine learning. The basic idea is that, given a set of observed data, we assume the data follow a certain probability distribution and estimate the unknown distribution parameters by maximizing the likelihood function.

The likelihood function is the probability (or probability density) of the observed data, viewed as a function of the unknown parameters while the data are held fixed. The idea of maximum likelihood estimation is that, among all possible parameter values, we select the one under which the observed data are most probable. The parameter values obtained in this way are the maximum likelihood estimates.

The steps of maximum likelihood estimation are usually:
1. Determine the probability distribution of the model and write down the likelihood function (in practice usually its logarithm, which has the same maximizer and is easier to differentiate).
2. Take the derivative of the (log-)likelihood with respect to the parameters and set it equal to 0 to obtain an analytical solution for the parameters.
3. If an analytical solution cannot be found, use a numerical optimization algorithm (such as gradient descent on the negative log-likelihood) to solve it.
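The steps above can be sketched on a coin-flip example. This is a minimal illustration under stated assumptions: the observations, the Bernoulli model, the learning rate, and the iteration count are all hypothetical choices, and plain gradient ascent on the log-likelihood stands in for "a numerical optimization algorithm".

```python
import numpy as np

# Hypothetical observations: 1 = heads, 0 = tails.
data = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])
n, k = len(data), int(data.sum())

def log_likelihood(p):
    """Step 1: Bernoulli log-likelihood, log L(p) = k*log(p) + (n-k)*log(1-p)."""
    return k * np.log(p) + (n - k) * np.log(1 - p)

# Step 2: setting d(log L)/dp = k/p - (n-k)/(1-p) to 0 gives p = k/n.
p_analytic = k / n

# Step 3: the numerical route, here gradient ascent on the mean log-likelihood.
p = 0.5                 # initial guess
learning_rate = 0.05    # assumed step size
for _ in range(1000):
    grad = (k / p - (n - k) / (1 - p)) / n
    # Keep p strictly inside (0, 1) so the logarithms stay defined.
    p = float(np.clip(p + learning_rate * grad, 1e-6, 1 - 1e-6))
p_numeric = p

print("analytic estimate:", p_analytic)
print("numeric estimate: ", round(p_numeric, 6))
```

Both routes should agree here because the Bernoulli log-likelihood has a single maximum. The numerical route matters for models where step 2 has no closed form.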

The advantage of maximum likelihood estimation is that, when the sample is large enough, the estimator is statistically efficient and asymptotically normal. The disadvantage is that when the sample size is small, overfitting may occur.

Why can the analytical solution of the parameters be obtained by solving the derivative of the likelihood function?

In maximum likelihood estimation, we need to find the maximum value of the likelihood function. For some probability distributions, this maximum can be found by setting the derivative to zero. This is because a differentiable function can attain an interior extremum only at a point where its derivative is 0. Such a point may be a maximum, a minimum, or neither, so the candidate still has to be checked (for example with the second derivative) to confirm that it is a maximum.

Suppose we want to estimate the parameters of a distribution so that the probability of the given observations is maximized. We can write the likelihood function and then differentiate it. If the likelihood function is differentiable, we can set the derivative to 0 and solve for the parameters analytically. Once confirmed to be a maximum, this analytical solution is the parameter value that maximizes the likelihood function.
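As a worked example of this argument, consider estimating the mean of a normal distribution. The setup is an assumption made for illustration: observations x_1, ..., x_n are independent draws from a normal distribution with unknown mean \mu and known variance \sigma^2.

```latex
\begin{aligned}
\ell(\mu) &= \log L(\mu)
  = -\frac{n}{2}\log\!\left(2\pi\sigma^{2}\right)
    - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i-\mu)^{2} \\
\frac{d\ell}{d\mu} &= \frac{1}{\sigma^{2}}\sum_{i=1}^{n}(x_i-\mu) = 0
  \quad\Longrightarrow\quad
  \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}x_i \\
\frac{d^{2}\ell}{d\mu^{2}} &= -\frac{n}{\sigma^{2}} < 0
\end{aligned}
```

The second derivative is negative everywhere, so the stationary point is a maximum, and the maximum likelihood estimate of the mean is the sample average.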

It should be noted that not all likelihood functions of probability distributions can be analytically solved by taking the derivative to be 0. For some complex probability distributions, we may need to use other methods such as numerical optimization to solve them.

3. Linear regression

Linear Regression is an algorithm widely used in machine learning and statistics. It is a model that establishes a linear relationship between an input variable (independent variable) and an output variable (dependent variable).

The linear regression model can be used to predict continuous data. In practical applications, it is often used to predict future trends, analyze the relationship between data, and so on.

The basic idea of linear regression is to make predictions by establishing a linear relationship between the independent variable and the dependent variable. A linear regression model can be expressed as y = wx + b, where y is the dependent variable, x is the independent variable, w is the weight (also called the slope), and b is the intercept. When training the model, we need to find the weight and intercept that minimize the error between the predicted value and the true value.

Linear regression is usually solved with the least squares method, that is, by minimizing the sum of squared errors between the predicted value and the true value. This problem has a closed-form solution; in practical applications, optimization algorithms such as gradient descent can also be used to find the best weight and intercept.
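Both ways of fitting y = wx + b can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a hypothetical noise-free dataset generated from y = 2x + 1; the learning rate and iteration count are also assumptions.

```python
import numpy as np

# Hypothetical data generated exactly from y = 2x + 1 (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Least squares in closed form: minimize ||A @ [w, b] - y||^2.
A = np.column_stack([x, np.ones_like(x)])
(w_ls, b_ls), *_ = np.linalg.lstsq(A, y, rcond=None)

# Gradient descent on the mean squared error.
w, b = 0.0, 0.0
learning_rate = 0.05
for _ in range(2000):
    error = w * x + b - y               # prediction minus true value
    grad_w = 2.0 * np.mean(error * x)   # d(MSE)/dw
    grad_b = 2.0 * np.mean(error)       # d(MSE)/db
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print("least squares:    w =", round(float(w_ls), 4), " b =", round(float(b_ls), 4))
print("gradient descent: w =", round(w, 4), " b =", round(b, 4))
```

On this dataset both methods should recover the slope and intercept used to generate the data. With noisy data they would instead agree on the least-squares line.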

The advantage of linear regression is that it is simple, easy to explain and implement, and is applicable to many practical problems. However, it suffers from poor performance for nonlinear problems, requires feature engineering of the data, and is susceptible to outliers and noise.

4. Sigmoid function

The sigmoid function is a common activation function that maps any real input to a value between 0 and 1, which makes it useful for binary classification of the output (for multi-class outputs, the softmax function is more commonly used).

In addition, the sigmoid function is continuously differentiable, monotonically increasing, and easy to compute. These characteristics make the sigmoid function widely used in neural networks.

The sigmoid function is usually used in the output layer of the neural network, which can convert the output into a probability value, and is suitable for binary classification problems.

The formula of the sigmoid function is: f(x) = 1 / (1 + e^{-x})

Python libraries such as NumPy, TensorFlow, and PyTorch can all be used to calculate the sigmoid function. TensorFlow and PyTorch provide it directly (tf.math.sigmoid and torch.sigmoid), while in NumPy it is a one-line expression built on np.exp.
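A minimal NumPy sketch of the formula and its derivative follows. The function names `sigmoid` and `sigmoid_derivative` are chosen here for illustration and are not part of NumPy.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid f(x) = 1 / (1 + e^{-x}), applied element-wise."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """Derivative of the sigmoid: f'(x) = f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

xs = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(sigmoid(xs))             # values strictly between 0 and 1, increasing
print(sigmoid_derivative(xs))  # largest at x = 0
```

This direct form is fine for moderate inputs. For inputs with a very large magnitude, `np.exp` can overflow and emit a warning, so a numerically stable implementation such as `scipy.special.expit` may be preferable.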

5. Logistic regression

Logistic regression is a machine learning algorithm widely used in classification problems. It is mainly used to divide samples in a data set into two or more categories. It predicts the likelihood of an event by modeling the features of the data.

The output of the logistic regression model is a probability value, which represents the probability that the sample belongs to a certain category. Usually, if the probability value is greater than a set threshold, the sample is classified into that class, otherwise it is classified into another class.

The core idea of logistic regression is to pass the output of a linear model (a weighted sum of the features plus an intercept) through the logistic function (also known as the sigmoid function) and so convert it into a probability value.
The training process of the logistic regression model usually adopts the maximum likelihood estimation method: the model parameters are determined by maximizing the likelihood function.
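Training by maximum likelihood can be sketched as gradient descent on the negative log-likelihood. This is a minimal illustration under stated assumptions: the one-dimensional dataset, learning rate, iteration count, and the 0.5 threshold are all hypothetical choices, not a recommended configuration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 1-D dataset: negative x belongs to class 0, positive x to class 1.
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(500):
    p = sigmoid(w * x + b)           # predicted P(y = 1 | x)
    # Gradient of the mean negative log-likelihood (cross-entropy loss).
    grad_w = np.mean((p - y) * x)
    grad_b = np.mean(p - y)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

probabilities = sigmoid(w * x + b)
predictions = (probabilities >= 0.5).astype(int)  # apply the threshold
print("w =", round(w, 3), " b =", round(b, 3))
print("predictions:", predictions)
```

Minimizing the negative log-likelihood is the same as maximizing the likelihood. Because this toy dataset is perfectly separable, the weight keeps growing the longer training runs; in practice, regularization or a fixed number of iterations keeps it bounded.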

The advantage of the logistic regression model is that it is simple, easy to implement and interpret, and is suitable for binary and multi-classification problems. However, it suffers from poor performance for nonlinear problems, requires feature engineering of the data, and is susceptible to outliers and noise.

6. Summary

This article introduces several basic concepts related to artificial intelligence: model, fitting, maximum likelihood estimation, likelihood function, linear regression, sigmoid function, and logistic regression.

For more basic knowledge of artificial intelligence, please refer to the column "Basic Knowledge of Artificial Intelligence".

Origin blog.csdn.net/LaoYuanPython/article/details/129622968