Supervised learning algorithms - linear and logistic regression

Supervised learning algorithms

Linear Regression

Linear regression refers to fitting a straight line in space, so its decision boundary is linear. Given a set of data points, regression fits a curve to them; if that curve is a straight line, the method is called linear regression.

If y is a continuous numeric variable, the problem is regression; if y is a categorical variable, the problem is classification.

Simple linear regression, also known as univariate linear regression, refers to a model with only one independent variable and one dependent variable. The model can be expressed mathematically as y = ax + b + ξ, similar to a linear function, where ξ is the model's error term and a and b are collectively called the regression coefficients. The error term ξ exists mainly to balance the two sides of the equation; it represents the part of y the model cannot explain. The usual assumptions are: ξ has the same variance for all values of the independent variable x; the ξ values are independent of one another; and ξ follows a normal distribution.

Linear regression "tries to learn a linear model that predicts the real-valued output label as accurately as possible."

A linear model attempts to predict via a function that is a linear combination of the learned attributes, i.e., f(x) = w1*x1 + w2*x2 + ... + wd*xd + b (in vector form, f(x) = wᵀx + b).

In essence, the linear regression model assumes a linear correlation between the target variable and the feature variables, and predicts the target value as a function of a linear combination of the d feature variables.

Solving for the parameters of a linear regression model essentially means finding the line that best passes through all the training data points, that is, the line that minimizes the total distance (usually the Euclidean distance) from the data points to the line.

For the linear regression model, two parameter estimation methods are commonly used: 1. the least-squares method; 2. gradient descent.

The loss function of a linear regression model is a convex function, so its minimum is the global minimum.
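As a concrete sketch (the toy data and variable names here are my own, not from the original), the least-squares solution for simple linear regression y = ax + b can be computed directly with NumPy:

```python
import numpy as np

# Toy data roughly following y = 2x + 1 with a little noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.shape)

# Design matrix with a column of ones for the intercept b
X = np.column_stack([x, np.ones_like(x)])

# Least-squares solution minimizes ||X @ [a, b] - y||^2
(a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
print(a, b)  # should recover values near a = 2, b = 1
```

Because the squared-error loss is convex, this closed-form solution is the unique global minimum mentioned above.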


Gradient descent

As the name suggests, gradient descent seeks the minimum of a function by moving in the direction of steepest descent. For the linear regression problem, gradient descent gradually approaches the parameters that minimize the loss function over several iterations, eventually reaching the optimal parameter solution. At each step, it computes the derivative (gradient) at the current position, then moves forward in the direction opposite to the gradient (the direction of fastest change).

Purpose: minimize the loss function

Variants of gradient descent
  • Batch gradient descent (BGD)

Batch gradient descent uses all the sample data in the training set at each iteration.

Batch gradient descent guarantees that every iteration moves in the correct gradient direction, but each iteration is slow and consumes significant computing resources.

  • Stochastic gradient descent (SGD)

Stochastic gradient descent uses only a single randomly chosen sample from the training set at each iteration.

Stochastic gradient descent needs more iterations, but each iteration is faster. Because a single sample cannot guarantee the correct gradient direction, the loss function value oscillates.

A crucial parameter of gradient descent is the step size (learning rate), which, in the analogy of a blindfolded hiker walking downhill, controls the size of each step. If the step size is small enough, the cost function decreases at every iteration until the algorithm finds the optimal parameters; however, as the steps shrink, the computation time keeps increasing. If the step size is too large, the hiker may repeatedly step across the bottom of the valley, meaning gradient descent may oscillate around the optimal value without settling.

Compared with batch gradient descent, stochastic gradient descent is sometimes less likely to fall into a local optimum.
  • Mini-batch gradient descent (MBGD)

Mini-batch gradient descent combines the advantages of BGD and SGD: each iteration uses a small batch of sample data from the training set.
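The three variants differ only in how many samples feed each parameter update. A minimal mini-batch sketch for fitting a line (the data, learning rate, and batch size here are illustrative choices of mine):

```python
import numpy as np

# Toy data lying exactly on y = 3x + 2
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)
y = 3.0 * x + 2.0

a, b = 0.0, 0.0           # parameters to learn
lr, batch_size = 0.3, 16  # BGD would use all 200 samples; SGD would use 1
for _ in range(2000):
    # Each iteration draws a small random batch, not the full training set
    idx = rng.choice(len(x), size=batch_size, replace=False)
    xb, yb = x[idx], y[idx]
    pred = a * xb + b
    # Gradient of the mean-squared error on the batch
    a -= lr * 2 * np.mean((pred - yb) * xb)
    b -= lr * 2 * np.mean(pred - yb)
```

Changing `batch_size` to `len(x)` recovers BGD and to `1` recovers SGD, which is the whole difference between the three methods.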

  1. First assign the parameters initial values, which may be random or a vector of all zeros.

  2. Change the parameter values so that the loss function decreases along the direction of gradient descent; this is in fact the direction of fastest decrease.

  3. Repeat step 2 until the error is within a given tolerance.
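The three steps above can be sketched as a batch-gradient-descent loop for simple linear regression (the learning rate, tolerance, and toy data are my own illustrative choices):

```python
import numpy as np

def gradient_descent(x, y, lr=0.01, tol=1e-8, max_iter=10000):
    # Step 1: initialize the parameters (a vector of zeros here)
    a, b = 0.0, 0.0
    prev_loss = float("inf")
    for _ in range(max_iter):
        pred = a * x + b
        loss = np.mean((pred - y) ** 2)
        # Step 3: stop once the change in error is within the tolerance
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
        # Step 2: move against the gradient of the mean-squared error
        grad_a = 2 * np.mean((pred - y) * x)
        grad_b = 2 * np.mean(pred - y)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0            # points lying exactly on y = 2x + 1
a, b = gradient_descent(x, y)
```

With noiseless data on a line, the loop converges to the same answer the least-squares closed form would give.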

Note: when we estimate parameters by minimizing a loss function (e.g., least-squares estimation in linear regression), we want the minimum, so we descend toward it and use gradient descent. When we estimate parameters by maximum likelihood, we want the parameters that maximize the likelihood function, so we ascend toward the maximum and use gradient ascent. The two are really the same; only the sign in front differs.


Logistic regression

Logistic regression is currently the most widely used learning algorithm for solving classification problems. Like linear regression, it is a supervised learning algorithm. The reason linear regression is not used for classification: the y values are 0 or 1, and the output of a linear regression model may be far greater than 1 or far less than 0, which leads to a large cost function.

To solve this problem, we use the sigmoid function as the activation function: g(z) = 1 / (1 + e^(-z)).

When z = 0, the value is 0.5; as z increases, the value approaches 1; as z decreases, the value approaches 0.
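A quick numeric check of this behavior (`sigmoid` is my own helper name, not from the original):

```python
import math

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)), always strictly between 0 and 1
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))    # 0.5 exactly
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
```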

The logistic regression algorithm uses the sigmoid function, together with regression coefficients computed from the samples, to classify the data accurately across the decision boundary in sample space. The closer a sample's computed value is to the target value 0 or 1, the better, and the more accurate the prediction.

The idea of logistic regression: its output value always lies between 0 and 1.
The purpose of logistic regression: to classify binary data accurately.

The loss function of logistic regression is derived from its maximum likelihood function.

Since logistic regression is a regression (that is, the value of y is continuous), how can it be applied to classification? The output y is indeed a continuous variable. Logistic regression's approach is to specify a threshold: samples whose y is greater than the threshold belong to one class, and samples whose y is less than the threshold belong to the other. How to adjust the threshold depends on the actual situation; 0.5 is usually chosen as the dividing threshold.

The principle of logistic regression is similar to that of linear regression; it can be described simply as a three-step process:

(1) Find a suitable prediction function, generally denoted h. This is the classification function, used to predict the outcome for the input data. It requires some understanding or analysis of the data to know or guess the "probable" form of the prediction function, for example a linear or nonlinear function.

(2) Construct a Cost (loss) function that represents the deviation between the predicted output (h) and the category of the training data (y); it may be the difference between them (h - y) or some other form. Considering the "loss" over all the training data, sum or average the Cost and denote it J(θ); it represents the deviation between the predicted values and the actual categories over all the training data.

(3) Clearly, the smaller the value of J(θ), the more accurate the prediction function (i.e., the more accurate the function h), so this step's job is to find the minimum of J(θ). There are different methods for finding the minimum of a function; logistic regression implementations often use gradient descent (Gradient Descent).
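Putting the three steps together, a minimal logistic-regression fit by gradient descent might look like this (the one-feature toy data and hyperparameters are my own illustrative choices, not from the original):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D data: class 0 clusters near x = 1, class 1 near x = 4
x = np.array([0.5, 1.0, 1.5, 3.5, 4.0, 4.5])
y = np.array([0, 0, 0, 1, 1, 1])

# Step (1): prediction function h(x) = sigmoid(w*x + b)
w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    h = sigmoid(w * x + b)
    # Steps (2)-(3): descend the gradient of the negative log-likelihood J
    grad_w = np.mean((h - y) * x)
    grad_b = np.mean(h - y)
    w -= lr * grad_w
    b -= lr * grad_b

# Classify with the usual 0.5 threshold
preds = (sigmoid(w * x + b) >= 0.5).astype(int)
```

On this linearly separable toy set, the learned boundary lands between the two clusters and `preds` matches `y`.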



To sum up:

Linear regression is used for real-valued regression, while logistic regression is used for classification. The basic idea is to use gradient descent to optimize an error function in least-squares form. The basic univariate form is y = ax + b, used to fit relational data such as house price versus floor area.

  • Advantages: simple to implement, with simple computation;

  • Disadvantages: cannot fit non-linear data;

Logistic regression is used for classification. It is a non-linear binary model, mainly used to compute the probability of an event given the sample features, but it is essentially a linear regression model: aside from the sigmoid mapping function, the other steps of the algorithm are the same as linear regression. It can be said that logistic regression is theoretically grounded in linear regression. For example, using a user's browsing and purchase behavior as features, compute whether the user will buy or click on a product. In LR, the final value is computed by a linear function (the weighted sum of the feature values plus a bias) and then passed through a sigmoid function to make the decision; training LR therefore means training the weights w of the linear function.

  • Advantages: simple; very little computation at classification time, fast, and low storage cost;

  • Disadvantages: prone to overfitting, and generally not very accurate; handles only binary classification (softmax, derived from it, can be used for multi-class classification), and requires the data to be linearly separable;

The difference between classification and regression lies in the type of the output variable.

Quantitative output is called regression, that is, predicting a continuous variable; qualitative output is called classification, that is, predicting a discrete variable.

For example: predicting tomorrow's temperature is a regression task; predicting whether tomorrow will be overcast, sunny, or rainy is a classification task.

Essentially, the two problems are the same: fitting (matching) a model to the data.
