Machine Learning | Course Notes on Andrew Ng's Stanford Machine Learning (2): Logistic Regression

This series collects my course notes on Andrew Ng's Stanford machine learning course (CS229). The series covers the following topics:

  (1) linear regression

  (2) logistic regression

  (3) neural networks

  (4) analysis and optimization algorithms

  (5) support vector machines

  (6) K-Means

  (7) dimensionality reduction

  (8) anomaly detection

  (9) recommender systems

  (10) large-scale machine learning

Chapter 2: Logistic Regression

Linear regression handles 0/1 classification poorly, so logistic regression is introduced to carry out 0/1 classification, where 1 stands for the logical "yes" and 0 for the logical "no".

1. Prediction with the Sigmoid function

In logistic regression, the prediction function is defined as:

    hθ(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx))

where g(z) = 1 / (1 + e^(−z)) is called the Sigmoid function, also known as the logistic function. It maps any real z into the interval (0, 1), so hθ(x) can be read as the probability that y = 1.
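As a quick illustration, here is a minimal NumPy sketch of the sigmoid hypothesis; the array shapes and variable names are my own assumptions, not from the notes:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid (logistic) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, X):
    """Logistic regression hypothesis h_theta(x) = g(theta^T x).

    X is an (m, n) design matrix, theta an (n,) parameter vector;
    returns an (m,) vector of probabilities that y = 1.
    """
    return sigmoid(X @ theta)
```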

2. The decision boundary

The decision boundary is a property of the prediction function hθ(x), not of the training set: it is hθ(x) that "draws" the boundary between the classes, while the training set is used only to fit the parameters θ.
  • Linear decision boundary: with hθ(x) = g(θ0 + θ1·x1 + θ2·x2), we predict y = 1 when θᵀx ≥ 0, so the boundary is the straight line θ0 + θ1·x1 + θ2·x2 = 0.
  • Nonlinear decision boundary: adding polynomial features, e.g. hθ(x) = g(θ0 + θ1·x1 + θ2·x2 + θ3·x1² + θ4·x2²), makes θᵀx = 0 a curve (for instance a circle), as in the sketch below.
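Continuing the NumPy sketch above, here is a minimal illustration of why the boundary is θᵀx = 0: predicting y = 1 whenever hθ(x) ≥ 0.5 is the same as checking θᵀx ≥ 0 (the feature layout in the comments is an assumption for illustration):

```python
def predict(theta, X):
    """Predict labels: y = 1 iff h_theta(x) >= 0.5, i.e. iff theta^T x >= 0."""
    return (X @ theta >= 0).astype(int)

# With features [1, x1, x2], theta^T x = 0 is the straight line
# theta0 + theta1*x1 + theta2*x2 = 0. With added polynomial features
# such as x1**2 and x2**2, the same test theta^T x >= 0 carves out
# a curved region instead.
```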

3. The cost function

For a classification task, we repeatedly adjust the parameters θ, that is, we repeatedly move the decision boundary, to make the predictions more accurate. Suppose we have a cost function J(θ) that measures how well a given θ predicts; once we find the θ that minimizes the cost function, we get the most accurate predictions.
Typically, the fewer local minima a cost function has, the easier it is to find its minimum and hence the most accurate prediction; ideally it has a single global minimum rather than many local minima. Plugging the sigmoid hypothesis into the squared-error cost of linear regression yields a non-convex J(θ) with many local minima, which is why logistic regression uses a different cost.
The logistic regression cost function is defined as:

    Cost(hθ(x), y) = −log(hθ(x))        if y = 1
    Cost(hθ(x), y) = −log(1 − hθ(x))    if y = 0

which combines over the m training examples into:

    J(θ) = −(1/m) Σ [ y(i)·log(hθ(x(i))) + (1 − y(i))·log(1 − hθ(x(i))) ]

This cost is convex, so it has a single global minimum.
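A hedged NumPy sketch of this cost, reusing sigmoid and h from the snippet above; the clipping to avoid log(0) is my own defensive addition, not part of the notes:

```python
def cost(theta, X, y):
    """J(theta) = -(1/m) * sum of y*log(h) + (1-y)*log(1-h) over examples."""
    m = len(y)
    p = h(theta, X)                  # predicted probabilities, shape (m,)
    eps = 1e-12                      # guard against log(0)
    p = np.clip(p, eps, 1 - eps)
    return -(1.0 / m) * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```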

4. Minimizing the cost function

As in linear regression, two variants of gradient descent can be used: batch gradient descent (BGD), which uses all m examples per update, and stochastic gradient descent (SGD), which updates from one example at a time. The update rule is:

    θj := θj − α·(1/m) Σ (hθ(x(i)) − y(i))·xj(i)

which has the same form as in linear regression, except that hθ(x) is now the sigmoid hypothesis.
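A minimal batch-gradient-descent sketch under the update rule above; the learning rate and iteration count are arbitrary illustrative choices:

```python
def batch_gradient_descent(X, y, alpha=0.1, iters=1000):
    """BGD: each step uses all m examples to compute the gradient.
    (SGD would instead update theta from one shuffled example at a time.)"""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = (1.0 / m) * X.T @ (h(theta, X) - y)   # gradient, shape (n,)
        theta -= alpha * grad
    return theta
```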

5. Regularization

Two ways to solve the overfitting problem:
1) Reduce the number of features.
2) Smooth the curve: weaken the higher-order coefficients (reducing how sharply the curve bends) by penalizing the parameters θ. This is regularization.
 
• Regularization in linear regression:

    J(θ) = (1/2m) [ Σ (hθ(x(i)) − y(i))² + λ Σ θj² ]
Here the parameter λ balances two goals:
  - ensuring a good fit to the data;
  - keeping θ small enough to avoid overfitting. (The larger λ is, the more heavily θ must be penalized, i.e. shrunk, for J(θ) to decrease; so to guard against overfitting we should increase λ.)
• Regularization in logistic regression: the penalty term is added to the logistic cost in the same way:

    J(θ) = −(1/m) Σ [ y(i)·log(hθ(x(i))) + (1 − y(i))·log(1 − hθ(x(i))) ] + (λ/2m) Σ θj²
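A sketch of the regularized cost and its gradient, building on the snippets above; note that θ0 is conventionally not penalized, and the helper names are my own assumptions:

```python
def cost_reg(theta, X, y, lam):
    """Regularized logistic cost: cost(theta) + (lam / 2m) * sum_j theta_j^2,
    skipping theta_0 by convention."""
    m = len(y)
    return cost(theta, X, y) + (lam / (2.0 * m)) * np.sum(theta[1:] ** 2)

def gradient_reg(theta, X, y, lam):
    """Gradient of the regularized cost; theta_0 is not penalized."""
    m = len(y)
    grad = (1.0 / m) * X.T @ (h(theta, X) - y)
    grad[1:] += (lam / m) * theta[1:]
    return grad
```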

6. Multi-class classification

Multi-class classification is usually implemented with One-vs-All (also called One-vs-the-Rest), which turns one multi-class problem into several binary classification problems.
Assuming there are K classes, One-vs-All proceeds as follows:
  - Take each class i in turn as the positive samples (class "1"), and treat all remaining samples as negative (class "0").
  - Train a logistic regression model for each case, obtaining parameters θ(1), θ(2), ..., θ(K), i.e. K decision boundaries in total.
Given an input x, to determine its class we compute hθ(k)(x) for k = 1, ..., K; the closer hθ(k)(x) is to 1, the more likely x belongs to class k. The predicted class is therefore:

    class(x) = argmax over k of hθ(k)(x)
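A minimal One-vs-All sketch built on the pieces above; the training loop, the function names, and the assumption that labels are integers 0..K−1 are all illustrative choices of mine:

```python
def one_vs_all(X, y, K, alpha=0.1, iters=1000):
    """Train K binary classifiers; classifier k treats class k as positive
    ('1') and all other classes as negative ('0').
    Assumes labels y are integers 0..K-1."""
    m, n = X.shape
    thetas = np.zeros((K, n))
    for k in range(K):
        y_k = (y == k).astype(float)   # relabel: class k -> 1, the rest -> 0
        thetas[k] = batch_gradient_descent(X, y_k, alpha, iters)
    return thetas

def predict_class(thetas, X):
    """Pick, for each row of X, the class whose classifier is most confident."""
    probs = sigmoid(X @ thetas.T)      # shape (m, K): probs[i, k] = h^(k)(x_i)
    return np.argmax(probs, axis=1)
```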
 
 
