Manual implementation of logistic regression based on Weka

1. Logistic regression model

The logistic regression model is in fact a classification model. The implementation here follows two textbooks: Statistical Learning Methods by Li Hang and Machine Learning (the "watermelon book") by Zhou Zhihua.

Suppose our input is $x$, which can be multi-dimensional, and we want to predict $y \in \{0,1\}$ from $x$. The logistic model is as follows:

$$p(Y=1\mid x)=\frac{\exp(w\cdot x)}{1+\exp(w\cdot x)}\tag{1}$$

where the parameter $w$ is what we want to learn. Note that it contains both the weight coefficients and the bias $b$; this representation is more concise when writing programs.
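Concretely, following the convention in Li Hang's book, the bias is absorbed by augmenting both vectors:

$$w=(w^{(1)},w^{(2)},\dots,w^{(n)},b)^T,\qquad x=(x^{(1)},x^{(2)},\dots,x^{(n)},1)^T,$$

so that the single dot product $w\cdot x$ already includes the intercept $b$.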

2. Maximum likelihood parameter estimation

The parameter $w$ is what we need to learn, and we use the maximum likelihood method to estimate it.

Let:

$$P(Y=1\mid x)=\pi(x),\qquad P(Y=0\mid x)=1-\pi(x)\tag{2}$$

The likelihood function is:

$$\prod_{i=1}^N[\pi(x_i)]^{y_i}[1-\pi(x_i)]^{1-y_i}\tag{3}$$

Because this product form is inconvenient to differentiate, we take the logarithm, which gives:

$$\begin{aligned} L(w)&=\sum_{i=1}^N\left[y_i\log\pi(x_i)+(1-y_i)\log(1-\pi(x_i))\right]\\ &=\sum_{i=1}^N\left[y_i\log\frac{\pi(x_i)}{1-\pi(x_i)}+\log(1-\pi(x_i))\right]\\ &=\sum_{i=1}^N\left[y_i(w\cdot x_i)-\log(1+\exp(w\cdot x_i))\right] \end{aligned}\tag{4}$$

Maximizing $L(w)$ with respect to $w$ yields the estimate of $w$.
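To make equation (4) concrete, here is a minimal sketch of the log-likelihood computation (a hypothetical standalone helper, not part of the Weka classifier in Section 4; it assumes each x[i] has already been augmented with a trailing 1 so that w absorbs the bias):

// Minimal sketch of equation (4): log-likelihood L(w), useful for monitoring convergence.
// Assumes x[i] is already augmented with a trailing 1 so that w absorbs the bias b.
static double logLikelihood(double[] w, double[][] x, double[] y) {
    double l = 0;
    for (int i = 0; i < x.length; i++) {
        double wxi = 0;
        for (int j = 0; j < w.length; j++) {
            wxi += w[j] * x[i][j];
        }
        // y_i * (w . x_i) - log(1 + exp(w . x_i))
        l += y[i] * wxi - Math.log(1 + Math.exp(wxi));
    }
    return l;
}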

3. Solving the likelihood function by gradient descent

The gradient descent method finds a minimum, but what we want is the maximum of $L(w)$; therefore, we minimize the negative of $L(w)$, namely:

$$\argmin_{w} -L(w)\tag{5}$$

The derivative of $-L(w)$ with respect to $w$ is as follows:

$$\begin{aligned} (-L(w))'&=-\sum_{i=1}^N\left[(y_i\cdot x_i)-\frac{\exp(w\cdot x_i)}{1+\exp(w\cdot x_i)}\cdot x_i\right]\\ &=-\sum_{i=1}^N\left[\left(y_i-\frac{\exp(w\cdot x_i)}{1+\exp(w\cdot x_i)}\right)\cdot x_i\right]\\ &=\sum_{i=1}^N\left[\left(\frac{\exp(w\cdot x_i)}{1+\exp(w\cdot x_i)}-y_i\right)\cdot x_i\right] \end{aligned}\tag{6}$$

Then we get the update formula for the parameter $w$, where $lr$ is the learning rate:

$$\begin{aligned} w'&=w-lr\cdot(-L(w))'\\ &=w-lr\cdot\sum_{i=1}^N\left[\left(\frac{\exp(w\cdot x_i)}{1+\exp(w\cdot x_i)}-y_i\right)\cdot x_i\right] \end{aligned}\tag{7}$$

Regarding the choice of optimization method: I first implemented Newton's method as given in the watermelon book. Its advantage is fast convergence, but when the Hessian matrix is singular the update cannot be computed.
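For reference, the Newton update for minimizing $-L(w)$ takes the form:

$$w^{t+1}=w^{t}-\left(\frac{\partial^2(-L(w))}{\partial w\,\partial w^T}\right)^{-1}\frac{\partial(-L(w))}{\partial w}$$

Each step must invert the Hessian, which is exactly what fails when that matrix is singular.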

Therefore, a quasi-Newton method can be used instead: it avoids this problem while still converging quickly.

However, I am not familiar with quasi-Newton methods, and although gradient descent may converge slowly, it is simple to implement, so gradient descent is used here to optimize the likelihood function.
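One practical caveat: computing exp(w·x) directly, as the code below does, can overflow to infinity for large positive w·x, making the quotient in equation (1) evaluate to NaN. A numerically stable sigmoid (an optional improvement, not used in the implementation below) would look like this:

// Optional numerically stable sigmoid; not used in the implementation below.
// Avoids Infinity/Infinity = NaN when w . x is a large positive number.
static double sigmoid(double z) {
    if (z >= 0) {
        return 1.0 / (1.0 + Math.exp(-z));
    }
    double e = Math.exp(z);
    return e / (1.0 + e);
}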

4. Weka-based code implementation

package weka.classifiers.myf;

import weka.classifiers.Classifier;
import weka.core.Instance;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.NominalToBinary;
import weka.filters.unsupervised.attribute.Standardize;

import java.util.Arrays;

/**
 * @author YFMan
 * @Description Custom logistic regression classifier
 * @Date 2023/6/13 11:02
 */
public class myLogistic extends Classifier {

    // array storing the logistic regression coefficients (weights plus intercept)
    private double[] m_Coefficients;

    // class attribute index
    private int m_ClassIndex;

    // maximum number of gradient descent iterations
    private int m_MaxIterations = 1000;

    // number of attributes
    private int m_numAttributes;

    // number of coefficients
    private int m_numCoefficients;

    // gradient descent learning rate
    private double m_lr = 1e-4;

    // constant identifying the standardize filter
    public static final int FILTER_STANDARDIZE = 1;

    // filter used to standardize the data
    protected Filter m_StandardizeFilter = null;

    // filter used to convert nominal attributes to binary ones
    protected Filter m_NormalToBinaryFilter = null;


    /*
     * @Author YFMan
     * @Description Train the logistic regression model with gradient descent
     * @Date 2023/5/9 22:08
     * @Param [data] training data
     * @return void
     **/
    public void buildClassifier(Instances data) throws Exception {

        // set the class attribute index
        m_ClassIndex = data.classIndex();

        // set the number of attributes
        m_numAttributes = data.numAttributes();

        // number of coefficients = number of input attributes + 1 (intercept b)
        m_numCoefficients = m_numAttributes;

        // initialize the coefficient array to zeros
        m_Coefficients = new double[m_numCoefficients];
        Arrays.fill(m_Coefficients, 0);

        // standardize the input data
        m_StandardizeFilter = new Standardize();
        m_StandardizeFilter.setInputFormat(data);
        data = Filter.useFilter(data, m_StandardizeFilter);

        // convert the nominal class attribute to a binary one
        m_NormalToBinaryFilter = new NominalToBinary();
        m_NormalToBinaryFilter.setInputFormat(data);
        data = Filter.useFilter(data, m_NormalToBinaryFilter);

        // gradient descent
        for (int curPerformIteration = 0; curPerformIteration < m_MaxIterations; curPerformIteration++) {

            double[] deltaM_Coefficients = new double[m_numCoefficients];
            // accumulate the first derivative of -L(w), equation (6), over all instances
            for (int i = 0; i < data.numInstances(); i++) {

                double yi = data.instance(i).value(m_ClassIndex);
                // compute w . x_i (weights only, skipping the class attribute)
                double wxi = 0;
                int column = 0;
                for (int j = 0; j < m_numAttributes; j++) {
                    if (j != m_ClassIndex) {
                        wxi += m_Coefficients[column] * data.instance(i).value(j);
                        column++;
                    }
                }
                // add the intercept b
                wxi += m_Coefficients[column];
                // pi1 = P(Y=1|x_i) from equation (1)
                double pi1 = Math.exp(wxi) / (1 + Math.exp(wxi));
                // update amounts for the weights, skipping the class attribute
                // so the coefficient order matches the loop above
                column = 0;
                for (int j = 0; j < m_numAttributes; j++) {
                    if (j != m_ClassIndex) {
                        deltaM_Coefficients[column] += m_lr * (pi1 - yi) * data.instance(i).value(j);
                        column++;
                    }
                }
                // update amount for the intercept b
                deltaM_Coefficients[m_numCoefficients - 1] += m_lr * (pi1 - yi);
            }

            // apply the parameter update
            for (int k = 0; k < m_numCoefficients; k++) {
                m_Coefficients[k] -= deltaM_Coefficients[k];
            }

            // stop iterating once the squared norm of the update falls below a threshold
            double delta = 0;
            for (int k = 0; k < m_numCoefficients; k++) {
                delta += deltaM_Coefficients[k] * deltaM_Coefficients[k];
            }
            if (delta < 1e-6) {
                break;
            }

        }
    }


    /*
     * @Author YFMan
     * @Description Compute the class distribution for an instance
     * @Date 2023/6/16 11:17
     * @Param [instance]
     * @return double[]
     **/
    public double[] distributionForInstance(Instance instance) throws Exception {

        // standardize the input instance
        m_StandardizeFilter.input(instance);
        instance = m_StandardizeFilter.output();

        // binarize the nominal input attributes
        m_NormalToBinaryFilter.input(instance);
        instance = m_NormalToBinaryFilter.output();

        double[] result = new double[2];
        result[0] = 0;
        result[1] = 0;
        // accumulate w . x, skipping the class attribute
        int column = 0;
        for (int i = 0; i < m_numAttributes; i++) {
            if (m_ClassIndex != i) {
                result[0] += instance.value(i) * m_Coefficients[column];
                column++;
            }
        }
        // add the intercept b
        result[0] += m_Coefficients[column];
        // P(Y=0|x) = 1 / (1 + exp(w . x)), the complement of equation (1)
        result[0] = 1 / (1 + Math.exp(result[0]));

        result[1] = 1 - result[0];

        return result;
    }

    /*
     * @Author YFMan
     * @Description Main entry point: build and evaluate the logistic regression classifier
     * @Date 2023/5/9 22:35
     * @Param [argv]
     * @return void
     **/
    public static void main(String[] argv) {
        runClassifier(new myLogistic(), argv);
    }
}
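A minimal usage sketch follows (the ARFF file name is illustrative; it assumes a dataset whose last attribute is a binary nominal class):

import weka.core.Instances;

import java.io.BufferedReader;
import java.io.FileReader;

public class myLogisticDemo {
    public static void main(String[] args) throws Exception {
        // "diabetes.arff" is a placeholder; any binary-class ARFF file works
        Instances data = new Instances(new BufferedReader(new FileReader("diabetes.arff")));
        data.setClassIndex(data.numAttributes() - 1); // class is the last attribute
        myLogistic classifier = new myLogistic();
        classifier.buildClassifier(data);
        double[] dist = classifier.distributionForInstance(data.instance(0));
        System.out.println("P(Y=0) = " + dist[0] + ", P(Y=1) = " + dist[1]);
    }
}

Alternatively, since main delegates to Weka's runClassifier, the class can be invoked from the command line with Weka's standard evaluation options, e.g. java weka.classifiers.myf.myLogistic -t diabetes.arff.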
