Logistic regression based on maximum likelihood estimation (MATLAB experiment)

Experimental content

1. Understand the technical principles of classification and logistic regression, give a hypothesis-function form suited to the classification problem, and derive a new cost function through maximum likelihood estimation.
2. For the binary classification problem, use MATLAB programming to obtain the classification result, and through the experimental program analyze and deepen understanding of the logistic regression classification problem.

Experimental principle

1. In a classification problem, y takes discrete values, y ∈ {0, 1}, so the hypothesis function must satisfy 0 ≤ h_θ(x) ≤ 1. Choose the sigmoid (logistic) function, whose output always lies in (0, 1):

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

2. To interpret the output: $h_\theta(x)$ is the probability that $y = 1$ (and $1 - h_\theta(x)$ the probability that $y = 0$) given the input $x$. So assume:

$$P(y=1 \mid x;\theta) = h_\theta(x), \qquad P(y=0 \mid x;\theta) = 1 - h_\theta(x),$$

which can be written compactly as

$$p(y \mid x;\theta) = \left(h_\theta(x)\right)^{y} \left(1 - h_\theta(x)\right)^{1-y}$$

3. Assuming the m training samples are mutually independent, the likelihood function is:

$$L(\theta) = \prod_{i=1}^{m} p\left(y^{(i)} \mid x^{(i)};\theta\right) = \prod_{i=1}^{m} \left(h_\theta(x^{(i)})\right)^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}}$$

4. Taking the logarithm gives the log-likelihood, which maximum likelihood estimation maximizes:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

5. Negating and averaging turns this maximization into a minimization, giving the cost function:

$$J(\theta) = -\frac{1}{m}\,\ell(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

6. Batch gradient descent updates all $\theta_j$ simultaneously, where $\alpha$ is the learning rate:

$$\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j} = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$
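The formulas above translate directly into vectorized MATLAB. The following is a minimal sketch of a cost-and-gradient routine implementing formulas 5 and 6; the function name costFunction and its calling convention are my own choices rather than part of the original experiment, and X is assumed to already contain an intercept column of ones:

function [J, grad] = costFunction(theta, X, y)
% Logistic regression cost (formula 5) and its gradient (the sum in formula 6).
% X: m-by-(n+1) feature matrix (first column all ones); y: m-by-1 labels in {0,1}.
m = length(y);
h = 1 ./ (1 + exp(-X*theta));                   % sigmoid hypothesis h_theta(x)
J = -(1/m) * sum(y.*log(h) + (1-y).*log(1-h));  % cross-entropy cost J(theta)
grad = (1/m) * X' * (h - y);                    % partial derivatives dJ/dtheta_j
end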

Data set

Link: https://pan.quark.cn/s/54f8b6d8f1df
Extraction code: 4UwS
The first and second columns are the scores from two pre-admission exams, and the third column is whether the student is admitted: 1 (yes), 0 (no).

Experimental procedure

1. Load the data set and display the training data

>> data = load('E:/桌面/成绩单.txt');
>> X = data(:,1:2);   % exam scores
>> y = data(:,3);     % admission label (1 = admitted, 0 = not admitted)
>> [h,w] = size(data);
>>  for i = 1:h
		if data(i,3) == 1
			scatter(data(i,1),data(i,2),'g*'); grid on; hold on;
		else
			scatter(data(i,1),data(i,2),'r.'); grid on; hold on;
		end
	end

[Figure: scatter plot of the training data (green * = admitted, red . = not admitted)]
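The plotting loop can also be avoided: logical indexing selects each class in one call. A sketch under the same variables (X and y as loaded above), using plot with marker specs:

pos = (y == 1); neg = (y == 0);                    % logical masks for the two classes
plot(X(pos,1), X(pos,2), 'g*'); hold on; grid on;  % admitted students
plot(X(neg,1), X(neg,2), 'r.');                    % rejected students
legend('Admitted (y = 1)', 'Not admitted (y = 0)');
xlabel('Exam 1 score'); ylabel('Exam 2 score');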

2. Use gradient descent to find the minimum J and the corresponding theta

% Use gradient descent to solve for the minimum J and the corresponding theta
>> alpha = 0.001;                      % learning rate
>> [m,n] = size(X);
>> X = [ones(m,1) X];                  % feature matrix with an intercept column of ones
>> initial_theta = zeros(n+1,1);       % initialize theta to zeros
>> prediction = X*initial_theta;       % initial linear predictions
>> logistic = 1./(1+exp(-prediction)); % logistic (sigmoid) function
>> sqrError = (logistic-y)'*X;         % (h - y)'*X, the unscaled gradient term
>> theta = initial_theta-alpha*(1/m)*sqrError';
>> couverg = (1/m)*sqrError';          % derivative of J(theta), used to test for the minimum
>> J = -1*sum(y.*log(logistic)+(1-y).*log(1-logistic))/m;  % cost function
>> a = 1;
>> Boolean = zeros(size(X,2),1);
% Exit the loop at the minimum, i.e. where the derivative is 0
% while all(couverg(:)~=Boolean(:))
>> while a ~= 40000000                 % fixed iteration count in place of a convergence test
    prediction2 = X*theta;
    logistic1 = 1./(1+exp(-prediction2));
    sqrError2 = (logistic1-y)'*X;
    J = -1*sum(y.*log(logistic1)+(1-y).*log(1-logistic1))/m;
    theta = theta - alpha*(1/m)*sqrError2';
    couverg = (1/m)*sqrError2';
    a = a+1;
    end

Solved J:
[Figure: converged value of the cost J]

Solved theta:

[Figure: converged parameter vector theta]
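The commented-out condition above hints at a cleaner stopping rule: exit when the derivative is (near) zero instead of after a fixed 40 million iterations. A sketch of that variant, assuming the same X (with intercept column), y, alpha, and initial_theta as above; the tolerance 1e-6 is an arbitrary choice:

theta = initial_theta;
m = length(y);
tol = 1e-6;                              % stop once the gradient norm falls below this
while true
    h = 1 ./ (1 + exp(-X*theta));        % current predictions h_theta(x)
    grad = (1/m) * X' * (h - y);         % derivative of J(theta)
    if norm(grad) < tol                  % near-zero gradient: minimum reached
        break;
    end
    theta = theta - alpha*grad;          % simultaneous update of all theta_j
end
J = -(1/m)*sum(y.*log(h) + (1-y).*log(1-h));  % final cost

With the raw exam scores and alpha = 0.001 convergence is very slow (which is why the original loop runs 40 million iterations); scaling the features first would permit a much larger alpha and far fewer iterations.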

3. Prediction results:

% Predict the admission probability (in %) for a student with scores [45, 90]
>> pre1 = logsig([1 45 90]*theta)*100;
>> pre1

pre1 =

    99.93
    
% Predict the admission probability (in %) for a student with scores [45, 45]
>> pre2 = logsig([1 45 45]*theta)*100;
>> pre2

pre2 =

    2.73
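Beyond single predictions, the training-set accuracy gives a quick sanity check of the fitted theta. A sketch, assuming X (with the intercept column) and y as above; the 0.5 threshold is the conventional decision boundary:

p = 1./(1+exp(-X*theta)) >= 0.5;   % predicted class: probability >= 0.5 means y = 1
accuracy = mean(p == y) * 100;     % percentage of training samples classified correctly
fprintf('Training accuracy: %.2f%%\n', accuracy);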

4. Experimental summary:
Maximum likelihood estimation is used to derive the cost function for logistic classification, but maximum likelihood seeks a maximum while the cost function is minimized, so the two differ only by a negative sign (and a 1/m factor); the resulting cost function is in fact the cross-entropy. Besides gradient descent, BFGS (a variable metric quasi-Newton method) and L-BFGS (its limited-memory variant) can also be used. The advantage of these algorithms is that they can automatically select a good step size and are usually much faster than gradient descent, but they are also more complex.
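For example, MATLAB's fminunc uses a quasi-Newton (BFGS-type) method and chooses step sizes automatically, so no learning rate has to be hand-tuned. A sketch of how it could replace the loop above, reusing the costFunction sketched in the principle section; the classic optimset option names are shown (newer releases prefer optimoptions), and MaxIter = 400 is an arbitrary cap:

options = optimset('GradObj', 'on', 'MaxIter', 400);  % we supply the gradient ourselves
[theta, J] = fminunc(@(t) costFunction(t, X, y), initial_theta, options);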

Origin blog.csdn.net/weixin_56260304/article/details/127615451