Coursera Andrew Ng Machine Learning: homework / exercise notes + script tests (ex1 to ex8)

Notes on the problems I ran into while doing the homework.
Coursera Andrew Ng Machine Learning course website

PS: The code snippets here have no language set. You can tell; I was too lazy to add it. Whenever I forget, the code shows up in magenta here (watching Andrew Ng I learned this shade of red is called magenta, hhh), which actually looks pretty good:
insert image description here
I browse with Google Chrome and downloaded a "Dark Reader" plug-in, which is perfect for protecting my eyes; I was genuinely going blind staring at a white screen all day.
Added later: I found it is only magenta while editing; it shows up white when reading.

Jump to:
Ex1
Ex2
Ex3
Ex4
Ex5
Ex6
Ex7
Ex8


Ex1

Bits and pieces of notes:

  1. An n x 1 or 1 x n array is called a vector, written R^n; an n x m array is called a matrix, written R^(n x m)
  2. I only saw the submission password (token) after clicking into the programming assignment. After watching the video I wanted to try it, but couldn't find it at first. . .

1. warmUpExercise

A = eye(5);

The video actually gave the answer too, so a good start: we got our first "Nice Work!"
The thing to pay attention to is the semicolon: add it unless you want to inspect the output, otherwise a pile of values gets printed. That said, the printing doesn't seem to affect the result; as long as the returned value is right, it passes.


2. Computing Cost (for One Variable)

prediction = X*theta;
cost = (prediction - y).^2;
J = 1/(2*m)*sum(cost);

Note the "." in .^2: it makes the squaring an element-by-element operation.
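A tiny illustration of the difference (any small matrix works):

A = [1 2; 3 4];
A .^ 2    % element-wise squaring: [1 4; 9 16]
A ^ 2     % matrix product A*A:    [7 10; 15 22]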


3. Gradient Descent (for One Variable)
insert image description here
It can be written straight from the formula.

delta = X'*(X*theta - y); % X' does not need extra parentheses
theta = theta - alpha/m*delta;

Pay attention to the matrix dimensions. For example, delta = X' * (X*theta - y); cannot be written as (X*theta - y) * X', because X is 20x2 (so X' is 2x20), (X*theta - y) is 20x1, and delta (the derivative with respect to theta, the same size as theta) is 2x1, so X' has to go in front. This kind of problem is hard to spot when deriving the formula by hand; as for the sizes of all those parameters, just print them with size() in the console (the black window) to check.
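For context, a minimal sketch of where these two lines live, assuming the standard ex1 scaffolding (a gradientDescent(X, y, theta, alpha, num_iters) function plus computeCost):

m = length(y);                                 % number of training examples
for iter = 1:num_iters
  delta = X' * (X*theta - y);                  % same size as theta
  theta = theta - alpha/m * delta;             % simultaneous update of all theta entries
  J_history(iter) = computeCost(X, y, theta);  % track the cost to check it keeps decreasing
endfor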

This issue was actually mentioned in the week-2 reading material on submitting programming assignments. At the time I had no idea what it was talking about. . . Only after finishing the code did I accidentally click on it again and realize it was about exactly this.


4. Feature Normalization
Feature normalization, where normalizing means (x - mean) / standard deviation. . .
I wrote (x - mean) / (max - min) instead and kept getting it marked wrong. . . Posting it anyway, otherwise it was written for nothing:

mu = mean(X_norm')
mu = mu' * ones(1,3);
sigma = std(X_norm');
X_norm = (X_norm-mu)./(max(X_norm')'*ones(1,3)-min(X_norm')'*ones(1,3))

The main point: here each row of X is an example and each column is a feature, so the statistics should be computed column by column; that's why I transposed before calling the mean, std, and max functions. . . With a single row vector you can't tell the difference, but with a matrix you can.
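For completeness, a sketch of the intended z-score version (not the code I submitted), relying on the fact that mean and std already work column by column and that Octave broadcasts the row vectors, so no transposes or ones(1,3) expansion are needed:

mu = mean(X_norm);                 % 1 x n row of column means
sigma = std(X_norm);               % 1 x n row of column standard deviations
X_norm = (X_norm - mu) ./ sigma;   % each row has mu subtracted, then is divided by sigma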


5. Computing Cost (for Multiple Variables)
is the same as Computing Cost (for One Variable), just paste it.


6. Gradient Descent (for Multiple Variables)
is the same as Gradient Descent (for One Variable), just paste it. The charm of matrix operations.


7. Normal Equations
insert image description here
theta = (pinv(X' * X))*(X') * y
At first I wrote -1 to invert the matrix. . . and then couldn't remember what X^-1 stands for; it wasn't the pseudo-inverse call I needed (pinv), nor the transpose ==

Finally, let’s commemorate the completion of ex1~
insert image description here


Ex2

1. Sigmoid Function

g = ones(size(z,1),size(z,2))./(1+exp(-z));

The easy-to-miss part is element-wise division ( ./ ). I also assumed a plain 1 couldn't be used as the numerator since it is 1x1, whereas in the denominator the 1 added to exp(-z) gets expanded to the larger dimension automatically.
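In fact Octave broadcasts the scalar numerator too, so a shorter form should work just as well:

g = 1 ./ (1 + exp(-z));   % the scalar 1 is expanded to the size of the matrix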


2. Logistic Regression Cost
Haha, I finally had to look at the submit file. Parts 2 and 3 both call costFunction, but one checks J and the other checks grad. Then I found my J was wrong: written the same way as linear regression, the squared-error cost is not convex for logistic regression, so the formula has to change. It is
insert image description here
easy enough once you look it up, and the figure also works out the matrix dimensions for the multiplication:

prediction = ones(size(X*theta,1),size(X*theta,2))./(1+exp(-X*theta));
cost = (prediction - y).^2;
%J = 1/(2*m)*sum(cost); Wrong!!!
J = 1/m*(-y'*log(prediction)-(1-y)'*log(1-prediction));
grad = X'*(prediction - y)/m;
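For reference, the (unregularized) cost and gradient these lines implement are:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \right], \qquad
\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr) x_j^{(i)}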

3. Logistic Regression Gradient - costFunction
I meant to be doing part 2, but what ended up passing was part 3 (???)

prediction = ones(size(X*theta,1),size(X*theta,2))./(1+exp(-X*theta));
cost = (prediction - y).^2
J = 1/(2*m)*sum(cost)
grad = X'*(prediction - y)/m

That's because the gradient formula has the same form for logistic and linear regression; only the hypothesis (the prediction) changes. So the grad line is just copied from before (the squared-error J above is still wrong, but part 3 only checks grad).


4. Predict
I suddenly realized that functions in the same folder can call each other. .

%X(1,:)
%theta
for i=1:m
  if sigmoid(X(i,:)*theta)>=0.5
    p(i) = 1;
  endif
endfor

The thing to pay attention to is the sizes of X and theta so that they match: i walks over each row (each example) of X, that row gets multiplied by theta, and p(i) is the result for the i-th row (example). emmm, I used a for loop here and at first couldn't think of how to update every p(i) without it.
I thought of it right after finishing (hh, so smart): I remembered that Python has a = [X > 0]... something like that (the syntax is probably off; it's the trick used for ReLU). It checks each element against 0 and assigns the result to a (also a matrix). So I imitated it in the console here:
insert image description here
Copying the pattern worked out fine! So keep imitating:

predict = sigmoid(X*theta)
p = predict>=0.5

Done: no for loop, just operate on the matrix directly.


5. &6. Regularized Logistic Regression Cost & Regularized Logistic Regression Gradient
are all tested in one file and written together.

h = sigmoid(X*theta)
J = 1/m*(-y'*log(h)-(1-y)'*log(1-h))+lambda/2/m*(sum(theta.^2)-theta(1)*theta(1)) 
grad = X'*(h-y)/m + lambda/m*theta
grad(1) = grad(1) - lambda/m*theta(1)

J’s reference:
insert image description here
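Since the figure doesn't reproduce here, the regularized logistic cost it shows is:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2

(note the regularization sum starts at j = 1, which is why the first theta entry is excluded in the code).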
How to compute it with matrices was covered above:
insert image description here
The Σ disappears because y is m x 1, y' is 1 x m, and h is m x 1, so the product already sums over the examples. Just print them with size() to check their dimensions.
Initially handed in:

J = 1/m*(-y'*log(h)-(1-y)'*log(1-h))+lambda/2/m*sum(theta.^2)

A hint pops up saying to be careful that θ(1) gets no regularization, so just change that. Note the subscript is 1, not 0 (Octave indexing starts at 1), otherwise it errors again.
Then the reference for grad (from the course page: Regularized Linear Regression):
insert image description here
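Reconstructed, that gradient reference is:

\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr) x_0^{(i)}, \qquad
\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr) x_j^{(i)} + \frac{\lambda}{m} \theta_j \quad (j \ge 1)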
Look back at the code written yesterday:
insert image description here
Copy the pattern again:

grad = X'*(h-y)/m + lambda/m*theta
grad(1) = grad(1) - lambda/m*theta(1)

Note the subscript is 1, and grad(1) should not be regularized; the easy fix is to add the regularization term for everything and then subtract it back off for grad(1).
I can't think of a more concise way, but at least J and grad need no for loops and use matrix multiplication throughout.

Finished, the last screenshot to commemorate:
insert image description here


Ex3

1. Regularized Logistic Regression

h = sigmoid(X*theta);
J = 1/m*(-y'*log(h)-(1-y)'*log(1-h))+lambda/2/m*(sum(theta.^2)-theta(1)*theta(1)) ;
grad = X'*(h-y)/m + lambda/m*theta;
grad(1) = grad(1) - lambda/m*theta(1);

Just paste from ex2 and it's done...


2. One-vs-All Classifier Training
got stuck for 40 minutes. . The problem statement is fairly long, and I didn't know how to use fmincg (what must come will come):

initial_theta = zeros(n + 1, 1);
options = optimset('GradObj','on','MaxIter',50);
for c = 1:num_labels
  %initial_theta = zeros(n + 1, 1); % not re-initialized each iteration, but it still passed
  k = fmincg(@(t)(lrCostFunction(t,X,(y==c),lambda)),initial_theta,options)
  k';
  all_theta(c,:) = k';
  all_theta;
endfor

insert image description here
The "Iteration" lines are how many times fmincg has run, and "Cost" is the loss function value. At first I thought it was an error message. . .

The idea is to run fmincg num_labels times; the c-th run gives an optimal theta, which goes into the c-th row of all_theta. Since the theta it returns is a column vector, it has to be transposed before being stored as a row of all_theta.

To sum up, the c-th row of all_theta stores the best theta for the classifier that separates y == c from the rest, for c = 1 to 4 (since num_labels = 4 here).

For convenience I used k to hold the returned column-vector theta, transposed k, and assigned it to a row of all_theta; that's the code above. The intermediate variable k can be dropped:

initial_theta = zeros(n + 1, 1);
options = optimset('GradObj','on','MaxIter',50);
for c = 1:num_labels
  %initial_theta = zeros(n + 1, 1); % not re-initialized each iteration, but it still passed
  all_theta(c,:) = fmincg(@(t)(lrCostFunction(t,X,(y==c),lambda)),initial_theta,options)'; % note the transpose ' added at the end
endfor

Since I hadn't thought about the transposition at first, my initial attempt was:

for c = 1:num_labels
  all_theta(c) = fmincg(@(t)(lrCostFunction(t,X,(y==c),lambda)),all_theta(c),options)
endfor

It was full of mistakes. First, all_theta(c) is not the c-th row but the c-th element in column-major order. Second, that all_theta(c) should have been initial_theta. Also, looking carefully, initial_theta is never reset inside the loop, but that doesn't matter: fmincg is strong enough, and besides, starting theta from all zeros is as good a starting point as any.

But wait, is initial_theta really needed? Each row of all_theta is also zero at the start, so we can save the extra variable: just take the corresponding row of all_theta and transpose it to play the role of initial_theta:

options = optimset('GradObj','on','MaxIter',50);
for c = 1:num_labels
  all_theta(c,:) = fmincg(@(t)(lrCostFunction(t,X,(y==c),lambda)),all_theta(c,:)',options)';
endfor

Great, now the code is much leaner than the correct code submitted in the first place, and it is also correct ヾ(≧∇≦*)ヾ



3. One-vs-All Classifier Prediction
The usual approach: use size() to print the dimensions of all_theta and X. X is m x 3 and all_theta is num_labels x 3 (4 x 3). First, the max function has several forms:
insert image description here
max(X): the largest element in each column of X (for a vector, simply its largest element)
max(X, [], 1): the largest element in each column of X
max(X, [], 2): the largest element in each row of X
[a, b] = max(X): a is the largest element in each column of X, b is the index (subscript) of that element
max(X, Y): the element-wise maximum of X and Y

What we want: after multiplying theta with X, for each example (row) find which of the 4 classes scores highest, and store that class index in the corresponding entry of p. max gives the index of the largest element in each column, so arrange the result with 4 rows (one per class) and one column per example holding that example's scores for the 4 classes; calling max then returns, for each column, the index of the highest-scoring class. That matrix has size 4 x m, i.e. all_theta * X'. Putting it together, we get a row vector of most-likely class indices, which is transposed into p:

t = all_theta*X';
[a,b] = max(t);
p = b';

By the way, the sigmoid is not applied here: theta'*x and sigmoid are both monotonically increasing (bigger in means bigger out), so the argmax is unchanged and there is no need to add it.


4. The Neural Network Prediction Function
I made a bit of a mess writing this. I didn't know what the final output matrix should be, and after glancing at other people's code I found I was missing the bias 1 for the hidden layer: I remembered it for the input layer but forgot it for the hidden layer:

X = [ones(m, 1) X];
z2 = sigmoid(X*Theta1');
size(z2);
a2 = [ones(m,1) z2];
a3 = a2*Theta2';
t = a3';
[a,b] = max(t);
p = b';

input layer: m x 2 -> m x 3 (after adding the bias column)
hidden layer input: (m x 3) * (3 x 4) = m x 4
hidden layer output: m x 4 -> m x 5 (after adding the bias column)
output layer: (m x 5) * (5 x 4) = m x 4
So the final m x 4 (or 4 x m) matrix means there are m inputs, each with 4 outputs (class scores / probabilities). In addition, after reading what others wrote, it turns out [a,b] = max(A, [], 2) is the same kind of slick trick with a similar meaning; one version works along rows and the other along columns:

X = [ones(m, 1) X]
z2 = sigmoid(X*Theta1');
size(z2);
a2 = [ones(m,1) z2];
a3 = sigmoid(a2*Theta2');
[a,b] = max(a3,[],2);
p = b;

Sure enough, I still have to look at other people's codes and learn from each other's strengths.

The sigmoid at the end can be added or not, since it is monotonic and the argmax is the same either way, but it gives the output the meaning of a "probability", so I kept it.

Stumbled through it, but looking back after finishing, it isn't that difficult (the truth of all things: nothing is hard once you understand it). The usual screenshot to commemorate:
insert image description here


Ex4

Feedforward and Cost Function & Regularized Cost Function
I hadn't typed any of this for a long while because of exam revision, so it felt unfamiliar and I was stuck for quite a while:
reference formula:
insert image description here

[hx,dummy] = predict(Theta1,Theta2,X);
Y = zeros(m,num_labels);
for i=1:m
  Y(i,y(i))=1;
endfor
t1 = log(hx).*Y + log(1-hx).*(1-Y);
sum_xita = sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2));
J = -1/m*sum(sum(t1)) + lambda/2/m*sum_xita;

% You cannot use a straight matrix multiplication here: with a matrix product, log(hx_k) would get multiplied
% with the next example's y_{k+1} and so on, which is wrong. The correct reading is an element-wise product,
% each entry multiplied with its own counterpart; I didn't get this and was stuck for a long time.
% Also, when squaring theta the first column must be left out, which the (:,2:end) slicing handles easily.

predict was modified in its file (so that it also returns the h matrix):
insert image description here


Sigmoid Gradient
according to the formula:

g = sigmoid(z).*(1-sigmoid(z));

Have a look at the proof of the sigmoid derivative. . a bit tricky.
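For reference, the derivation is short:

g(z) = \frac{1}{1+e^{-z}}, \qquad
g'(z) = \frac{e^{-z}}{(1+e^{-z})^2}
      = \frac{1}{1+e^{-z}} \cdot \frac{e^{-z}}{1+e^{-z}}
      = g(z)\,\bigl(1 - g(z)\bigr)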


Neural Network Gradient (Backpropagation) & Regularized Gradient
I hit a wall writing this and had to refer to other people's code.
It continues inside nnCostFunction, right after the cost computation above:

X = [ones(m,1) X];
for i=1:m
  a1 = X(i,:);  %1x3
  z2 = a1*Theta1'; %1x4
  a2 = [1 sigmoid(z2)];
  z3 = a2*Theta2';
  a3 = sigmoid(z3);
  deta3 = a3-Y(i,:); % 1 x 4
  deta3 = deta3';
  deta2 = Theta2(:,2:end)'*deta3.*sigmoidGradient(z2');
  % debug prints to check the dimensions:
  size(z2)
  size(deta2)
  size(Theta2(:,2:end)')
  size(deta3)
  size(a2')
  Theta2_grad = Theta2_grad + deta3*a2; % 4 x 5
  Theta1_grad = Theta1_grad + deta2*a1; % 4 x 3
endfor
Theta1(:,1) = 0;
Theta2(:,1) = 0;
Theta1_grad = (Theta1_grad + lambda*Theta1)/m;
Theta2_grad = (Theta2_grad + lambda*Theta2)/m;

Knowledge reference:
insert image description here
insert image description here
insert image description here
[Description of the detailed process]
Let me tell you, what pitfalls I stepped on in the afternoon:

  1. I didn't know which quantity was a and which was z. I fiddled for ages without rereading my earlier notes, and only then found the pdf spells it out
  2. The first (input) layer needs the bias term x0 = 1 added, hehe
  3. The lowercase delta (δ): the term is Theta' * δ, but I thought the order didn't matter and wrote δ' * Theta, which was spectacularly wrong
  4. In the pdf, the final D is Δ plus λΘ, all divided by m (skipping the λΘ part for the bias column); see the formulas below
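Written out, that last point is (reconstructed from the course handout):

D^{(l)}_{ij} = \frac{1}{m}\,\Delta^{(l)}_{ij} \quad (j = 0), \qquad
D^{(l)}_{ij} = \frac{1}{m}\left(\Delta^{(l)}_{ij} + \lambda\,\Theta^{(l)}_{ij}\right) \quad (j \ge 1)

which is exactly what the Theta1(:,1) = 0 / Theta2(:,1) = 0 trick in the code above implements.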

From ten o'clock in the morning to five o'clock in the afternoon, I stepped on all kinds of pitfalls, let me commemorate it:
insert image description here
ex4 was a stumble from start to finish; when I have time I should go look at the questions on the forum.


Ex5

Regularized Linear Regression Cost Function & Regularized Linear Regression Gradient
reference picture:
insert image description here
I brainstormed and wrote the commented-out part first. Later, reading other people's code, I found there is no need to raise x to the power of i; after a while I realized the superscript on x means the i-th example x^(i), not a power of x.
insert image description here
Then it is to calculate the gradient, grad
because the grad here needs to use regression and regularization.
Look back at the Ex2 question notes: (this watermark is a bit difficult to remove) Regularized Linear Regression
insert image description here
The part framed in blue is grad, i.e. the gradient of J with respect to θ.
Then there is the complete code:

h = zeros(m,1);
h = X*theta;
J = 1/2/m*(sum((h-y).^2))+lambda/2/m*sum(theta(2:end).^2)
grad = X'*(h-y)/m + lambda/m*theta
grad(1) = grad(1) - lambda/m*theta(1)

Remember that the first θ does not require regularization.


Learning Curve
The requirement is to plot the cross-validation error and the training error as the number of training examples i grows, and the usual shape of the two curves is: (here N plays the role of i)
insert image description here
Then I ran into a very real problem: I can't get Octave to draw the plots properly. Run on its own it draws a figure, but on submission it iterates many times and the resulting plots come out a mess.
A little annoying: the tutorials online are all for MATLAB, the course pace is a bit rushed, and I don't want to learn MATLAB right now, so I'll look at it over the summer after exams.
Code (I didn't understand it at first; after seeing the plots other people drew I knew what to do: as N grows, record the two errors):

for i = 1:m
  theta = trainLinearReg(X(1:i,:),y(1:i,:),lambda);
  error_val(i) = linearRegCostFunction(Xval,yval,theta,0);
  %theta = trainLinearReg(X,y,lambda)
  error_train(i) = linearRegCostFunction(X(1:i,:),y(1:i,:),theta,0);
endfor

Polynomial Feature Mapping
maps each X(i) to [X(i) X(i).^2 X(i).^3 ... X(i).^p]; code:

for i=1:size(X,1)
  for j=1:p
    X_poly(i,j) = X(i).^j;
  endfor
endfor

When reading other people's articles, I found that the properties of the matrix can be used more fully:

X 
for i=1:p
  X_poly(:,i) = X.^i; 
endfor
X_poly

insert image description here
Then I noticed the printout was quite different from what I had imagined = =. I hadn't read the problem carefully and just banged out two for loops from the formula, but it still came out right.
Post a blog to see matlab pictures: https://blog.csdn.net/weixin_40807247/article/details/81359042
Final practice:
insert image description here
PS: I don't think I did this one very well = =. I was impatient reading the problems, got stuck, and after thinking about it for a while went to look at other people's code. Not terrible, but I need to read the problems properly.


Ex6

Gaussian Kernel
calculates Gaussian kernel formula:
insert image description here
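Reconstructed from that figure, the Gaussian (RBF) kernel is:

K_{\text{gaussian}}(x^{(i)}, x^{(j)}) = \exp\!\left( -\frac{\lVert x^{(i)} - x^{(j)} \rVert^2}{2\sigma^2} \right)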
How ||u|| is computed (here u is 2-dimensional):
insert image description here
Replace u with x1 - x2; the correct code:

t = x1-x2;
sim = exp(-sum(t.^2)/(2*sigma^2))

The wrong code I had first:

sim = exp(-(x1-x2).^2/(2*sigma^2))

Clearly I hadn't understood the formula.


Parameters (C, sigma) for Dataset 3
I was dumbfounded, and then remembered my friend saying there was a zip of files to download:
insert image description here
Life sucks. It contains the scripts, a pdf introducing the assignment (and maybe other data too); it's the stuff we were asked to download at the very beginning, which otherwise just isn't there.

But that friend had also said the background introduction was useless, so for a long time I never read the downloaded file, and somehow got by unscathed.

This assignment finally broke me. After reading the downloaded file I found it actually contains a lot of hints, including usage of functions that were never mentioned in the lectures.

The task is to find the best C and sigma (σ, the kernel width) and then use svmPredict. In practice you find that svmPredict needs a model from svmTrain, whose fourth argument is a function handle (@something), so you have to look back at svmTrain's code too. Reading it, you can see x1 and x2 need not be defined: the kernel function is probed with (1, 0), which also matches what was taught in class. The pdf also suggests the candidate values for C and sigma, so a double-loop brute-force search is enough:

vec_C = [0.01;0.03;0.1;0.3;1;3;10;30];
vec_sigma = [0.01;0.03;0.1;0.3;1;3;10;30];
%x1 = [1 2 1 5 9 8]; x2 = [0 4 -1 7 6 5]; % saw someone define x1 and x2 like this
%how x1 and x2 are defined here does not matter, because svmTrain calls kernelFunction with (1, 0)
vec_errors = 10000000;
for i=1:length(vec_C)
  for j=1:length(vec_sigma)
    model= svmTrain(X, y, vec_C(i), @(x1, x2) gaussianKernel(x1, x2, vec_sigma(j)));
    pred = svmPredict(model,Xval);
    error = mean(double(pred~=yval));    
    if (error < vec_errors)
      a = i;
      b = j;
      vec_errors = error;
    endif
  endfor
endfor
C = vec_C(a); % remember to actually assign the chosen values
sigma = vec_sigma(b);
vec_errors;
model = svmTrain(X, y, C, @(x1, x2) gaussianKernel(x1, x2, sigma)); % retrain with the best pair (vec_C(i)/vec_sigma(j) would only be the last loop values)
visualizeBoundaryLinear(X,y,model); % learned from ex6.m; can be used to look at the result

Email Preprocessing
a brute-force comparison does the job:

    for i=1:size(vocabList)
      if (strcmp(str,vocabList(i,1)))
        word_indices = [word_indices;i];
        break;
      endif
    endfor

emailFeatures
same thing, brute force:

for i = 1:size(word_indices)
  x(word_indices(i)) = 1;
endfor

This one felt relatively simple, maybe because you can basically read it off the pdf hh, barely needing to work anything out yourself:
insert image description here


Ex7

Find Closest Centroids (k-Means)
Note that it is a row of X minus a row of centroids (each centroid μ is a point in the same space as the examples in X), so the dimensions match.
It is not X(i) - centroids(j).

m = size(X,1);
for i=1:m
  min = 1000000000;
  for j=1:K
    t = X(i,:) - centroids(j,:);
    s = sum(t.^2);
    if (s<min)
      min = s;
      idx(i) = j;
    endif
  endfor
endfor
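As an aside, the loop over examples can be traded for a distance matrix: build an m x K matrix of squared distances and take the min along each row. A sketch relying on Octave's broadcasting for the row-wise subtraction:

dists = zeros(size(X,1), K);
for j = 1:K
  dists(:, j) = sum((X - centroids(j, :)).^2, 2); % squared distance of every example to centroid j
endfor
[dummy, idx] = min(dists, [], 2);                 % index of the nearest centroid for each example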

Compute Centroid Means (k-Means)
This can be done with matrix operations. I knew about find but wasn't very familiar with it, so I used for loops at first; after reading other people's code I made the following change:

%my own version:
##for i=1:K
##  cnt = 0;
##  for j=1:m
##    if (idx(j)==i)
##      centroids(i,:) += X(j,:);    
##      cnt++;
##    endif
##  endfor
##  centroids(i,:) /= cnt;
##endfor

%someone else used find, so I changed mine accordingly:
for i=1:K
  t = find(idx==i);
  centroids(i,:) = sum(X(t,:))/length(t);
endfor

K-means is used for image compression
With that done, run the ex7 script:
insert image description here
it feels good and interesting.

The pdf describes image compression: pick K colors (say 16) and run K-means on the pixels.
First, write kMeansInitCentroids, which contains:

% randidx holds a random permutation of the row indices of X; X(randidx(1:K),:) then takes the first K rows of that shuffled order
randidx = randperm(size(X,1)); % Randomly reorder the indices of examples
centroids = X(randidx(1:K),:); % Take the first K examples as centroids

The usage test of randperm is as follows.
insert image description here
Save a vector of indices in a variable, then use it to pick out rows of another matrix. That kind of thing is convoluted to write in C, but works perfectly here (I suspect Python can do the same). Short as it is, the interpreter understands exactly what you mean, which is very convenient:
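A tiny example of the index-vector trick (made-up values, just to show the shape of it):

v = [3 1 4];      % a vector of row indices
A = magic(5);     % any 5 x 5 matrix
B = A(v, :);      % rows 3, 1 and 4 of A, in that order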

Then run this script in the command window:
insert image description here
When K = 16:
insert image description here
change the script, K = 8:
insert image description here
K = 1:
insert image description here
K = 1024:
insert image description here


PCA

insert image description here

Start from the figure; the code just follows it:

sigma = X'*X/m;
[U, S, V] = svd(sigma); % U: principal directions, diag(S): the corresponding eigenvalues

Project Data (PCA)

U = U(:,1:K);
Z = X*U; % too lazy to work it out on paper; fiddled until the dimensions matched

Recover Data (PCA)

X_rec = Z*U(:,1:K)'; % look back at projectData for the shapes

One more programming exercise is over! ! !
insert image description here


Ex8

Estimate Gaussian Parameters

insert image description here

mu = mean(X)';
t = mu'.*ones(m,n);
sigma2 = sum( (X-t).^2 )'/m;

t is used to expand mu into a full matrix, because I didn't want a for loop
insert image description here
Reading other people's code:

X
mu = mean(X)
size(mu)
size(X)
sigma2=sum((X - mu) .^ 2)/m;

emmm, and it runs correctly. . .
The discovery: just like element-wise multiplication, the 1 x n row vector gets expanded down the rows (broadcasting), so X - mu works directly; I'm still not familiar enough with this style of code = =
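A tiny illustration of that broadcasting (made-up numbers):

X = [1 2; 3 4; 5 6];   % 3 x 2
mu = mean(X);          % 1 x 2 row vector of column means: [3 4]
X - mu                 % mu is subtracted from every row: [-2 -2; 0 0; 2 2]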
insert image description here



Select Threshold is just typing out the formulas; note that pval < epsilon means "flagged as an anomaly", i.e. a prediction of 1:

    pval;
    yval;
    fp = sum((pval<epsilon)&(yval==0));
    tp = sum((pval<epsilon)&(yval==1));
    fn = sum((pval>=epsilon)&(yval==1));

    prec = tp/(tp+fp);
    rec = tp/(tp+fn);

    F1 = 2*prec*rec/(prec+rec);

insert image description here
insert image description here


Collaborative Filtering Cost & Collaborative Filtering Gradient
first without regularization, the cost and the gradient:

% without regularization 
% version 1.0
J = sum(sum((X*Theta'-Y).^2.*R))/2;
for j=1:size(R,2)
  for i=1:size(R,1)
    if (R(i,j)==1)
      [i,j];
      X_grad(i,:) += (X(i,:)*Theta(j,:)'-Y(i,j))*Theta(j,:);
      Theta_grad(j,:) += (X(i,:)*Theta(j,:)'-Y(i,j))*X(i,:);
    endif
  endfor
endfor

J's reference:
insert image description here
insert image description here
insert image description here
Because Y is an nm x nu matrix, X and Theta have to be arranged to match, so here it is X*Theta' (X times Theta transposed).
Then R marks whether position (i, j) is an actual rating (1 means rated, 0 means not), hence the element-wise product with R.
Finally, since J is a single number, two sums are used to collapse the matrix.
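Reconstructed, the unregularized cost the figures show is:

J = \frac{1}{2} \sum_{(i,j):\, r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right)^2

which is exactly the sum(sum((X*Theta'-Y).^2 .* R))/2 line above.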

Gradient's reference:
insert image description here
I glanced at the hint in the accompanying pdf, which says to use a for loop, and still couldn't write it out...
In the end I sat down and thought carefully about what each matrix means (of course X_grad has the same meaning and shape as X, being its gradient, and likewise Theta_grad for Theta):
X matrix: each row is a movie (num_movies) (x1, x2 in the figure below), each column a feature (num_features) (how romantic, how action-packed).
Theta matrix: each row is a user (num_users) (Alice, Bob, Carol, Dave), each column a feature (num_features) (that user's liking for romance and for action).
The diagram makes it clearer (note: in the diagram the Theta matrix has one column per user and one row per feature, and the first row of 0s is there to multiply an x0 term that doesn't appear here, so it can be ignored):
insert image description here
and then look back at our definition:
insert image description here
Following that breakdown of the matrices and the formula above, X_grad is updated row by row: first multiply (row i of X) by (the transpose of row j of Theta); note this gives a single number, not a matrix. Then multiply that number by the j-th row of Theta. Remember to accumulate the results, because the i-th row (i-th movie) has ratings from several users; with two loops each pass only handles entry (i, j), so the contributions must be summed up.
Theta_grad is analogous. R(i, j) means "whether the i-th movie was rated by the j-th user", so for Theta_grad the update goes into row j, while for X_grad it goes into row i.
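Reconstructed, the gradient formulas being described are:

\frac{\partial J}{\partial x_k^{(i)}} = \sum_{j:\, r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right) \theta_k^{(j)}, \qquad
\frac{\partial J}{\partial \theta_k^{(j)}} = \sum_{i:\, r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right) x_k^{(i)}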

Then I found the hints on the next page of the pdf. . . about how to pick out the j with r(i,j) = 1:
insert image description here
Copying that pattern, the new code:

% without regularization 
% version 2.0
J = sum(sum((X*Theta'-Y).^2.*R))/2;
for i=1:size(R,1)
  idx = find(R(i,:)==1); % i.e. find all users j with r(i,j) == 1
  X_grad(i,:) += (X(i,:)*Theta(idx,:)'-Y(i,idx))*Theta(idx,:);
endfor

for j=1:size(R,2)
  idx = find(R(:,j)==1); % i.e. find all movies i with r(i,j) == 1
  Theta_grad(j,:) += (X(idx,:)*Theta(j,:)'-Y(idx,j))'*X(idx,:);
endfor

Note that in the final Theta_grad line, the (X(idx,:)*Theta(j,:)' - Y(idx,j)) factor has to be transposed, otherwise it errors:
insert image description here
After writing this, run ex8_cofi and you get a figure:
insert image description here
I have no idea what it shows...
Regularized Cost & Regularized Gradient
Adding the regularization is just a small modification on top of the previous version.
refer to:
insert image description here
insert image description here

% with regularization

J = sum(sum((X*Theta'-Y).^2.*R))/2 + lambda/2*sum(sum(Theta.^2))+ lambda/2*sum(sum(X.^2));
for i=1:size(R,1)
  idx = find(R(i,:)==1); % i.e. find all users j with r(i,j) == 1
  X_grad(i,:) += (X(i,:)*Theta(idx,:)'-Y(i,idx))*Theta(idx,:)+lambda*X(i,:);
endfor

for j=1:size(R,2)
  idx = find(R(:,j)==1); % i.e. find all movies i with r(i,j) == 1
  Theta_grad(j,:) += (X(idx,:)*Theta(j,:)'-Y(idx,j))'*X(idx,:)+lambda*Theta(j,:);
endfor

The first submission timed out, and I don't know why. . . Just submit again:
insert image description here
Running ex8_cofi at the end feels a bit odd: the predicted ratings are all 5.0, but every part still passes:
insert image description here
The last programming assignment, it's over!!!
insert image description here


Original post: blog.csdn.net/Only_Wolfy/article/details/89893734