First, warm-up exercises
Requirements: generate a 5x5 identity matrix.
warmUpExercise function:
function A = warmUpExercise()  % define a function named warmUpExercise
A = eye(5);  % eye(n) generates an n-by-n identity matrix, here 5x5
end
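For comparison, here is a minimal pure-Python sketch (an illustration of my own, not part of the original MATLAB exercise) of what `eye(5)` produces:

```python
# Build an n-by-n identity matrix, mirroring MATLAB's eye(n):
# 1 on the diagonal, 0 everywhere else.
def eye(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

for row in eye(5):
    print(row)
```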
Call:
fprintf('Running warmUpExercise ... \n');
fprintf('5x5 Identity Matrix: \n');
warmUpExercise()  % call the function
fprintf('Program paused. Press enter to continue.\n');
pause;
Output:
Second, the drawing (Plotting)
Requirements: using the given data (the first column is the population of a city, the second column is the profit in that city; a negative number indicates a loss), draw a scatter plot to help choose a better restaurant location.
plotData function:
function plotData(X, y)
plot(X, y, 'rx', 'MarkerSize', 10);  % plot red crosses of size 10
ylabel('Profit in $10,000s');  % set the y-axis label
xlabel('Population of City in 10,000s');  % set the x-axis label
end
* MarkerSize: marker size; the default is 6
* rx: red cross markers
Call:
fprintf('Plotting Data ...\n')
data = load('ex1data1.txt');  % load the data file
X = data(:, 1); y = data(:, 2);  % assign the first column to X, the second column to y
m = length(y);  % m is the number of training examples
plotData(X, y);  % call plotData with X and y
fprintf('Program paused. Press enter to continue.\n');
pause;
Output:
Third, the cost function and gradient descent
The goal of linear regression is to minimize the cost function J(θ):

J(θ) = (1/2m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))²

The squared error is used here; the smaller the error, the better the fit.
The hypothesis h_θ(x) is given by the linear model:

h_θ(x) = θᵀx = θ₀ + θ₁x₁
The parameters of the model are the values θ; we minimize the cost J(θ) by adjusting them. One method for doing so is gradient descent, in which θ is updated on every iteration.
With each step of gradient descent, the parameters θ move closer to the optimal values that achieve the lowest cost J(θ).
* Note that := is different from =: it denotes a simultaneous update. Simply put, each new θ is first computed into a temporary variable, and only after all of them have been computed are they assigned back together, for example:
temp1 = theta1 - (theta1 - 10 * theta2 * x1);
temp2 = theta2 - (theta1 - 10 * theta2 * x2);
theta1 = temp1;
theta2 = temp2;
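The danger of skipping the temporary variables can be seen in a small Python sketch (the update formulas and numbers here are hypothetical, chosen only to show the effect): overwriting theta0 before computing theta1's update changes the result.

```python
# Simultaneous (correct) update: compute all new values from the OLD
# parameters, then assign them back together.
def simultaneous(theta0, theta1):
    temp0 = theta0 - 0.1 * (theta0 + theta1)
    temp1 = theta1 - 0.1 * (theta0 - theta1)
    return temp0, temp1

# Sequential (incorrect) update: theta0 is overwritten first, so the
# second line sees the NEW theta0 instead of the old one.
def sequential(theta0, theta1):
    theta0 = theta0 - 0.1 * (theta0 + theta1)
    theta1 = theta1 - 0.1 * (theta0 - theta1)
    return theta0, theta1

print(simultaneous(1.0, 2.0))  # both lines use the old theta0
print(sequential(1.0, 2.0))    # second line sees the updated theta0
```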
A column of 1s is added to the data, because the coefficient of θ₀ in the hypothesis is 1. The parameters θ are initialized to 0 and the learning rate α to 0.01.
Expressed in matrix form, this looks like:
X = [ones(m, 1), data(:,1)];  % add a column of 1s as the first column of X
theta = zeros(2, 1);  % initialize the fitting parameters
iterations = 1500;  % number of iterations
alpha = 0.01;  % set the learning rate to 0.01
computeCost.m -- compute the cost function J (used to check convergence):
function J = computeCost(X, y, theta)
m = length(y);
J = 0;
J = sum((X*theta-y).^2)/(2*m);
end
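The same computation can be sketched in pure Python (the toy data below is invented for illustration; the real exercise uses ex1data1.txt):

```python
# Cost J(theta) = 1/(2m) * sum((h_theta(x) - y)^2) for linear regression.
def compute_cost(X, y, theta):
    m = len(y)
    total = 0.0
    for xi, yi in zip(X, y):
        h = sum(t * x for t, x in zip(theta, xi))  # hypothesis h_theta(x)
        total += (h - yi) ** 2
    return total / (2 * m)

# Toy data: y = 2x exactly, so theta = [0, 2] gives zero cost.
X = [[1, 1], [1, 2], [1, 3]]
y = [2, 4, 6]
print(compute_cost(X, y, [0, 0]))  # 56/6, about 9.33
print(compute_cost(X, y, [0, 2]))  # 0.0
```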
Call the cost function:
fprintf('\nTesting the cost function ...\n')
% compute and display initial cost
J = computeCost(X, y, theta);  % call the cost function J
fprintf('With theta = [0 ; 0]\nCost computed = %f\n', J);
fprintf('Expected cost value (approx) 32.07\n');
% further testing of the cost function
J = computeCost(X, y, [-1 ; 2]);  % call the cost function J
fprintf('\nWith theta = [-1 ; 2]\nCost computed = %f\n', J);
fprintf('Expected cost value (approx) 54.24\n');
fprintf('Program paused. Press enter to continue.\n');
pause;
Result:
It can be seen that the cost with theta = [0; 0] is smaller than the cost with theta = [-1; 2], so [0; 0] is the better choice.
gradientDescent.m -- run gradient descent:
Description: a good way to verify that gradient descent is working is to watch the value of J and check that it decreases with each step. gradientDescent.m calls computeCost on every iteration and prints the value of J. If gradient descent and computeCost are implemented correctly, J should never increase, and it should converge to a stable value by the end of the algorithm.
In gradient descent, each iteration performs the following update (simultaneously for all j):

θ_j := θ_j − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x_j^(i)
Gradient descent function:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
    % Perform a single gradient step on the parameter vector theta.
    % Hint: while debugging, it can be useful to print out the values
    % of the cost function (computeCost) and gradient here.
    theta = theta - alpha*(1/m)*X'*(X*theta-y);

    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);
end
end
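The same vectorized update, sketched in pure Python on toy data (invented for illustration): on points lying exactly on y = 2x, theta should converge toward [0, 2].

```python
def gradient_descent(X, y, theta, alpha, num_iters):
    """Batch gradient descent: theta_j -= alpha/m * sum((h - y) * x_j)."""
    m = len(y)
    for _ in range(num_iters):
        # Errors h_theta(x^(i)) - y^(i), computed from the CURRENT theta.
        errors = [sum(t * x for t, x in zip(theta, xi)) - yi
                  for xi, yi in zip(X, y)]
        # Simultaneous update of every theta_j from the old theta.
        theta = [t - alpha / m * sum(e * xi[j] for e, xi in zip(errors, X))
                 for j, t in enumerate(theta)]
    return theta

X = [[1, 1], [1, 2], [1, 3]]
y = [2, 4, 6]
theta = gradient_descent(X, y, [0.0, 0.0], 0.1, 1500)
print(theta)  # close to [0, 2]
```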
Gradient descent function call:
fprintf('\nRunning Gradient Descent ...\n')
% run gradient descent
theta = gradientDescent(X, y, theta, alpha, iterations);
% print theta to screen
fprintf('Theta found by gradient descent:\n');  % theta computed by gradient descent
fprintf('%f\n', theta);
fprintf('Expected theta values (approx)\n');  % the expected theta
fprintf(' -3.6303\n  1.1664\n\n');
Result:
Plot the fitted line with the resulting parameters, and predict the profits for populations of 35,000 and 70,000:
% Plot the linear fit
hold on;  % keep previous plot visible, i.e., keep the sample points on the figure
plot(X(:,2), X*theta, '-')
legend('Training data', 'Linear regression')  % create a legend
hold off  % don't overlay any more plots on this figure

% Predict values for population sizes of 35,000 and 70,000
predict1 = [1, 3.5] * theta;
fprintf('For population = 35,000, we predict a profit of %f\n',...
    predict1*10000);
predict2 = [1, 7] * theta;
fprintf('For population = 70,000, we predict a profit of %f\n',...
    predict2*10000);
fprintf('Program paused. Press enter to continue.\n');
pause;
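The prediction step is just a dot product with the learned parameters; here is a Python sketch using the approximate expected theta values quoted above, so the outputs are approximations rather than the exact exercise answers:

```python
theta = [-3.6303, 1.1664]  # approximate values from the exercise output

def predict(population_in_10k, theta):
    # h_theta(x) = theta0 * 1 + theta1 * x, scaled from $10,000s to dollars.
    return (theta[0] + theta[1] * population_in_10k) * 10000

print(predict(3.5, theta))  # profit for a population of 35,000, about 4521
print(predict(7.0, theta))  # profit for a population of 70,000, about 45345
```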
Result:
Note the difference between A*B and A.*B: the former is matrix multiplication, the latter is element-wise multiplication.
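The distinction can be illustrated in Python (nested lists used here as a stand-in for MATLAB matrices):

```python
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

# Matrix product, MATLAB's A*B: rows of A dotted with columns of B.
matmul = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]

# Element-wise product, MATLAB's A.*B: corresponding entries multiplied.
elemwise = [[A[i][j] * B[i][j] for j in range(2)] for i in range(2)]

print(matmul)    # [[19, 22], [43, 50]]
print(elemwise)  # [[5, 12], [21, 32]]
```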
Fourth, the visual cost function J
This code is provided:
fprintf('Visualizing J(theta_0, theta_1) ...\n')

% Grid over which we will calculate J
theta0_vals = linspace(-10, 10, 100);
theta1_vals = linspace(-1, 4, 100);

% initialize J_vals to a matrix of 0's
J_vals = zeros(length(theta0_vals), length(theta1_vals));

% Fill out J_vals
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];
        J_vals(i,j) = computeCost(X, y, t);
    end
end

% Because of the way meshgrids work in the surf command, we need to
% transpose J_vals before calling surf, or else the axes will be flipped
J_vals = J_vals';

% Surface plot
figure;
surf(theta0_vals, theta1_vals, J_vals)
xlabel('\theta_0'); ylabel('\theta_1');

% Contour plot
figure;
% Plot J_vals as 15 contours spaced logarithmically between 0.01 and 100
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 3, 20))
xlabel('\theta_0'); ylabel('\theta_1');
hold on;
plot(theta(1), theta(2), 'rx', 'MarkerSize', 10, 'LineWidth', 2);
Result: