[Week 2] Linear Regression with Multiple Variables

Octave configuration in different environments

  • Installing Octave on GNU/Linux

We recommend using your system package manager to install Octave.

On Ubuntu, you can use:

sudo apt-get update && sudo apt-get install octave

On Fedora, you can use:

sudo yum install octave-forge

Please consult the Octave maintainer’s instructions for other GNU/Linux systems.

Warning: do not install Octave 4.0.0; see the “Installation Issues” section of the “Resources” menu.

  1. Octave Resources
    At the Octave command line, typing help followed by a function name displays documentation for a built-in function. For example, help plot will bring up help information for plotting. Further documentation can be found at the Octave documentation pages.

  2. MATLAB Resources
    At the MATLAB command line, typing help followed by a function name displays documentation for a built-in function. For example, help plot will bring up help information for plotting. Further documentation can be found at the MATLAB documentation pages.

I. Multivariate Linear Regression

1. Multiple Features

Linear regression with multiple variables is also known as “multivariate linear regression”.
We now introduce notation for equations where we can have any number of input variables.
Notation:

  x_j^(i) = value of feature j in the i-th training example
  x^(i)   = the input (features) of the i-th training example
  m       = the number of training examples
  n       = the number of features

The multivariable form of the hypothesis function is

  h_θ(x) = θ_0 + θ_1·x_1 + θ_2·x_2 + … + θ_n·x_n

or, defining x_0 = 1 and writing x and θ as vectors, h_θ(x) = θᵀx.
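As a quick sketch of the vectorized form in Octave (the names X and theta are mine, following the convention used later in these notes):

% X is an m-by-(n+1) design matrix whose first column is all ones (x_0 = 1);
% theta is an (n+1)-by-1 parameter vector.
h = X * theta;    % m-by-1 vector of predictions h_theta(x^(i))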

2. Gradient Descent for Multiple Variables

The gradient descent update rule generalizes directly to n features. Repeat until convergence:

  θ_j := θ_j − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x_j^(i)    (update simultaneously for j = 0, …, n)
The following image compares gradient descent with one variable to gradient descent with multiple variables:
[Figure: side-by-side comparison of the update rules for one variable and for multiple variables]
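A minimal vectorized sketch of this update in Octave (assuming X, y, theta, alpha, and num_iters have been set up; the names are mine, not from the original post):

m = length(y);    % number of training examples
for iter = 1:num_iters
  % update every theta_j simultaneously in one vectorized step
  theta = theta - (alpha / m) * X' * (X * theta - y);
end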

3. Gradient Descent in Practice

  • I - Feature Scaling

We can speed up gradient descent by having each of our input values in roughly the same range. This is because θ will descend quickly on small ranges and slowly on large ranges, and so will oscillate inefficiently down to the optimum when the variables are very uneven.

The way to prevent this is to modify the ranges of our input variables so that they are all roughly the same. Ideally:
  −1 ≤ x_i ≤ 1    or    −0.5 ≤ x_i ≤ 0.5
These aren’t exact requirements; we are only trying to speed things up. The goal is to get all input variables into roughly one of these ranges, give or take a few.

Two techniques to help with this are feature scaling and mean normalization. Feature scaling involves dividing the input values by the range (i.e. the maximum value minus the minimum value) of the input variable, resulting in a new range of just 1. Mean normalization involves subtracting the average value for an input variable from the values for that input variable, resulting in a new average value of zero for that input variable. To implement both of these techniques, adjust your input values as shown in this formula:

  x_i := (x_i − μ_i) / s_i

where μ_i is the average of all the values for feature (i), and s_i is the range of values (max − min) or, alternatively, the standard deviation.

Note that dividing by the range and dividing by the standard deviation give different results. The quizzes in this course use the range; the programming exercises use the standard deviation.
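A short Octave sketch of both techniques in one function (the name featureNormalize follows the course's programming exercises, so this version divides by the standard deviation):

function [X_norm, mu, sigma] = featureNormalize(X)
  % scale each column (feature) of X to zero mean and unit standard deviation
  mu = mean(X);                  % row vector of per-feature means
  sigma = std(X);                % row vector of per-feature standard deviations
  X_norm = (X - mu) ./ sigma;    % Octave broadcasts the row vectors over the rows of X
end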

  • II - Learning Rate

Debugging gradient descent. Make a plot with number of iterations on the x-axis. Now plot the cost function, J(θ) over the number of iterations of gradient descent. If J(θ) ever increases, then you probably need to decrease α.

Automatic convergence test. Declare convergence if J(θ) decreases by less than E in one iteration, where E is some small value such as 10^(−3). However, in practice it is difficult to choose this threshold value.
[Figure: J(θ) plotted against the number of iterations, decreasing toward convergence]
To summarize:

If α is too small: slow convergence, as shown in curve B below.

If α is too large: J(θ) may not decrease on every iteration and thus
may not converge, as shown in curve C below.

[Figure: quiz example showing J(θ) curves A, B, and C for three different learning rates]
Example: the curve that rises instead of falling indicates an α that is too large; curve B drops slowly, meaning its α is set too small; so the correct answer here is B.
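The debugging recipe above might look like this in Octave (reusing the gradient descent sketch from earlier; J_history is my own name):

J_history = zeros(num_iters, 1);
for iter = 1:num_iters
  theta = theta - (alpha / m) * X' * (X * theta - y);
  J_history(iter) = sum((X * theta - y) .^ 2) / (2 * m);   % record J(theta)
end
plot(1:num_iters, J_history);    % should decrease on every iteration
xlabel('number of iterations'); ylabel('J(\theta)');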

4. Features and Polynomial Regression

We can improve our features and the form of our hypothesis function in a couple different ways.

We can combine multiple features into one. For example, we can combine x1,x2 into a new feature x3 by taking x1*x2

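For instance, with the course's housing example in mind (frontage and depth of a lot; the variable names are mine):

x3 = x1 .* x2;    % element-wise product over all training examples, e.g. area = frontage * depth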

  • Polynomial Regression

Our hypothesis function need not be linear in x if that does not fit the data well; we can change its behavior or curve by making it a quadratic, cubic, square-root, or other function. For example:

  h_θ(x) = θ_0 + θ_1·x + θ_2·x² + θ_3·x³

Note that feature scaling becomes very important here: if x has range 1–1000, then the range of x² becomes 1–1,000,000 and that of x³ becomes 1–10^9.
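A sketch in Octave (assuming x is an m-by-1 column holding the single original feature, and reusing the featureNormalize sketch from earlier):

X_poly = [x, x.^2, x.^3];                          % build polynomial features
[X_poly, mu, sigma] = featureNormalize(X_poly);    % their ranges differ wildly, so normalize
X_poly = [ones(size(x)), X_poly];                  % prepend the intercept column x_0 = 1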

II. Computing Parameters Analytically

1. Normal Equation

Gradient descent gives one way of minimizing J. Let’s discuss a second way of doing so, this time performing the minimization explicitly and without resorting to an iterative algorithm. In the “Normal Equation” method, we will minimize J by explicitly taking its derivatives with respect to the θj ’s, and setting them to zero. This allows us to find the optimum theta without iteration. The normal equation formula is given below:
  θ = (XᵀX)⁻¹ · Xᵀ · y
There is no need to do feature scaling with the normal equation.
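In Octave the normal equation is a one-liner (using pinv rather than inv, for the reason discussed in the noninvertibility section below):

theta = pinv(X' * X) * X' * y;    % closed-form solution: no alpha, no iterations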
The following is a comparison of gradient descent and the normal equation:
  Gradient Descent                       Normal Equation
  ----------------                       ---------------
  needs to choose α                      no need to choose α
  needs many iterations                  no iterations
  O(k·n²)                                O(n³): must compute (XᵀX)⁻¹
  works well even when n is large        slow if n is very large

2. Normal Equation Noninvertibility

If XᵀX is noninvertible, the common causes are redundant features (two features are linearly dependent, e.g. one is a multiple of another) or too many features (m ≤ n); in the latter case, delete some features or use regularization. In Octave, computing θ with the pinv function rather than inv yields a correct θ even when XᵀX is noninvertible.
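A tiny illustration of the pinv-versus-inv point (the duplicated column below makes XᵀX singular on purpose; the data is made up):

X = [1 2 4; 1 3 6; 1 5 10];     % third column is twice the second, so X'*X is singular
y = [1; 2; 3];
theta = pinv(X' * X) * X' * y   % pinv still returns a usable theta; inv would warn and produce Inf entries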

III. Using Octave

1. Example 1: plotting sine and cosine functions

>> t = [0:0.01:0.98];         % time samples from 0 to 0.98 in steps of 0.01
>> y1 = sin(2*pi*4*t);
>> y2 = cos(2*pi*4*t);
>> plot(t,y1,'b')             % plot the sine curve in blue
>> hold on;                   % keep it while further curves are added
>> plot(t,y2)                 % overlay the cosine curve
>> xlabel('time')
>> legend('sin','cos')        % labels follow the order in which the curves were drawn
>> ylabel('value')
>> title('my plot')
>> print -dpng 'myPlot.png'   % save the figure to a PNG file

Some formats of the plot function:

 Format arguments:

     linestyle

          '-'  Use solid lines (default).
          '--' Use dashed lines.
          ':'  Use dotted lines.
          '-.' Use dash-dotted lines.

     marker

          '+'  crosshair
          'o'  circle
          '*'  star
          '.'  point
          'x'  cross
          's'  square
          'd'  diamond
          '^'  upward-facing triangle
          'v'  downward-facing triangle
          '>'  right-facing triangle
          '<'  left-facing triangle
          'p'  pentagram
          'h'  hexagram

     color

          'k'  blacK
          'r'  Red
          'g'  Green
          'b'  Blue
          'y'  Yellow
          'm'  Magenta
          'c'  Cyan
          'w'  White

2. Example 2: figure windows and subplots

>> figure(1); plot(t,y1)      % draw the sine curve in figure window 1
>> figure(2); plot(t,y2)      % draw the cosine curve in figure window 2
>> subplot(2,2,1)             % divide the figure into a 2x2 grid and select cell 1
>> plot(t,y2);
>> subplot(2,2,4)             % select cell 4 of the grid
>> plot(t,y1);
>> axis([0 0.5 -1.2 1.2])     % change the axis ranges: x from 0 to 0.5, y from -1.2 to 1.2
>> print -dpng 'q.png'        % the -dpng device writes PNG data, so use a .png extension
>> clf;                       % clear the current figure


3. Example 3: visualizing a matrix

>> A = magic(5)
A =

   17   24    1    8   15
   23    5    7   14   16
    4    6   13   20   22
   10   12   19   21    3
   11   18   25    2    9

>> imagesc(A);                % visualize the matrix entries as a colored grid
>> colorbar                   % add a color scale beside the image
>> colormap gray;             % switch to a grayscale color map
>> imagesc(magic(15)), colorbar, colormap gray   % comma-chaining runs all three commands at once

4. Example 4: control statements

  • for statement

>> v = ones(1,10)
v =
   1   1   1   1   1   1   1   1   1   1
>> for i = 1:10, v(i) = v(i) + i; end
% indices = 1:10
% "for i = 1:10" is equivalent to "for i = indices"
>> v
v =
    2    3    4    5    6    7    8    9   10   11
  • while + break + if/else statements
>> i = 1;
>> while true,
		v(i) = 9999;
		i = i+1;
		if i == 6
			break;
		elseif i == 5
			disp('The value is one');
		else
			disp('The value is zero');
		end
	end
>> v
v =
   9999   9999   9999   9999   9999      7      8      9     10     11

5. Example 5: calling functions (functions can have multiple return values)

PS: The file name must match the function name; the function used below is called mytestfunction, so it must be saved as mytestfunction.m.
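The original screenshot of the function file is lost; a definition consistent with the call results below (my reconstruction, not necessarily the author's exact file) would be:

function [a, b, c] = mytestfunction(x, y)
  % return the squares of the two inputs and of their product
  a = x ^ 2;
  b = y ^ 2;
  c = (x * y) ^ 2;
end

Calling it from the Octave prompt verifies that it works: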

>> addpath('E:\Octave\workspace')   % add the directory to Octave's search path
>> cd 'E:\Octave'                   % change the working directory
>> [a,b,c] = mytestfunction(3,4)
a =  9
b =  16
c =  144

PS: The definition of the cost function:

  J(θ) = (1/(2m)) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
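In Octave this corresponds to the computeCost function from the course's programming exercises (the name is the course's; the body is my sketch):

function J = computeCost(X, y, theta)
  % X: m-by-(n+1) design matrix, y: m-by-1 targets, theta: (n+1)-by-1 parameters
  m = length(y);
  J = sum((X * theta - y) .^ 2) / (2 * m);
end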


Source: blog.csdn.net/weixin_44751294/article/details/109240994