Calculate the mean, median, variance, standard deviation, correlation coefficient, and covariance of a matrix in MATLAB

In data analysis and mathematical statistics, it is often necessary to calculate the mean, median, variance, standard deviation, correlation coefficient, and covariance of a matrix. These statistics summarize the central tendency, dispersion, and correlation of a set of numbers, and they are important indicators in data processing.

Table of contents

1. Average

2. Median

3. Standard deviation

4. Variance

5. Correlation coefficient

6. Covariance

1. Average

The average is the arithmetic mean of a set of data: the sum of all elements divided by the number of elements. MATLAB provides the mean function for this calculation. The calling formats are as follows (where V represents a vector and A represents a matrix):

  1. mean(V): Returns the arithmetic mean of all elements of the vector V.
  2. mean(A): Returns a row vector in which each element is the arithmetic mean of the corresponding column of matrix A.
  3. mean(A,num): When num is 1, this is the same as mean(A) and returns a row vector of column means; when num is 2, it returns a column vector in which each element is the arithmetic mean of the corresponding row of matrix A.
  4. mean(mean(A)): Returns the arithmetic mean of all elements of the matrix A.

(1) The following calculates the arithmetic mean of a vector. For example, to calculate the arithmetic mean of the vector V=[23,56,89,34,12,34,54,67], the MATLAB code is as follows:

V=[23,56,89,34,12,34,54,67];
B=mean(V)

The calculation results can be obtained: the value of B is 46.125.
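As a quick check of the definition, the same value can be reproduced by summing the elements and dividing by their count. This is only an illustrative sketch; the variable names manualMean and builtinMean are introduced here and are not part of the original example.

```matlab
V = [23,56,89,34,12,34,54,67];
manualMean  = sum(V) / numel(V);   % sum of the elements divided by the element count
builtinMean = mean(V);             % built-in function, for comparison
disp([manualMean, builtinMean])    % both values are 46.125
```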

(2) To calculate the arithmetic mean of each column of a matrix, consider the following matrix:

A=\begin{pmatrix} 83 & 38 & 71 & 58 \\ 89 & 72 & 64 & 33 \\ 42 & 12 & 57 & 68 \\ 34 & 71 & 92 & 48 \end{pmatrix}

The code to calculate the arithmetic mean of each column by MATLAB is as follows:

A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
B=mean(A)

The resulting returned row vector looks like this:

B =
   62.0000   48.2500   71.0000   51.7500

Alternatively, as mentioned in point 3 above, the same values can be obtained with mean(A,1). The code is as follows:

A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
B=mean(A,1)

The result is also:

B =
   62.0000   48.2500   71.0000   51.7500

(3) To calculate the arithmetic mean of each row of the matrix, use mean(A,2). For the same matrix, the code is as follows:

A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
B=mean(A,2)

The result of the operation is as follows:

B =
   62.5000
   64.5000
   44.7500
   61.2500

(4) To calculate the arithmetic mean of all elements of the matrix, use mean(mean(A)). The code is as follows:

A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
B=mean(mean(A))

The final run result was 58.25.
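The nested call works because every column of a matrix has the same number of elements, so the mean of the column means equals the mean of all elements. As a hedged sketch, two other ways of writing the same calculation are shown here: flattening the matrix with A(:), and the 'all' option, which is assumed to be available (it exists only in newer MATLAB releases, R2018b or later). The variable names m1, m2, and m3 are chosen for illustration.

```matlab
A = [83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
m1 = mean(mean(A));    % mean of the row vector of column means
m2 = mean(A(:));       % A(:) reshapes the matrix into a single column vector
m3 = mean(A,'all');    % assumption: a release that supports the 'all' option
disp([m1, m2, m3])     % all three values are 58.25
```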

The mean function makes it convenient to calculate averages of matrix elements, which is especially useful when processing matrices with a large amount of data.

2. Median

The median is the middle value of a set of data sorted by size. If the number of values is odd, the median is the middle value after sorting. If the number of values is even, the median is the average of the two middle values after sorting. In statistics, the median reflects the central position of a set of data. MATLAB provides the median function to calculate it (where V represents a vector and A represents a matrix). The usage is as follows:

  1. median(V): Returns the median of the vector V.
  2. median(A): Returns a row vector in which each element is the median of the corresponding column of matrix A.
  3. median(A,num): When num is 1, this is the same as median(A) and returns a row vector of column medians; when num is 2, it returns a column vector in which each element is the median of the corresponding row of the matrix.
  4. median(median(A)): Returns a single number, the median of the row vector formed by the column medians of matrix A.

(1) For example, to calculate the median of the vector V used earlier in this article, the code is as follows:

V=[23,56,89,34,12,34,54,67];
B=median(V)

The median of all elements of this vector is 44.

(2) To calculate the median of each column of matrix A with the median function, the code is as follows:
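The odd and even cases of the definition can be sketched by sorting the vector manually. This is an illustration of the definition, not a description of how the median function is implemented internally; the names S, n, and manualMedian are introduced here.

```matlab
V = [23,56,89,34,12,34,54,67];
S = sort(V);                  % ascending order: 12 23 34 34 54 56 67 89
n = numel(S);
if mod(n,2) == 1
    manualMedian = S((n+1)/2);               % odd count: the middle element
else
    manualMedian = (S(n/2) + S(n/2+1)) / 2;  % even count: mean of the two middle elements
end
disp(manualMedian)            % 44, the same value as median(V)
```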

A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
B=median(A)

The result of the operation is as follows:

B =
   62.5000   54.5000   67.5000   53.0000

Alternatively, use median(A,1):

A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
B=median(A,1)

The result of this form is the same as that of median(A).

(3) To calculate the median of each row of a matrix, use median(A,2), which returns a column vector in which each element is the median of the corresponding row. The code is as follows:

A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
B=median(A,2)

The result after running is as follows:

B =
   64.5000
   68.0000
   49.5000
   59.5000

(4) median(median(A)) returns a single value. Note that this value is not the median of the entire matrix; it is the median of the row vector formed by the median of each column. For example:

A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
B=median(median(A))

The final result of the operation is 58.5. Sorting all 16 elements of the matrix shows that the two middle values are 58 and 64, so the median of the whole matrix is 61. The value 58.5 is instead the median of the row vector [62.5, 54.5, 67.5, 53.0] formed by the column medians, which confirms the point above.
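To obtain the median of all elements rather than the median of the column medians, one option is to flatten the matrix first. The following is a small sketch of the difference; the variable names are illustrative.

```matlab
A = [83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
colMedians = median(A);        % [62.5 54.5 67.5 53]
nested  = median(colMedians);  % 58.5, the median of the column medians
overall = median(A(:));        % 61, the median of all 16 elements
disp([nested, overall])
```

Unlike the mean, the median of the column medians is generally not the median of the whole matrix, so the two values differ here.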

3. Standard deviation

In mathematical statistics, the standard deviation can represent the degree of dispersion of a set of data. The smaller the standard deviation, the smaller the degree of dispersion of this set of data, and the larger the standard deviation, the greater the degree of dispersion of this set of data.

MATLAB supports two formulas for the standard deviation. The sample standard deviation, which normalizes by n-1, is as follows:

s=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\overline{x})^2}

The population standard deviation, which normalizes by n, is as follows:

\sigma=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\overline{x})^2}

The function for calculating the standard deviation in MATLAB is std, called as std(A,flag,num). When num=1, the standard deviation of each column of the matrix is calculated; when num=2, the standard deviation of each row of the matrix is calculated. When flag=0, the result is normalized by n-1 (the first formula); when flag=1, the result is normalized by n (the second formula).

For the matrix A above, the following code uses the std function to calculate the standard deviation for the four combinations of flag and num:

A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
Avg=mean(mean(A))
std1=std(A,1,1)
std2=std(A,1,2)
std3=std(A,0,1)
std4=std(A,0,2)

The result of the operation is as follows:

Avg =
   58.2500
std1 =
   24.2590   25.0037   13.0958   12.9301
std2 =
   16.6808
   20.3039
   21.0401
   22.1289
std3 =
   28.0119   28.8718   15.1217   14.9304
std4 =
   19.2614
   23.4450
   24.2951
   25.5522

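The four results shown above correspond to the four combinations of arguments: std1 is the column-wise result normalized by n, std2 is the row-wise result normalized by n, std3 is the column-wise result normalized by n-1, and std4 is the row-wise result normalized by n-1.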
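The two normalizations can be checked against the built-in function by computing the column-wise deviations by hand. This is a minimal sketch; repmat is used so that the subtraction does not rely on implicit expansion, and the names D, sSample, and sPop are introduced for illustration.

```matlab
A = [83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
n = size(A,1);                          % number of rows, i.e. samples per column
D = A - repmat(mean(A,1), n, 1);        % deviation of each element from its column mean
sSample = sqrt(sum(D.^2,1) / (n-1));    % normalized by n-1, compare with std(A,0,1)
sPop    = sqrt(sum(D.^2,1) / n);        % normalized by n, compare with std(A,1,1)
disp(max(abs(sSample - std(A,0,1))))    % expected to be zero up to rounding error
disp(max(abs(sPop - std(A,1,1))))       % expected to be zero up to rounding error
```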

4. Variance

In mathematical statistics, in addition to the standard deviation, the variance can also be used to represent the degree of dispersion of data. The larger the variance, the greater the dispersion of the data, and the smaller the variance, the smaller the dispersion. MATLAB provides the var function to calculate the variance of a set of data. The variance is the square of the standard deviation. Like std, the var function has four different cases, var(A,flag,num). When flag=0, the variance is normalized by n-1 (the square of the sample standard deviation); when flag=1, it is normalized by n (the square of the population standard deviation). When num=1, the variance of each column of the matrix is calculated; when num=2, the variance of each row of the matrix is calculated.

For example, for the above matrix, the code to calculate the variance of four different cases is as follows:

A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
Avg=mean(mean(A))
var1=var(A,1,1)
var2=var(A,1,2)
var3=var(A,0,1)
var4=var(A,0,2)

The result after the operation is as follows:

Avg =
   58.2500
var1 =
  588.5000  625.1875  171.5000  167.1875
var2 =
  278.2500
  412.2500
  442.6875
  489.6875
var3 =
  784.6667  833.5833  228.6667  222.9167
var4 =
  371.0000
  549.6667
  590.2500
  652.9167
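Because the variance is the square of the standard deviation, the output of var can be compared with the squared output of std for the same flag and num. The following sketch checks two of the four cases; the variable names are illustrative.

```matlab
A = [83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
v0 = var(A,0,1);   s0 = std(A,0,1);   % column-wise, normalized by n-1
v1 = var(A,1,2);   s1 = std(A,1,2);   % row-wise, normalized by n
disp(max(abs(v0 - s0.^2)))            % expected to be zero up to rounding error
disp(max(abs(v1 - s1.^2)))            % expected to be zero up to rounding error
```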

5. Correlation coefficient

In statistics, the correlation coefficient is an indicator of the degree of linear correlation between two sets of data. It is a dimensionless quantity in the range [-1,1]: the larger its absolute value, the higher the degree of correlation, and its sign indicates whether the correlation is positive or negative. The formula for the correlation coefficient is as follows:

r=\frac{\sum_{i=1}^{n}(x_i-\overline{x})(y_i-\overline{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\overline{x})^2\sum_{i=1}^{n}(y_i-\overline{y})^2}}

The corrcoef function and the corr function in MATLAB calculate the correlation coefficient (corr is part of the Statistics and Machine Learning Toolbox). The calling formats are as follows:

  1. corr(x,y): Returns the matrix of correlation coefficients between the columns of x and the columns of y. x and y must have the same number of rows, so vector inputs must be column vectors.
  2. corrcoef(x,y): Returns the 2-by-2 matrix of correlation coefficients between x and y. If x and y are matrices, they are converted to the column vectors x(:) and y(:) before the calculation.

For example:

A=[43,56,36,75,34,23,45];
B=[76,45,34,24,94,53,71];
[r,p]=corr(A',B')
x=corrcoef(A,B)

The result after running is as follows:

r =
   -0.5058
p =
    0.2468
x =
    1.0000   -0.5058
   -0.5058    1.0000

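It can be seen from the above data that the correlation coefficient is -0.5058. The second output p of corr is the p-value for testing the hypothesis of no correlation. The corrcoef function returns the correlation coefficient matrix, whose off-diagonal elements equal r.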
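The coefficient can also be reproduced directly from its definition, the sum of the products of the deviations divided by the square root of the product of the two sums of squared deviations. This is only a sketch for checking the formula against corrcoef; the names dA, dB, and rManual are introduced here.

```matlab
A = [43,56,36,75,34,23,45];
B = [76,45,34,24,94,53,71];
dA = A - mean(A);                 % deviations of A from its mean
dB = B - mean(B);                 % deviations of B from its mean
rManual = sum(dA.*dB) / sqrt(sum(dA.^2) * sum(dB.^2));
R = corrcoef(A,B);                % 2-by-2 correlation coefficient matrix
disp([rManual, R(1,2)])           % both approximately -0.5058
```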

6. Covariance

Covariance is a statistical measure of how two variables vary together. A positive covariance means that the two variables tend to increase together, and a negative covariance means that one tends to decrease as the other increases. The formula for calculating the covariance is as follows:

cov(x,y)=\frac{\sum_{i=1}^{n}(x_i-\overline{x})(y_i-\overline{y})}{n-1}

The cov function is provided in MATLAB to calculate the covariance between two related data series. The basic calling method of the cov function in MATLAB is as follows (where V represents a vector and A represents a matrix):

  1. cov(V): Returns the variance of the vector V.
  2. cov(A): Treats each column of A as a variable and each row as an observation, and returns the covariance matrix. The diagonal elements are the variances of the columns, and the off-diagonal elements are the covariances between pairs of columns.
  3. cov(X,Y): Returns the 2-by-2 covariance matrix of X and Y, where X and Y must be of the same size. Matrix inputs are treated as the vectors X(:) and Y(:).

(1) When the input is a vector, the following verifies that the returned value is the variance:

A=[34,45,73,32,65,72];
B=var(A)
C=cov(A)

The result looks like this:

B =
  353.9000
C =
  353.9000

As can be seen from the above results, applying the cov function to a vector gives the same result as the var function.

(2) When A is a matrix, as in the following example:

A=[34,45,73;54,37,81;44,19,15];
B=cov(A)
C=var(A)

The result looks like this:

B =
   1.0e+03 *

    0.1000   -0.0400    0.0400
   -0.0400    0.1773    0.4387
    0.0400    0.4387    1.2973
C =
   1.0e+03 *

    0.1000    0.1773    1.2973

By comparison, it can be found that the diagonal part of the matrix obtained by the cov function is the variance of each column of the matrix, while the non-diagonal part calculates the covariance.
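The same comparison can be made programmatically, together with a hand calculation of one off-diagonal element from the covariance formula. This is an illustrative sketch; the names d1, d2, and c12 are introduced here.

```matlab
A = [34,45,73;54,37,81;44,19,15];
C = cov(A);                                 % 3-by-3 covariance matrix of the columns
disp(max(abs(diag(C)' - var(A))))           % diagonal equals the column variances
d1 = A(:,1) - mean(A(:,1));                 % deviations of column 1
d2 = A(:,2) - mean(A(:,2));                 % deviations of column 2
c12 = sum(d1.*d2) / (size(A,1) - 1);        % covariance of columns 1 and 2
disp([c12, C(1,2)])                         % both are -40
```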

(3) Calculate the covariance between two matrices, such as the following example:

A=[34,45,54;37,62,81;19,15,44];
B=[45,64,69;73,74,79;84,85,86];
C=cov(A,B)

The running results are as follows:

C =
  425.7778  -18.8611
  -18.8611  168.9444

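For matrix inputs, cov(X,Y) treats X and Y as the vectors X(:) and Y(:) and returns their 2-by-2 covariance matrix: the diagonal elements are the variances of the two sets of data, and the off-diagonal element is their covariance. Here the covariance is -18.8611, so there is a negative correlation between the two sets of data.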
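As a check on this result, the entries of C can be reproduced by flattening both matrices and applying the covariance formula by hand. This is a minimal sketch; the names x, y, n, and cxy are introduced for illustration.

```matlab
A = [34,45,54;37,62,81;19,15,44];
B = [45,64,69;73,74,79;84,85,86];
x = A(:);  y = B(:);                                % flatten both matrices column by column
n = numel(x);
cxy = sum((x - mean(x)).*(y - mean(y))) / (n-1);    % covariance of the flattened data
C = cov(A,B);
disp([cxy, C(1,2)])                                 % both approximately -18.8611
disp([var(x), var(y)])                              % approximately 425.7778 and 168.9444
```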

Origin blog.csdn.net/qq_54186956/article/details/126982975