In data analysis and mathematical statistics, it is often necessary to calculate the mean, median, variance, standard deviation, correlation coefficient, and covariance of a set of data or of a matrix. These statistics describe the overall level, the dispersion, and the correlation of a set of numbers, and they are important indicators in data processing.
1. Average
The average is the arithmetic mean of a set of data: add the values of all elements and divide by the number of elements. MATLAB provides the mean function for this calculation. It is called as follows (where V represents a vector and A represents a matrix):
- mean(V): calculates the arithmetic mean of all elements of the vector V.
- mean(A): returns a row vector whose elements are the arithmetic means of the columns of A.
- mean(A,num): when num is 1, behaves the same as mean(A) and returns a row vector of column means; when num is 2, returns a column vector whose elements are the arithmetic means of the rows of A.
- mean(mean(A)): Calculates the mean for all elements of the matrix as a whole.
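As a quick check of the definition above (sum of the elements divided by their count), the following minimal sketch recomputes the means by hand and compares them with the mean function. The variable names manualV, manualCols, and manualRows are illustrative only.

```matlab
% Recompute means from the definition: sum of elements / number of elements.
V = [23,56,89,34,12,34,54,67];
A = [83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];

manualV    = sum(V) / numel(V);        % 369 / 8 = 46.125
manualCols = sum(A, 1) / size(A, 1);   % column sums / number of rows
manualRows = sum(A, 2) / size(A, 2);   % row sums / number of columns

% Differences from the built-in function (should be zero up to rounding)
disp(manualV - mean(V))
disp(max(abs(manualCols - mean(A, 1))))
disp(max(abs(manualRows - mean(A, 2))))
```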
(1) The following calculates the arithmetic mean of a vector. For example, to compute the arithmetic mean of V=[23,56,89,34,12,34,54,67] in MATLAB, use:
V=[23,56,89,34,12,34,54,67];
B=mean(V)
Running this code gives B = 46.125.
(2) To calculate the arithmetic mean of each column of the matrix

$$A=\begin{bmatrix}83&38&71&58\\89&72&64&33\\42&12&57&68\\34&71&92&48\end{bmatrix}$$

use the following MATLAB code:
A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
B=mean(A)
The result is the following row vector:
B =
62.0000 48.2500 71.0000 51.7500
Alternatively, as described in the third item of the list above, mean(A,1) returns the same values:
A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
B=mean(A,1)
The result is also:
B =
62.0000 48.2500 71.0000 51.7500
(3) To calculate the arithmetic mean of each row of the matrix, use mean(A,2). For the same matrix, the code is:
A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
B=mean(A,2)
The result of the operation is as follows:
B =
62.5000
64.5000
44.7500
61.2500
(4) If you want to calculate the arithmetic mean of all elements of the matrix, you can use mean(mean(A)), the code is as follows:
A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
B=mean(mean(A))
The result is 58.25.
Using MATLAB's mean function is more convenient than summing and dividing by hand, and the advantage grows with the amount of data, because a single call processes every column (or row) of the matrix at once.
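Besides mean(mean(A)), the mean of all matrix elements can also be taken directly. The minimal sketch below compares three forms; note that mean(A,'all') assumes MATLAB R2018b or later. The nested form agrees with the others because every column of a matrix has the same number of elements, so the mean of the column means equals the overall mean.

```matlab
A = [83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];

m1 = mean(mean(A));    % mean of the four column means
m2 = mean(A(:));       % flatten A into one column, then take the mean
m3 = mean(A, 'all');   % 'all' option (R2018b or later)

fprintf('%g %g %g\n', m1, m2, m3);   % all three equal 58.25
```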
2. Median
The median is the value in the middle of a set of data sorted by value. If the number of values is odd, the median is the middle value after sorting; if it is even, the median is the average of the two middle values after sorting. In statistics, the median reflects the central position of a set of data. MATLAB provides the median function to compute it (where V represents a vector and A represents a matrix). Its usage is as follows:
- median(V): returns the median of the elements of the vector V.
- median(A): returns a row vector whose elements are the medians of the columns of A.
- median(A,num): when num is 1, behaves the same as median(A) and returns a row vector of column medians; when num is 2, returns a column vector whose elements are the medians of the rows of A.
- median(median(A)): returns a single number, the median of the row vector formed by the medians of the columns of A.
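The odd/even rule described above can be applied by hand. The following minimal sketch sorts the vector V from this article and picks the middle element or averages the two middle elements; the variable names s, n, and m are illustrative only.

```matlab
V = [23,56,89,34,12,34,54,67];

s = sort(V);           % ascending order: 12 23 34 34 54 56 67 89
n = numel(s);
if mod(n, 2) == 1
    m = s((n + 1) / 2);              % odd count: the single middle element
else
    m = (s(n/2) + s(n/2 + 1)) / 2;   % even count: mean of the two middle elements
end

fprintf('manual: %g, median(): %g\n', m, median(V));   % both 44
```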
(1) For example, to find the median of the vector V used earlier in this article:
V=[23,56,89,34,12,34,54,67];
B=median(V)
The median of all elements of this vector is 44.
(2) To compute the median of each column of matrix A with the median function, use:
A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
B=median(A)
The result of the operation is as follows:
B =
62.5000 54.5000 67.5000 53.0000
or as shown below:
A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
B=median(A,1)
At this time, the result of using this writing method is the same as median(A).
(3) To calculate the median of each row of the matrix, use median(A,2), which returns a column vector whose elements are the medians of the corresponding rows. The code is:
A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
B=median(A,2)
The result after running is as follows:
B =
64.5000
68.0000
49.5000
59.5000
(4) median(median(A)) returns a single value, but note that it is not the median of the entire matrix: it is the median of the row vector formed by the medians of the columns. For example:
A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
B=median(median(A))
The result is 58.5. The median of all 16 elements of the matrix is actually 61 (the average of the 8th and 9th sorted values, 58 and 64), whereas 58.5 is the median of the row vector [62.5, 54.5, 67.5, 53.0] of column medians. This confirms the conclusion above.
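To get the median of the whole matrix rather than the median of the column medians, the matrix can be flattened first. This minimal sketch contrasts the two; median(A,'all') (R2018b or later) is mentioned in a comment as an equivalent alternative.

```matlab
A = [83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];

colMedians  = median(A);            % [62.5 54.5 67.5 53]
nested      = median(colMedians);   % 58.5, same as median(median(A))
wholeMatrix = median(A(:));         % 61, median of all 16 elements
% median(A, 'all') gives the same value as median(A(:)) in R2018b or later

fprintf('nested: %g, whole matrix: %g\n', nested, wholeMatrix);
```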
3. Standard deviation
In mathematical statistics, the standard deviation can represent the degree of dispersion of a set of data. The smaller the standard deviation, the smaller the degree of dispersion of this set of data, and the larger the standard deviation, the greater the degree of dispersion of this set of data.
MATLAB supports two definitions of the standard deviation. The population standard deviation normalizes by the number of elements N:

$$\sigma=\sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2}$$

The sample standard deviation normalizes by N-1:

$$s=\sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2}$$

where \(\bar{x}\) is the mean of the N values \(x_1,\dots,x_N\).
The MATLAB function for the standard deviation is std, called as std(A,flag,num). When num=1, the standard deviation of each column of the matrix is calculated; when num=2, the standard deviation of each row is calculated. When flag=0, the sample standard deviation (normalized by N-1) is calculated; when flag=1, the population standard deviation (normalized by N) is calculated. If flag and num are omitted, std(A) uses flag=0 and, for a matrix like A, operates on the columns.
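The two formulas can be evaluated directly and compared with std. This minimal sketch works column by column; it assumes MATLAB R2016b or later, because A - mu relies on implicit expansion (older versions would need bsxfun). The names popStd and sampleStd are illustrative only.

```matlab
A = [83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];

N  = size(A, 1);              % number of observations in each column
mu = mean(A, 1);              % column means
sq = sum((A - mu).^2, 1);     % sum of squared deviations per column

popStd    = sqrt(sq / N);         % divide by N   -> should match std(A,1,1)
sampleStd = sqrt(sq / (N - 1));   % divide by N-1 -> should match std(A,0,1)

disp(max(abs(popStd - std(A, 1, 1))))      % ~0
disp(max(abs(sampleStd - std(A, 0, 1))))   % ~0
```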
For the matrix A above, the following code uses std to compute the standard deviation in all four combinations:
A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
Avg=mean(mean(A))
std1=std(A,1,1)
std2=std(A,1,2)
std3=std(A,0,1)
std4=std(A,0,2)
The result of the operation is as follows:
Avg =
58.2500
std1 =
24.2590 25.0037 13.0958 12.9301
std2 =
16.6808
20.3039
21.0401
22.1289
std3 =
28.0119 28.8718 15.1217 14.9304
std4 =
19.2614
23.4450
24.2951
25.5522
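The four results correspond to the four combinations of the std arguments: std1 and std2 are the population standard deviations (flag=1) of the columns and rows, and std3 and std4 are the sample standard deviations (flag=0) of the columns and rows.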
4. Variance
In mathematical statistics, variance can also be used, in addition to the standard deviation, to represent the degree of dispersion of data: the larger the variance, the greater the dispersion, and the smaller the variance, the smaller the dispersion. MATLAB provides the var function to calculate the variance of a set of data. The variance is the square of the standard deviation, and like std, var is called as var(A,flag,num), giving four combinations. When flag=0, the sample variance (normalized by N-1, the square of the sample standard deviation) is calculated; when flag=1, the population variance (normalized by N, the square of the population standard deviation) is calculated. When num=1, the variance of each column of the matrix is calculated; when num=2, the variance of each row is calculated.
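The statement that the variance is the square of the standard deviation can be checked for all four combinations of flag and num. This is a minimal sketch; the loop variable names w and dim are illustrative.

```matlab
A = [83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];

for w = [0 1]          % 0: normalize by N-1, 1: normalize by N
    for dim = [1 2]    % 1: columns, 2: rows
        d = max(abs(var(A, w, dim) - std(A, w, dim).^2));
        fprintf('flag=%d, num=%d: max |var - std^2| = %g\n', w, dim, d);
    end
end
```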
For example, for the above matrix, the code to calculate the variance of four different cases is as follows:
A=[83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];
Avg=mean(mean(A))
var1=var(A,1,1)
var2=var(A,1,2)
var3=var(A,0,1)
var4=var(A,0,2)
The result after the operation is as follows:
var1 =
588.5000 625.1875 171.5000 167.1875
var2 =
278.2500
412.2500
442.6875
489.6875
var3 =
784.6667 833.5833 228.6667 222.9167
var4 =
371.0000
549.6667
590.2500
652.9167
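The only difference between var1 and var3 (and between var2 and var4) is the normalization, so each sample variance equals the corresponding population variance multiplied by N/(N-1). The minimal sketch below applies this factor by hand; with N = 4, for example, 588.5 × 4/3 = 784.6667.

```matlab
A = [83,38,71,58;89,72,64,33;42,12,57,68;34,71,92,48];

Ncol = size(A, 1);   % 4 observations per column
Nrow = size(A, 2);   % 4 observations per row

colRescaled = var(A, 1, 1) * Ncol / (Ncol - 1);   % should equal var(A,0,1)
rowRescaled = var(A, 1, 2) * Nrow / (Nrow - 1);   % should equal var(A,0,2)

disp(max(abs(colRescaled - var(A, 0, 1))))   % ~0
disp(max(abs(rowRescaled - var(A, 0, 2))))   % ~0
```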
5. Correlation coefficient
In mathematics, the correlation coefficient measures the degree of linear correlation between two sets of data. It is dimensionless and lies in the range [-1,1]: the larger its absolute value, the stronger the correlation, with positive values indicating a positive correlation and negative values a negative one. The (Pearson) correlation coefficient of x and y is:

$$r=\frac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i-\bar{y})^2}}$$
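The formula can be evaluated directly and compared with corrcoef. This minimal sketch uses the two data vectors from the example in this section; the names dx, dy, and r are illustrative only.

```matlab
x = [43,56,36,75,34,23,45];
y = [76,45,34,24,94,53,71];

dx = x - mean(x);   % deviations from the mean of x
dy = y - mean(y);   % deviations from the mean of y
r  = sum(dx .* dy) / sqrt(sum(dx.^2) * sum(dy.^2));

R = corrcoef(x, y);   % 2-by-2 matrix; R(1,2) is the correlation of x and y
fprintf('manual r = %.4f, corrcoef = %.4f\n', r, R(1, 2));   % both about -0.5058
```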
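MATLAB computes the correlation coefficient with the corrcoef function and the corr function (corr is part of the Statistics and Machine Learning Toolbox). They are called as follows:
- corr(x,y): returns the matrix of correlation coefficients between each column of x and each column of y; x and y must have the same number of rows, so two data vectors must be passed as column vectors. [r,p]=corr(x,y) also returns the p-values for testing the hypothesis of no correlation.
- corrcoef(x,y): returns a 2-by-2 matrix of correlation coefficients. If x and y are matrices, corrcoef(x,y) first converts each of them into a single column vector (x(:) and y(:)) and then performs the calculation.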
For example, the following example:
A=[43,56,36,75,34,23,45];
B=[76,45,34,24,94,53,71];
[r,p]=corr(A',B')
x=corrcoef(A,B)
The result after running is as follows:
r =
-0.5058
p =
0.2468
x =
1.0000 -0.5058
-0.5058 1.0000
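The results show that the correlation coefficient between A and B is -0.5058, a negative correlation. The value p = 0.2468 is the p-value of the test of no correlation; since it is greater than 0.05, this correlation is not statistically significant at the 5% level with only 7 data points. The corrcoef function returns the full correlation coefficient matrix: the diagonal entries are 1 (each variable with itself) and the off-diagonal entries equal r.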
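corrcoef can also return p-values, which is useful when the Statistics and Machine Learning Toolbox (needed for corr) is not available. This minimal sketch requests the second output of corrcoef for the same data; the p-value in P(1,2) is expected to agree with the p from corr above, since both test the hypothesis of no correlation, but treat that agreement as something to verify on your own installation.

```matlab
A = [43,56,36,75,34,23,45];
B = [76,45,34,24,94,53,71];

[R, P] = corrcoef(A, B);   % R: correlation matrix, P: matrix of p-values
r = R(1, 2);               % about -0.5058
p = P(1, 2);               % p-value for the test of no correlation

fprintf('r = %.4f, p = %.4f\n', r, p);
```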
6. Covariance
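Covariance is a statistical measure of how two variables vary together: it is positive when they tend to increase together, negative when one tends to decrease as the other increases, and close to zero when there is no linear relationship between them. The sample covariance of x and y, which is what MATLAB's cov returns by default, is:

$$\operatorname{cov}(x,y)=\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})$$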
The cov function is provided in MATLAB to calculate the covariance between two related data series. The basic calling method of the cov function in MATLAB is as follows (where V represents a vector and A represents a matrix):
- cov(V): Returns the variance of the vector.
- cov(A): returns the covariance matrix, treating each column of A as a variable and each row as an observation. The diagonal entries are the variances of the columns, and the off-diagonal entries are the covariances between pairs of columns.
- cov(X,Y): treats X and Y as two variables (each converted to a single column, X(:) and Y(:)) and returns their 2-by-2 covariance matrix; X and Y must be of the same size.
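The covariance formula can be evaluated by hand and compared with the off-diagonal entry returned by cov. This minimal sketch uses, as illustrative data, the first two columns of the 3-by-3 matrix from example (2) in this section.

```matlab
x = [34; 54; 44];   % first column of the example matrix
y = [45; 37; 19];   % second column of the example matrix

N = numel(x);
c = sum((x - mean(x)) .* (y - mean(y))) / (N - 1);   % sample covariance: -40

C = cov(x, y);   % 2-by-2: [var(x) cov(x,y); cov(y,x) var(y)]
fprintf('manual: %g, cov(): %g\n', c, C(1, 2));
```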
(1) When V is a vector, verify whether the returned value is variance, for example:
A=[34,45,73,32,65,72];
B=var(A)
C=cov(A)
The result looks like this:
B =
353.9000
C =
353.9000
As can be seen from the above results, if the cov function is used for a vector, the calculation result and variance are the same.
(2) When A is a matrix, as in the following example:
A=[34,45,73;54,37,81;44,19,15];
B=cov(A)
C=var(A)
The result looks like this:
B =
1.0e+03 *
0.1000 -0.0400 0.0400
-0.0400 0.1773 0.4387
0.0400 0.4387 1.2973
C =
1.0e+03 *
0.1000 0.1773 1.2973
By comparison, it can be found that the diagonal part of the matrix obtained by the cov function is the variance of each column of the matrix, while the non-diagonal part calculates the covariance.
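The comparison above can also be done programmatically. This minimal sketch extracts the diagonal of the covariance matrix and checks it against var, and also confirms that the covariance matrix is symmetric (cov of column i with column j equals cov of column j with column i).

```matlab
A = [34,45,73;54,37,81;44,19,15];

B = cov(A);
d = diag(B)';   % diagonal entries as a row vector

disp(max(abs(d - var(A))))    % ~0: diagonal equals the column variances
disp(max(max(abs(B - B'))))   % ~0: covariance matrix is symmetric
```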
(3) To calculate the covariance between the data in two matrices, for example:
A=[34,45,54;37,62,81;19,15,44];
B=[45,64,69;73,74,79;84,85,86];
C=cov(A,B)
The running results are as follows:
C =
425.7778 -18.8611
-18.8611 168.9444
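Here cov(A,B) flattens A and B into two column vectors and returns their 2-by-2 covariance matrix: the diagonal entries 425.7778 and 168.9444 are the variances of A(:) and B(:), and the off-diagonal entry -18.8611 is their covariance. The negative covariance indicates a negative relationship between the two sets of data, although it is small relative to their variances (the corresponding correlation coefficient is only about -0.07).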
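The flattening behavior of cov(A,B) can be confirmed by computing the same matrix in two other ways. This minimal sketch also normalizes the covariance with corrcoef to show how weak the relationship is.

```matlab
A = [34,45,54;37,62,81;19,15,44];
B = [45,64,69;73,74,79;84,85,86];

C1 = cov(A, B);              % 2-by-2 covariance of A and B as two variables
C2 = cov(A(:), B(:));        % same, with explicit flattening
C3 = cov([A(:) B(:)]);       % same, as a two-column data matrix
R  = corrcoef(A(:), B(:));   % correlation: C1(1,2) / sqrt(C1(1,1)*C1(2,2))

disp(max(abs(C1(:) - C2(:))))   % ~0
disp(max(abs(C1(:) - C3(:))))   % ~0
fprintf('correlation = %.4f\n', R(1, 2));   % about -0.07
```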