Both the NumPy and pandas packages for Python can calculate the mean, variance, and standard deviation. This article summarizes how to use them.
1. Calculating the mean, variance, and standard deviation with NumPy
The mean is usually obtained with NumPy's mean function:
>>> import numpy as np
>>> a = [5, 6, 16, 9]
>>> np.mean(a)
9.0
NumPy's average function can compute not only a simple average but also a weighted average. It accepts a weights parameter, which is an array holding one weight per value, for example:
>>> np.average(a)
9.0
>>> np.average(a, weights=[1, 2, 1, 1])
8.4
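As a rough check on what the weights do, the weighted average can be reproduced by hand as the sum of weight times value, divided by the sum of the weights. The sketch below reuses the list a and the weights from the example; the names weights and manual are illustrative only.

```python
import numpy as np

a = [5, 6, 16, 9]
weights = [1, 2, 1, 1]  # illustrative name; same weights as in the example

# Weighted average by hand: sum(weight * value) / sum(weights)
manual = sum(w * x for w, x in zip(weights, a)) / sum(weights)

print(manual)                          # 8.4
print(np.average(a, weights=weights))  # 8.4
```

With these weights the value 6 counts twice, which pulls the result from 9.0 down to 8.4.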
To calculate the variance, use NumPy's var function. By default it returns the population variance (the sum of squared deviations divided by the number of samples N). To get the sample variance (divided by N - 1), pass the parameter ddof=1, e.g.
>>> import numpy as np
>>> a = [5, 6, 16, 9]
>>> np.var(a)  # population variance
18.5
>>> np.var(a, ddof=1)  # sample variance
24.666666666666668
>>> b = [[4, 5], [6, 7]]
>>> b
[[4, 5], [6, 7]]
>>> np.var(b)  # variance of all elements of the matrix
1.25
>>> np.var(b, axis=0)  # variance of each column of the matrix
array([1., 1.])
>>> np.var(b, axis=1)  # variance of each row of the matrix
array([0.25, 0.25])
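To make the effect of ddof concrete, the following sketch computes both variances by hand for the same list a and compares them with np.var. The variable names (n, ss, pop_var, sample_var) are illustrative and not part of NumPy.

```python
import numpy as np

a = [5, 6, 16, 9]
n = len(a)
mean = sum(a) / n

# Sum of squared deviations from the mean
ss = sum((x - mean) ** 2 for x in a)

pop_var = ss / n           # divide by N, corresponds to np.var(a)
sample_var = ss / (n - 1)  # divide by N - 1, corresponds to np.var(a, ddof=1)

print(pop_var, np.var(a))
print(sample_var, np.var(a, ddof=1))
```

In general, var divides by N - ddof, so ddof=0 gives the population variance and ddof=1 the sample variance.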
To calculate the standard deviation, use NumPy's std function. It is used the same way as var: the default is the population standard deviation, and to get the sample standard deviation you pass the parameter ddof=1:
>>> import numpy as np
>>> a = [5, 6, 16, 9]
>>> np.std(a)  # population standard deviation
4.301162633521313
>>> np.std(a, ddof=1)  # sample standard deviation
4.96655480858378
>>> np.std(b)  # standard deviation of all elements of the matrix
1.118033988749895
>>> np.std(b, axis=0)  # standard deviation of each column of the matrix
array([1., 1.])
>>> np.std(b, axis=1)  # standard deviation of each row of the matrix
array([0.5, 0.5])
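Since the standard deviation is the square root of the variance, std and var should agree for the same ddof. A minimal sketch checking this on the same list a (pop_std and sample_std are illustrative names):

```python
import numpy as np

a = [5, 6, 16, 9]

# Standard deviation is the square root of the variance, for the same ddof
pop_std = np.sqrt(np.var(a))
sample_std = np.sqrt(np.var(a, ddof=1))

print(np.isclose(pop_std, np.std(a)))             # True
print(np.isclose(sample_std, np.std(a, ddof=1)))  # True
```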
2. Calculating the mean, variance, and standard deviation with pandas
In pandas, the mean function can likewise compute the average of every column or every row, for example:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.array([[85, 68, 90], [82, 63, 88], [84, 90, 78]]), columns=['Statistics', 'Advanced Math', 'English'], index=['Zhang San', 'Li Si', 'Wang Wu'])
>>> df
           Statistics  Advanced Math  English
Zhang San          85             68       90
Li Si              82             63       88
Wang Wu            84             90       78
>>> df.mean()  # average of each column
Statistics       83.666667
Advanced Math    73.666667
English          85.333333
dtype: float64
>>> df.mean(axis=1)  # average of each row
Zhang San    81.000000
Li Si        77.666667
Wang Wu      84.000000
dtype: float64
To get the average of a single row or column, select that row or column with iloc and then call mean, for example:
>>> df
           Statistics  Advanced Math  English
Zhang San          85             68       90
Li Si              82             63       88
Wang Wu            84             90       78
>>> df.iloc[0, :].mean()  # average of the first row
81.0
>>> df.iloc[:, 2].mean()  # average of the third column
85.33333333333333
In pandas, the var function returns the sample variance (note: not the population variance), and the std function returns the sample standard deviation. To get the variance or standard deviation of a single row or column, select it with iloc and then call var or std. For example:
>>> df.var()  # variance of each column
Statistics         2.333333
Advanced Math    206.333333
English           41.333333
dtype: float64
>>> df.var(axis=1)  # variance of each row
Zhang San    133.000000
Li Si        170.333333
Wang Wu       36.000000
dtype: float64
>>> df.std()  # standard deviation of each column
Statistics        1.527525
Advanced Math    14.364308
English           6.429101
dtype: float64
>>> df.std(axis=1)  # standard deviation of each row
Zhang San    11.532563
Li Si        13.051181
Wang Wu       6.000000
dtype: float64
>>> df.iloc[0, :].std()  # standard deviation of the first row
11.532562594670797
>>> df.iloc[:, 2].std()  # standard deviation of the third column
6.429100507328636
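Because the two libraries use different defaults (pandas divides by N - 1, NumPy by N), the same data can give different results depending on which function is called. The sketch below illustrates this with the English scores from the example; scores is an illustrative name, and passing ddof=0 to the pandas method is how the population variance can be requested there.

```python
import numpy as np
import pandas as pd

scores = pd.Series([90, 88, 78])  # the English scores from the example

sample_var = scores.var()        # pandas default: ddof=1 (sample variance)
pop_var = scores.var(ddof=0)     # population variance, requested explicitly
numpy_var = np.var(scores.to_numpy())  # NumPy default: ddof=0 (population)

print(sample_var)
print(pop_var)
print(numpy_var)
```

Here pop_var and numpy_var should agree, while sample_var is larger by a factor of N / (N - 1).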