Computing the mean, variance, standard deviation (SD), and relative standard deviation (RSD) in Python

average

The mean is the most commonly used statistic. It indicates the central position around which the observed values cluster, and so reflects the general level of the data, or the central tendency of its distribution.

import numpy as np

a = [2, 4, 6, 8]

print(np.mean(a))  # mean
print(np.average(a, weights=[1, 2, 1, 1]))  # weighted mean
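
To make the weighted mean concrete, here is a minimal sketch that computes it directly from the definition, sum(w_i * x_i) / sum(w_i), and compares the result with np.average. The variable names w and manual are only illustrative.

```python
import numpy as np

a = [2, 4, 6, 8]
w = [1, 2, 1, 1]

# Weighted mean by definition: sum(w_i * x_i) / sum(w_i)
manual = sum(wi * xi for wi, xi in zip(w, a)) / sum(w)

print(manual)                    # 4.8
print(np.average(a, weights=w))  # should agree with the manual value
```

With equal weights the weighted mean reduces to the ordinary mean, which is why np.average(a) and np.mean(a) give the same result.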

variance

Variance measures how far each observation lies from the mean. The raw deviations from the mean always sum to zero, and the sum of squared deviations grows with the number of observations, so statistics uses the average of the squared deviations to describe the degree of variation. For a population of N values x_i with mean μ, the population variance is:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$

In practice the population mean is usually unknown, so the sample mean \bar{x} is used in place of the population parameter. After correcting for this substitution (dividing by n - 1 instead of n), the sample variance of n observations is:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

import numpy as np

a = [2, 4, 6, 8]

print(np.var(a))  # population variance (divides by N)
print(np.var(a, ddof=1))  # sample variance (divides by N - 1)
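
The effect of ddof can be checked by computing both variances by hand. The following is a sketch only; squared_dev, pop_var, and sample_var are illustrative names, not part of NumPy.

```python
import numpy as np

a = np.array([2, 4, 6, 8], dtype=float)
n = a.size

# Squared deviations from the mean: [9, 1, 1, 9]
squared_dev = (a - a.mean()) ** 2

pop_var = squared_dev.sum() / n            # divide by N      (ddof=0)
sample_var = squared_dev.sum() / (n - 1)   # divide by N - 1  (ddof=1)

print(pop_var)                                   # 5.0
print(np.isclose(sample_var, np.var(a, ddof=1))) # True
```

In general, ddof ("delta degrees of freedom") is the amount subtracted from N in the divisor.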

SD

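Standard deviation (SD, Std Dev) measures the dispersion of a data distribution, that is, how far the values deviate from the arithmetic mean. The smaller the standard deviation, the closer the values lie to the mean; the larger it is, the more widely they are spread. It is the square root of the variance. For a population of N values x_i with mean μ:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$

The sample standard deviation s is defined the same way, using the sample mean and dividing by n - 1.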

import numpy as np

a = [2, 4, 6, 8]

print(np.std(a))  # population standard deviation
print(np.std(a, ddof=1))  # sample standard deviation
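
Because the standard deviation is the square root of the variance, np.std can be cross-checked against np.var. A minimal sketch, with pop_sd and sample_sd as illustrative names:

```python
import numpy as np

a = [2, 4, 6, 8]

pop_sd = np.sqrt(np.var(a))             # population SD from population variance
sample_sd = np.sqrt(np.var(a, ddof=1))  # sample SD from sample variance

print(np.isclose(pop_sd, np.std(a)))             # True
print(np.isclose(sample_sd, np.std(a, ddof=1)))  # True
```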

Relative standard deviation RSD

Relative standard deviation (RSD), also called the standard deviation coefficient or coefficient of variation, is the standard deviation divided by the corresponding mean, multiplied by 100%. It can be used to assess the precision of results in inspection and testing work. Using the sample standard deviation s and the sample mean \bar{x}:

$$\mathrm{RSD} = \frac{s}{\bar{x}} \times 100\%$$

import numpy as np

a = [2, 4, 6, 8]

RSD = np.std(a, ddof=1) / np.mean(a)  # relative standard deviation as a fraction (multiply by 100 for percent)
print(RSD)
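
Since RSD is usually reported as a percentage, it can be convenient to wrap the calculation in a small function. The helper below, rsd_percent, is a hypothetical name introduced for illustration; the zero-mean check is an added assumption, since the ratio is undefined in that case.

```python
import numpy as np

def rsd_percent(values):
    """Sample relative standard deviation, in percent (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    if mean == 0:
        # Assumption: treat a zero mean as an error rather than return inf/nan
        raise ValueError("RSD is undefined when the mean is zero")
    return values.std(ddof=1) / mean * 100

print(round(rsd_percent([2, 4, 6, 8]), 2))  # 51.64
```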

summary

import numpy as np

a = [2, 4, 6, 8]

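print(np.mean(a))  # mean
print(np.average(a, weights=[1, 2, 1, 1]))  # weighted mean

print(np.var(a))  # population variance
print(np.var(a, ddof=1))  # sample variance

print(np.std(a))  # population standard deviation
print(np.std(a, ddof=1))  # sample standard deviation

RSD = np.std(a, ddof=1) / np.mean(a)  # relative standard deviation (as a fraction)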
print(RSD)

NumPy functions for central tendency and dispersion

Function                                                 Description
np.mean(list_a)                                          Mean of list_a
np.average(list_a)                                       Mean of list_a
np.average(list_a, weights=[1, 2, 1, 1])                 Weighted mean of list_a
np.var(list_a)                                           Population variance of list_a
np.var(list_a, ddof=1)                                   Sample variance of list_a
np.std(list_a)                                           Population standard deviation of list_a
np.std(list_a, ddof=1)                                   Sample standard deviation of list_a
np.median(list_a)                                        Median of list_a
statistics.mode(list_a)                                  Mode of list_a (NumPy has no np.mode; this is the standard-library function)
np.percentile(list_a, 25)                                1st quartile of list_a
np.percentile(list_a, 50)                                2nd quartile (median) of list_a
np.percentile(list_a, 75)                                3rd quartile of list_a
np.percentile(list_a, 75) - np.percentile(list_a, 25)    Interquartile range of list_a
np.max(list_a) - np.min(list_a)                          Range of list_a
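
NumPy itself does not provide a mode function, so the mode has to be computed another way. The sketch below shows two options: counting values with np.unique, and the standard library's statistics.mode. The data list is made up for the example, because the list used elsewhere in this article has no repeated value.

```python
import statistics
import numpy as np

data = [2, 4, 4, 6, 8]  # example data with one repeated value

# np.unique returns the sorted distinct values and how often each occurs
values, counts = np.unique(data, return_counts=True)
mode_np = values[np.argmax(counts)]  # on ties, argmax picks the smallest value

print(mode_np)                # 4
print(statistics.mode(data))  # 4
```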

quartile

A quartile is one of the three cut points that divide a data set, sorted from smallest to largest, into four equal parts, each containing 25% of the data. Quartiles are mostly used for drawing box plots. The middle quartile is the median, so "quartile" usually refers to the value at the 25% position (the lower quartile) and the value at the 75% position (the upper quartile).
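
As an illustration, the three quartiles and the interquartile range (upper quartile minus lower quartile) can be obtained in one call to np.percentile. This sketch relies on NumPy's default linear interpolation between data points; other interpolation methods can give slightly different quartiles for small data sets.

```python
import numpy as np

a = [2, 4, 6, 8]

# Lower quartile, median, and upper quartile in a single call
q1, q2, q3 = np.percentile(a, [25, 50, 75])
iqr = q3 - q1  # interquartile range: upper minus lower quartile

print(q1, q2, q3)  # 3.5 5.0 6.5
print(iqr)         # 3.0
```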

range

The range, denoted R, is a measure of variation in statistical data: the gap between the maximum and minimum values, that is, the maximum value minus the minimum value.
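
A short sketch of the calculation: the range follows directly from the maximum and minimum, and NumPy's np.ptp ("peak to peak") function returns the same quantity.

```python
import numpy as np

a = [2, 4, 6, 8]

r = np.max(a) - np.min(a)  # range by definition: maximum minus minimum

print(r)          # 6
print(np.ptp(a))  # 6
```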

Origin blog.csdn.net/weixin_43912621/article/details/132125054