Understanding large operators through basic statistics

The large operator \Sigma (Sigma)

Total consumption amount

Total consumption amount=x_{1}+x_{2}+...+x_{n}

Mathematics uses the summation symbol \Sigma, pronounced "Sigma", to write such sums compactly. The total above can be expressed with the following formula:

\sum_{i=1}^{n}x_{i}=x_1+x_2+...+x_n

The purchase amounts of 10 convenience store customers are listed below. Calculate the total consumption amount.

66,58,25,78,58,15,120,39,82,50

# Purchase amounts of the 10 customers
x = [66, 58, 25, 78, 58, 15, 120, 39, 82, 50]
print('Total consumption amount = {}'.format(sum(x)))

The output is as follows:

[Running] python -u "c:\Users\a-xiaobodou\OneDrive - Microsoft\Projects\tempCodeRunnerFile.py"
Total consumption amount = 591

[Done] exited with code=0 in 0.281 seconds
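
The built-in sum() hides the index i that appears in the formula. As a minimal sketch, the loop below spells out \sum_{i=1}^{n}x_i term by term. The names n and total are illustrative, and Python's 0-based list index is shifted by one to match the 1-based i in the formula.

```python
# Purchase amounts of the 10 customers (same data as above)
x = [66, 58, 25, 78, 58, 15, 120, 39, 82, 50]

n = len(x)
total = 0
# i runs from 1 to n as in the formula; x_i is stored at list index i - 1
for i in range(1, n + 1):
    total += x[i - 1]

print('Total consumption amount = {}'.format(total))
```

The loop produces the same total, 591, as the call to sum(x).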

Calculating the average purchase amount

In statistics and mathematics, the average (mean) of a variable is denoted by a horizontal bar drawn above the variable, as follows:

\bar{x}

This symbol is read as "x bar".

The average is defined by the following formula:

\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i=\frac{x_1+x_2+...+x_n}{n}

Calculate the average consumption amount of the sales data:

x = [66, 58, 25, 78, 58, 15, 120, 39, 82, 50]
# Average = total divided by the number of records
print('Average consumption amount = {}'.format(sum(x)/len(x)))

The output is as follows:

[Running] python -u "c:\Users\a-xiaobodou\OneDrive - Microsoft\Projects\ch19_2.py"
Average consumption amount = 59.1

[Done] exited with code=0 in 2.244 seconds

The numpy module provides a mean() function that computes the average directly.

Use numpy's mean() function to compute the average of the sales data:

import numpy as np

x = [66, 58, 25, 78, 58, 15, 120, 39, 82, 50]
print('Average consumption amount = {}'.format(np.mean(x)))

The output is as follows:

[Running] python -u "c:\Users\a-xiaobodou\OneDrive - Microsoft\Projects\tempCodeRunnerFile.py"
Average consumption amount = 59.1

[Done] exited with code=0 in 17.905 seconds

Variance

Variance describes the degree of dispersion of a data series, that is, how far the data values deviate from the mean.

Suppose there are two data sets as follows:

(10,10,10,10,10) # The average is 10

(15,5,18,2,10) # The average is 10

The distance of each element from the mean in the two data sets is:

(0,0,0,0,0) # First data set

(5,-5,8,-8,0) # Second data set

Although the two data sets are very different, directly summing each element's distance from the mean gives 0 for both, which is misleading. The reason is that the deviations are both positive and negative, so they cancel out in the sum. The formal definition of variance therefore squares each element's distance from the mean, sums the squares, and divides by the number of data values. The steps are as follows:

(1) Calculate the average of the data, \bar{x}.

(2) Calculate the distance between each element and the average, square it, and add up the results.

(x_1-\bar{x})^2+(x_2-\bar{x})^2+\cdots +(x_n-\bar{x})^2

(3) The final formula for the variance is as follows:

variance=\frac{(x_1-\bar{x})^2+(x_2-\bar{x})^2+\cdots +(x_n-\bar{x})^2}{n}

Using the \Sigma symbol, the variance formula can be written as:

variance=\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2
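
To see why the deviations are squared, the following sketch applies both approaches to the two five-element data sets above. The helper names deviations and population_variance are illustrative and do not come from the original text.

```python
def deviations(data):
    """Return each element's signed distance from the mean."""
    mean = sum(data) / len(data)
    return [v - mean for v in data]

def population_variance(data):
    """Mean of the squared deviations (divide by n)."""
    return sum(d ** 2 for d in deviations(data)) / len(data)

a = [10, 10, 10, 10, 10]
b = [15, 5, 18, 2, 10]

# Signed deviations cancel, so both sums are 0
print(sum(deviations(a)), sum(deviations(b)))
# Squared deviations do not cancel, so the variances differ
print(population_variance(a), population_variance(b))
```

Both sums of signed deviations are 0, while the variances are 0 for the first data set and 35.6 for the second, so only the variance separates the two.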

Calculate the variance of the sales data:

x = [66, 58, 25, 78, 58, 15, 120, 39, 82, 50]
mean = sum(x) / len(x)

# Compute the variance: sum the squared deviations, then divide by n
var = 0
for v in x:
    var += ((v - mean)**2)
var = var / len(x)
print("Variance : ", var)

The output is as follows:

[Running] python -u "c:\Users\a-xiaobodou\OneDrive - Microsoft\Projects\tempCodeRunnerFile.py"
Variance :  823.49

[Done] exited with code=0 in 1.156 seconds

Use numpy's var() function to compute the variance of the sales data:

import numpy as np

x = [66, 58, 25, 78, 58, 15, 120, 39, 82, 50]
print("Variance : ",np.var(x))

The output is as follows:

[Running] python -u "c:\Users\a-xiaobodou\OneDrive - Microsoft\Projects\tempCodeRunnerFile.py"
Variance :  823.49

[Done] exited with code=0 in 3.03 seconds

Standard deviation

Standard deviation is abbreviated SD. After calculating the variance, take its square root. The result, the root-mean-square distance of the data from the mean, is the standard deviation.

standard deviation=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}

Calculate the standard deviation of the sales data:

x = [66, 58, 25, 78, 58, 15, 120, 39, 82, 50]
mean = sum(x) / len(x)

# Sum the squared deviations, then take the square root of their mean
var = 0
for v in x:
    var += ((v - mean)**2)
sd = (var / len(x))**0.5
print("Standard deviation : {0:6.2f}".format(sd))

The output is as follows:

[Running] python -u "c:\Users\a-xiaobodou\OneDrive - Microsoft\Projects\tempCodeRunnerFile.py"
Standard deviation :  28.70

[Done] exited with code=0 in 0.758 seconds

The numpy module provides a std() function that computes the standard deviation directly.

import numpy as np

x = [66, 58, 25, 78, 58, 15, 120, 39, 82, 50]
print("Standard deviation : {0:6.2f}".format(np.std(x)))

The output is as follows:

[Running] python -u "c:\Users\a-xiaobodou\OneDrive - Microsoft\Projects\tempCodeRunnerFile.py"
Standard deviation :  28.70

[Done] exited with code=0 in 2.563 seconds
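
As a cross-check that does not need numpy, Python's standard statistics module provides pvariance() and pstdev(), which use the same divide-by-n (population) definition as the formulas above. A minimal sketch, reusing the sales data:

```python
import math
import statistics

x = [66, 58, 25, 78, 58, 15, 120, 39, 82, 50]

var = statistics.pvariance(x)   # population variance: divide by n
sd = statistics.pstdev(x)       # population standard deviation

print("Variance : {0:.2f}".format(var))
print("Standard deviation : {0:6.2f}".format(sd))

# The standard deviation is the square root of the variance
print(math.isclose(sd, math.sqrt(var)))
```

The values agree with the hand-written loop and with numpy, which is a quick way to confirm that all three use the same definition.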

Operation rules of the \Sigma symbol and verification

\sum_{i=1}^{n}(x_i+y_i)=\sum_{i=1}^{n}x_i+\sum_{i=1}^{n}y_i

\sum_{i=1}^{n}(x_i-y_i)=\sum_{i=1}^{n}x_i-\sum_{i=1}^{n}y_i

\sum_{i=1}^{n}cx_i=c\sum_{i=1}^{n}x_i

\sum_{i=1}^{n}c=nc
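
The four rules can be checked numerically. The sketch below uses the sales data as x_i; the second series y and the constant c are arbitrary values chosen only for illustration and are not from the original text.

```python
x = [66, 58, 25, 78, 58, 15, 120, 39, 82, 50]
y = [3, 7, 1, 9, 4, 6, 2, 8, 5, 10]   # illustrative values
c = 3                                  # illustrative constant
n = len(x)

# Rule 1: the sum of (x_i + y_i) equals the sum of x_i plus the sum of y_i
rule1 = sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)
# Rule 2: the sum of (x_i - y_i) equals the sum of x_i minus the sum of y_i
rule2 = sum(xi - yi for xi, yi in zip(x, y)) == sum(x) - sum(y)
# Rule 3: a constant factor can be moved outside the sum
rule3 = sum(c * xi for xi in x) == c * sum(x)
# Rule 4: summing the constant c n times gives n * c
rule4 = sum(c for _ in range(n)) == n * c

print(rule1, rule2, rule3, rule4)
```

Integer data is used so that the equality comparisons are exact; with floating-point data, math.isclose() would be a safer comparison.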

Conjugation \Sigma sign

(Omitted.)


Origin blog.csdn.net/DXB2021/article/details/127186135