[NumPy Learning] Part 3: NumPy Statistics

0. Mind map

1. Background

Unbiased estimation (reference)

Unbiased estimator: let $\hat{\theta}$ be an estimator of a parameter $\theta$. If $E(\hat{\theta})=\theta$, then $\hat{\theta}$ is called an unbiased estimator of $\theta$. For the variance, with sample mean $\bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_{i}$, the unbiased estimator is
$$\hat{\sigma}^{2}=\frac{1}{n-1}\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}$$

Known facts, for independent samples $X_1,\dots,X_n$ with mean $\mu$ and variance $\sigma^2$:

(1) $E(XY)=E(X)\,E(Y)$ when $X$ and $Y$ are independent.

(2) $\operatorname{Var}(\bar{X})=\frac{1}{n}\operatorname{Var}(X)$, or equivalently $\sigma(\bar{X})^{2}=\frac{1}{n}\sigma(X)^{2}$

(3) $\operatorname{Var}(X)=E\left(X^{2}\right)-(E(X))^{2}$

(4) $E\left(X^{2}\right)=\operatorname{Var}(X)+(E(X))^{2}=\sigma^{2}+\mu^{2}$

(5) $E\left(\bar{X}^{2}\right)=\operatorname{Var}(\bar{X})+(E(\bar{X}))^{2}=\frac{1}{n}\sigma^{2}+\mu^{2}$

Proof:

$$\begin{aligned}
E\left(\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}\right)
&=E\left(\sum_{i=1}^{n}\left(X_{i}^{2}-2\bar{X}X_{i}+\bar{X}^{2}\right)\right)\\
&=E\left(\sum_{i=1}^{n}X_{i}^{2}\right)-E\left(\sum_{i=1}^{n}2\bar{X}X_{i}\right)+E\left(\sum_{i=1}^{n}\bar{X}^{2}\right)\\
&=\sum_{i=1}^{n}E\left(X_{i}^{2}\right)-2E\left(\bar{X}\sum_{i=1}^{n}X_{i}\right)+\sum_{i=1}^{n}E\left(\bar{X}^{2}\right)\\
&=\sum_{i=1}^{n}E\left(X_{i}^{2}\right)-2E\left(\bar{X}\sum_{i=1}^{n}X_{i}\right)+n\cdot E\left(\bar{X}^{2}\right)
\end{aligned}$$

The second term, using $\sum_{i=1}^{n}X_{i}=n\bar{X}$:

$$2E\left(\bar{X}\sum_{i=1}^{n}X_{i}\right)=2E\left(\bar{X}\cdot n\bar{X}\right)=2n\cdot E\left(\bar{X}^{2}\right)$$

Substituting this back into the original expression:

$$E\left(\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}\right)=\sum_{i=1}^{n}E\left(X_{i}^{2}\right)-n\cdot E\left(\bar{X}^{2}\right)$$

Substituting (4) and (5) gives:

$$\begin{aligned}
E\left(\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}\right)
&=\sum_{i=1}^{n}\left(\sigma^{2}+\mu^{2}\right)-n\cdot\left(\frac{1}{n}\sigma^{2}+\mu^{2}\right)\\
&=n\left(\sigma^{2}+\mu^{2}\right)-n\left(\frac{1}{n}\sigma^{2}+\mu^{2}\right)\\
&=(n-1)\sigma^{2}
\end{aligned}$$

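Dividing both sides by $n-1$ shows that $\frac{1}{n-1}\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}$ has expectation exactly $\sigma^{2}$, i.e. it is an unbiased estimator of $\sigma^{2}$. QED.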
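The result can also be checked numerically. This is an illustrative sketch, not part of the original post: the normal distribution, sample size `n = 5`, σ = 2 and the seed are arbitrary assumptions. It draws many small samples and averages their variances computed with `ddof=0` (divide by n) and `ddof=1` (divide by n - 1).

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed so the run is reproducible
sigma, n, trials = 2.0, 5, 200_000   # assumed parameters: true variance sigma**2 = 4

# each row is one sample of size n drawn from N(0, sigma^2)
samples = rng.normal(0.0, sigma, size=(trials, n))

biased = np.var(samples, axis=1, ddof=0).mean()    # divides by n
unbiased = np.var(samples, axis=1, ddof=1).mean()  # divides by n - 1

print(biased)    # close to (n - 1) / n * sigma**2 = 3.2
print(unbiased)  # close to sigma**2 = 4.0
```

Dividing by n systematically underestimates σ² by the factor (n - 1)/n, which is exactly what the proof predicts.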

2. amin test

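import numpy as np

# random 3x4x5 integer array (values 0-9); the output differs on every run
x = np.random.randint(0, 10, 3*4*5).reshape(3, 4, 5)
print(x)
print('-'*50)
print(np.amin(x, axis=0))  # minimum across the 3 blocks -> shape (4, 5)
print('-'*50)
print(np.amin(x, axis=1))  # minimum down each block's columns -> shape (3, 5)
print('-'*50)
print(np.amin(x, axis=2))  # minimum along each row -> shape (3, 4)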

Output:

[[[5 5 3 7 4]  
  [4 8 6 5 3]  
  [6 0 9 0 5]  
  [3 5 0 8 0]] 

 [[4 7 2 0 9]  
  [5 9 0 6 1]  
  [4 2 4 3 0]  
  [5 2 3 3 8]] 

 [[5 5 7 4 8]  
  [3 9 7 7 5]  
  [9 9 8 3 2]
  [8 1 3 4 6]]]
--------------------------------------------------
[[4 5 2 0 4]
 [3 8 0 5 1]
 [4 0 4 0 0]
 [3 1 0 3 0]]
--------------------------------------------------
[[3 0 0 0 0]
 [4 2 0 0 0]
 [3 1 3 3 2]]
--------------------------------------------------
[[3 3 0 0]
 [0 0 0 2]
 [4 3 2 1]]
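A simple way to remember what `axis` does: the reduced axis disappears from the shape of the result. The sketch below uses a deterministic array (an assumption, so the result is reproducible) and also shows that `np.amin` computes the same thing as `np.min`, and that `keepdims=True` keeps the reduced axis with length 1.

```python
import numpy as np

x = np.arange(60).reshape(3, 4, 5)  # deterministic 3x4x5 array (illustrative)

for axis in range(x.ndim):
    m = np.amin(x, axis=axis)
    print(axis, m.shape)  # the reduced axis is dropped from the shape

# np.amin and np.min compute the same thing
print(np.array_equal(np.amin(x, axis=1), np.min(x, axis=1)))  # True

# keepdims=True keeps the reduced axis with length 1, handy for broadcasting
print(np.amin(x, axis=2, keepdims=True).shape)  # (3, 4, 1)
```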

3. Variance test

import numpy as np

x = np.array([[11, 12, 13, 14, 15],
              [16, 17, 18, 19, 20],
              [21, 22, 23, 24, 25],
              [26, 27, 28, 29, 30],
              [31, 32, 33, 34, 35]])
print(x.size)
print(np.mean(x))
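print(np.var(x))                         # population variance: divides by n (ddof=0)
print(np.mean((x-np.mean(x))**2))        # the same thing by hand
# Unbiased estimate: divide by n - 1
print(np.sum((x-np.mean(x))**2)/(x.size-1))
print(np.var(x, ddof=1))
# Variance along an axis
print(np.var(x, axis=0))                 # per column
print(np.var(x, axis=1))                 # per row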

Output:

25
23.0
52.0
52.0
54.166666666666664
54.166666666666664
[50. 50. 50. 50. 50.]
[2. 2. 2. 2. 2.]
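Fact (3) from the background section, $\operatorname{Var}(X)=E(X^2)-(E(X))^2$, can be checked on the same array: with `ddof=0`, `np.var` equals the mean of the squares minus the square of the mean, and the unbiased version only rescales by `n / (n - 1)`. A short sketch (`np.isclose` is used because other data may show tiny floating-point differences):

```python
import numpy as np

x = np.arange(11, 36).reshape(5, 5)  # the same 5x5 array as above (11..35)

n = x.size
var_pop = np.mean(x**2) - np.mean(x)**2  # E(X^2) - (E(X))^2
print(var_pop)                           # 52.0
print(np.isclose(var_pop, np.var(x)))    # True

# rescaling by n / (n - 1) gives the unbiased (ddof=1) estimate
print(np.isclose(var_pop * n / (n - 1), np.var(x, ddof=1)))  # True
```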

3. Standard deviation

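# TEST 3
import numpy as np

x = np.array([[11, 12, 13, 14, 15],
              [16, 17, 18, 19, 20],
              [21, 22, 23, 24, 25],
              [26, 27, 28, 29, 30],
              [31, 32, 33, 34, 35]])

print(np.std(x))               # population standard deviation (ddof=0)
print(np.sqrt(np.var(x)))      # equals the square root of the variance
print(np.std(x, axis=0))       # per column
print(np.std(x, axis=1))       # per row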

Output:

7.211102550927978
7.211102550927978
[7.07106781 7.07106781 7.07106781 7.07106781 7.07106781]
[1.41421356 1.41421356 1.41421356 1.41421356 1.41421356]
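`np.std` accepts the same `ddof` argument as `np.var`; the sample standard deviation (`ddof=1`) is the square root of the unbiased variance. Note that although that variance estimate is unbiased, its square root is not an exactly unbiased estimate of σ, so `ddof=1` only removes the bias of the variance. A short check on the same array:

```python
import numpy as np

x = np.arange(11, 36).reshape(5, 5)  # same array as above

print(np.std(x, ddof=1))  # sample standard deviation
print(np.isclose(np.std(x, ddof=1), np.sqrt(np.var(x, ddof=1))))  # True

# by hand: sqrt of the sum of squared deviations divided by (n - 1)
manual = np.sqrt(np.sum((x - x.mean())**2) / (x.size - 1))
print(np.isclose(manual, np.std(x, ddof=1)))  # True
```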

4. Range (peak to peak)

import numpy as np

x = np.random.randint(0, 20, size=[4, 5])  # random, so the output differs per run
print(x)

print(np.ptp(x))          # max - min over the whole array
print(np.ptp(x, axis=0))  # per column
print(np.ptp(x, axis=1))  # per row

Output:

[[13 11 15  0  6]
 [ 3 16  2 11 12]
 [ 2  1  2  2 18]
 [ 6  1 13 18 11]]
18
[11 15 13 18 12]
[15 14 17 17]
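`np.ptp` is simply the maximum minus the minimum along the chosen axis. The equivalence can be checked on the 4x5 array printed above, copied here as a fixed array so the check is reproducible:

```python
import numpy as np

x = np.array([[13, 11, 15,  0,  6],
              [ 3, 16,  2, 11, 12],
              [ 2,  1,  2,  2, 18],
              [ 6,  1, 13, 18, 11]])

print(np.max(x, axis=0) - np.min(x, axis=0))  # [11 15 13 18 12], same as np.ptp(x, axis=0)
print(np.max(x, axis=1) - np.min(x, axis=1))  # [15 14 17 17], same as np.ptp(x, axis=1)
```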

5. Percentiles

Detailed explanation: https://blog.csdn.net/juliarjuliar/article/details/81082934

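import numpy as np

x = np.random.randint(0, 20, [4, 5])
print(x)
print(np.percentile(x, [25, 50]))  # no axis given: computed over the flattened array
x = x.reshape(-1)
print(x)
print(np.sort(x))
print(np.percentile(x, [25, 50]))  # same result on the 1-D version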

Output:

[[18 13  9 19 11]
 [ 1 19  6 14  1]
 [19  5  4 19  0]
 [ 9 17  0 17  4]]
[ 4. 10.]
[18 13  9 19 11  1 19  6 14  1 19  5  4 19  0  9 17  0 17  4]
[ 0  0  1  1  4  4  5  6  9  9 11 13 14 17 17 18 19 19 19 19]
[ 4. 10.]
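With the default settings, `np.percentile` sorts the data (a 2-D input is flattened when no `axis` is given, which is why both calls above agree) and linearly interpolates between neighbouring sorted values at position `q / 100 * (n - 1)`. A minimal re-implementation of that rule, for illustration only (`percentile_linear` is a hypothetical helper, not a NumPy function), reproduces the `[4. 10.]` above:

```python
import numpy as np

def percentile_linear(data, q):
    """Hypothetical helper: q-th percentile with linear interpolation,
    intended to match NumPy's default method."""
    s = np.sort(np.ravel(data))      # flatten and sort
    pos = q / 100 * (s.size - 1)     # fractional index into the sorted data
    lo = int(np.floor(pos))
    hi = int(np.ceil(pos))
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

x = np.array([0, 0, 1, 1, 4, 4, 5, 6, 9, 9, 11, 13, 14, 17, 17, 18, 19, 19, 19, 19])
print(percentile_linear(x, 25))  # 4.0  (position 4.75, between 4 and 4)
print(percentile_linear(x, 50))  # 10.0 (position 9.5, between 9 and 11)
```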

6. Median / mean / weighted average

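import numpy as np

x = np.random.randint(0, 100, [3, 7])
print(np.sort(x))           # sorts each row for display; x itself is unchanged
print(np.median(x))         # median of all 21 values
print(np.mean(x))
print(np.average(x))        # without weights, same as np.mean
print(np.mean(x, axis=0))   # per-column mean of the (unsorted) x
print(np.average(x, axis=0))

w = np.arange(1, 22).reshape(3, 7)  # weights 1..21, same shape as x
print(np.average(x, weights=w))     # sum(w * x) / sum(w)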

Output:

[[ 7 13 14 17 20 24 33]
 [ 3 31 32 35 58 70 94]
 [23 28 64 86 88 91 98]]
32.0
44.23809523809524
44.23809523809524
[44.         25.         33.33333333 52.         30.33333333 58.66666667
 66.33333333]
[44.         25.         33.33333333 52.         30.33333333 58.66666667
 66.33333333]
56.54978354978355
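With `weights`, `np.average` computes $\sum_i w_i x_i / \sum_i w_i$; without weights it equals `np.mean`. The sketch below checks the formula on a small fixed array (values and weights are made up for illustration) and uses `returned=True`, which also returns the sum of the weights.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])
w = np.array([[1, 1, 1],
              [2, 2, 2]])  # illustrative weights: the second row counts twice

avg, wsum = np.average(x, weights=w, returned=True)
print(avg, wsum)                  # 4.0 9.0
print(np.sum(w * x) / np.sum(w))  # 4.0, the same weighted mean by hand

# along an axis: weighted mean of each column
print(np.average(x, axis=0, weights=w))  # [3. 4. 5.]
```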

7. Covariance matrix

import numpy as np

x = np.arange(1, 8)
y = np.arange(8, 15)
print(x, y)
print('-'*50)
print(np.var(x))
print(np.cov(x))          # np.cov divides by n - 1 by default
print(np.var(x, ddof=1))
print('-'*50)
print(np.var(y))
print(np.cov(y))
print(np.var(y, ddof=1))
print('-'*50)
print(np.cov(x, y))       # 2x2 covariance matrix of x and y
print('-'*50)
z = np.mean((x - np.mean(x)) * (y - np.mean(y)))  # population covariance (divide by n)
print(z)

z = np.sum((x - np.mean(x)) * (y - np.mean(y))) / (len(x) - 1)  # sample covariance (divide by n - 1)
print(z)

z = np.dot(x - np.mean(x), y - np.mean(y)) / (len(x) - 1)  # sample covariance via a dot product
print(z)

Output:

[1 2 3 4 5 6 7] [ 8  9 10 11 12 13 14]
--------------------------------------------------
4.0
4.666666666666666
4.666666666666667
--------------------------------------------------
4.0
4.666666666666666
4.666666666666667
--------------------------------------------------
[[4.66666667 4.66666667]
 [4.66666667 4.66666667]]
--------------------------------------------------
4.0
4.666666666666667
4.666666666666667
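`np.cov` treats each row of its input as one variable and each column as one observation, and divides by n - 1 by default (`bias=True` switches to n). A minimal by-hand version for several variables centres each row and uses a matrix product; stacking `x` and `y` with `np.vstack` is the only addition to the code above:

```python
import numpy as np

x = np.arange(1, 8)
y = np.arange(8, 15)
data = np.vstack([x, y])  # shape (2, 7): one row per variable

centered = data - data.mean(axis=1, keepdims=True)       # subtract each row's mean
cov_manual = centered @ centered.T / (data.shape[1] - 1)  # sample covariance, n - 1

print(cov_manual)
print(np.allclose(cov_manual, np.cov(x, y)))              # True
print(np.allclose(centered @ centered.T / data.shape[1],
                  np.cov(x, y, bias=True)))               # True, divides by n
```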

8. Correlation coefficient

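import numpy as np

x, y = np.random.randint(0, 20, size=(2, 4))  # two random length-4 vectors

print(x)
print(y)

z = np.corrcoef(x, y)  # 2x2 correlation matrix
print(z)

# Pearson correlation by hand: covariance / (std_x * std_y); the 1/n factors cancel
a = np.dot(x - np.mean(x), y - np.mean(y))
b = np.sqrt(np.dot(x - np.mean(x), x - np.mean(x)))
c = np.sqrt(np.dot(y - np.mean(y), y - np.mean(y)))
print(a / (b * c))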

Output:

[ 9 10  4 14]
[19  6 14  0]
[[ 1.         -0.70975624]
 [-0.70975624  1.        ]]
-0.7097562360053747
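Equivalently, the correlation matrix is the covariance matrix with each entry divided by the product of the two standard deviations, $r_{ij}=C_{ij}/\sqrt{C_{ii}C_{jj}}$. The n - 1 factors cancel, so the normalization used by `np.cov` does not matter here. A sketch using the `x` and `y` printed above as fixed data:

```python
import numpy as np

x = np.array([9, 10, 4, 14])
y = np.array([19, 6, 14, 0])

c = np.cov(x, y)          # 2x2 sample covariance matrix
d = np.sqrt(np.diag(c))   # standard deviations of x and y
r = c / np.outer(d, d)    # r_ij = C_ij / (s_i * s_j)

print(r)
print(np.allclose(r, np.corrcoef(x, y)))  # True
```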

9. Histogram

import numpy as np

x = np.array([0.2, 6.4, 3.0, 1.6])

bins = np.array([0.0, 1.0, 2.5, 4.0, 10.0])  # bin edges

inds = np.digitize(x, bins)  # index i such that bins[i-1] <= x < bins[i]

print(inds)  # [1 4 3 2]
for n in range(x.size):
    print(bins[inds[n] - 1], "<=", x[n], "<", bins[inds[n]])

Output:

[1 4 3 2]
0.0 <= 0.2 < 1.0
4.0 <= 6.4 < 10.0
2.5 <= 3.0 < 4.0
1.0 <= 1.6 < 2.5
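The code above only assigns each value to a bin with `np.digitize`; counting the values per bin gives the histogram itself, which is what `np.histogram` returns. One caveat: `np.histogram` treats its last bin as closed on the right (a value equal to the last edge is counted), while `np.digitize` places such a value past the end, so the two agree only when no value equals the last edge. A sketch, where the random data and seed are assumptions:

```python
import numpy as np

bins = np.array([0.0, 1.0, 2.5, 4.0, 10.0])
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 10.0, size=1000)  # assumed data, drawn from [0, 10)

counts, edges = np.histogram(data, bins=bins)  # number of values in each bin

# same counts from digitize: index i means bins[i-1] <= value < bins[i]
inds = np.digitize(data, bins)
counts_dig = np.bincount(inds, minlength=len(bins) + 1)[1:len(bins)]

print(counts)
print(np.array_equal(counts, counts_dig))  # True
```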

10. Exercise

Compute the maximum value of each row of a given array.

  • a = np.random.randint(1, 10, [5, 3])
# WORK 1
import numpy as np

a = np.random.randint(1, 10, [5, 3])
print(a)
print(np.amax(a, axis=1))  # axis=1 reduces across columns: one maximum per row

Output:

[[5 2 4]
 [9 9 2]
 [8 8 6]
 [2 6 2]
 [3 6 3]]
[5 9 8 6 6]
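A possible extension of the exercise (not part of the original): `np.argmax` with the same `axis` gives the column where each row's maximum sits, and `np.take_along_axis` uses those indices to pick the values back out. The array printed above is used as fixed data so the result can be checked:

```python
import numpy as np

a = np.array([[5, 2, 4],
              [9, 9, 2],
              [8, 8, 6],
              [2, 6, 2],
              [3, 6, 3]])

row_max = np.amax(a, axis=1)    # [5 9 8 6 6]
col_idx = np.argmax(a, axis=1)  # [0 0 0 1 1]; ties resolve to the first occurrence
picked = np.take_along_axis(a, col_idx[:, None], axis=1).ravel()

print(row_max, col_idx)
print(np.array_equal(picked, row_max))  # True
```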


Reposted from blog.csdn.net/DD_PP_JJ/article/details/110183688