The Jacobian determinant, and functions for generating data with the same or different covariance

`jacobian`

Let the random variable $x$ follow the normal distribution $N(\mu,\sigma^2)$ with probability density function $f(x)$. Consider the affine transformation of $x$, with $a \neq 0$:

$r = ax + b \tag{1}$
The random variable $r$ then follows the distribution $N(a\mu +b,a^2\sigma^2)$. That is, with $x=(r-b)/a$, the probability density function of $r$ is:

$g(r)=\frac{1}{\vert a\vert}f(x) \tag{2}$
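Equation (2) can be checked numerically. The following is a minimal sketch, not part of the original post: the helper `normal_pdf` and the values of `mu`, `sigma`, `a`, and `b` are illustrative assumptions. It evaluates $g(r)$ once through equation (2) and once as the density of $N(a\mu+b, a^2\sigma^2)$, then compares the two.

```python
import numpy as np


def normal_pdf(x, mean, std):
    """Density of N(mean, std**2) evaluated at x (hypothetical helper)."""
    z = (x - mean) / std
    return np.exp(-0.5 * z ** 2) / (std * np.sqrt(2.0 * np.pi))


# Illustrative values (assumptions); a is negative on purpose to exercise |a|
mu, sigma = 1.0, 2.0
a, b = -3.0, 0.5

r = np.linspace(-20.0, 20.0, 9)   # points at which g(r) is evaluated
x = (r - b) / a                   # invert equation (1): r = a x + b

g_from_eq2 = normal_pdf(x, mu, sigma) / abs(a)          # equation (2)
g_direct = normal_pdf(r, a * mu + b, abs(a) * sigma)    # N(a mu + b, a^2 sigma^2)

print(np.allclose(g_from_eq2, g_direct))
```

The standard deviation of $r$ is $\vert a\vert\sigma$, which is why `abs(a)` appears in both expressions.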
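Proof (the original figures, one for the one-dimensional case and one for the two- to higher-dimensional case, are not reproduced here).

The fact about the Jacobian used in the figures: the determinant is the product of all eigenvalues.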
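The one-dimensional case can be sketched with the standard change-of-variables argument. This is a reconstruction, not necessarily the proof shown in the original figures; $F$ and $G$ are assumed here to denote the cumulative distribution functions of $x$ and $r$, symbols that do not appear in the original.

```latex
% Case a > 0: the inequality keeps its direction
G(t) = P(ax + b \le t) = P\left(x \le \frac{t-b}{a}\right) = F\left(\frac{t-b}{a}\right),
\qquad
g(t) = G'(t) = \frac{1}{a}\, f\left(\frac{t-b}{a}\right)

% Case a < 0: the inequality flips
G(t) = P\left(x \ge \frac{t-b}{a}\right) = 1 - F\left(\frac{t-b}{a}\right),
\qquad
g(t) = G'(t) = -\frac{1}{a}\, f\left(\frac{t-b}{a}\right)

% Both cases combine into
g(t) = \frac{1}{|a|}\, f\left(\frac{t-b}{a}\right)
```

In several dimensions the scalar $a$ is replaced by the Jacobian matrix of the transformation, and $\vert a\vert$ by the absolute value of its determinant.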

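Generalizing to multivariate random variables

Suppose the components of the random vector $\mathbf y =(y^{(1)},y^{(2)},\dots,y^{(d)})^T$ are independent and each follows the standard normal distribution $N(0,1)$, and let $g(\mathbf y)$ be its probability density function. Then the expectation of $\mathbf y$ is $E[\mathbf y]=\mathbf 0$ and its covariance matrix is $V[\mathbf y]= \mathbf I$.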

Apply the transformation:

$x=\mathbf T\mathbf y +\mu \tag{3}$
where $\mathbf T$ is a $d\times d$ invertible matrix.
The probability density function of $x$ is then, with $\mathbf y=\mathbf T^{-1}(x-\mu)$:

$f(x)=g(\mathbf y)\vert \det(\mathbf T) \vert ^{-1} \tag{4}$

In the multivariate case the Jacobian of the transformation is $\mathbf T$, and the covariance matrix of $x$ is:

$\mathbf \Sigma=\mathbf T\mathbf T^T \tag{5}$
This gives the general form of the multivariate normal distribution: $x\sim N(\mu,\Sigma)$.
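Equations (3) to (5) can be checked at a single point. The sketch below is illustrative only: the matrix `T`, the mean `mu`, the evaluation point `x`, and the two helper functions are assumptions, not taken from the original. It computes $f(x)$ through equation (4) and compares it with the textbook density of $N(\mu,\Sigma)$ for $\Sigma=\mathbf T\mathbf T^T$.

```python
import numpy as np


def standard_normal_pdf(y):
    """Density g(y) of d independent N(0, 1) components, for one vector y."""
    d = y.shape[0]
    return np.exp(-0.5 * (y @ y)) / (2.0 * np.pi) ** (d / 2.0)


def mvn_pdf(x, mu, Sigma):
    """Textbook density of N(mu, Sigma), for one vector x."""
    d = x.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm


# Illustrative invertible T and mean (assumed values)
T = np.array([[2.0, 0.5],
              [-1.0, 1.5]])
mu = np.array([1.0, -2.0])
Sigma = T @ T.T                       # equation (5)

x = np.array([0.3, -1.1])             # arbitrary evaluation point
y = np.linalg.solve(T, x - mu)        # invert equation (3): y = T^{-1} (x - mu)

f_from_eq4 = standard_normal_pdf(y) / abs(np.linalg.det(T))   # equation (4)
f_direct = mvn_pdf(x, mu, Sigma)

print(np.isclose(f_from_eq4, f_direct))
```

The two values agree because $\mathbf y^T\mathbf y=(x-\mu)^T\Sigma^{-1}(x-\mu)$ and $\det(\Sigma)=\det(\mathbf T)^2$.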

Conclusion:

To generate data sets that share the same covariance, apply the same transformation $\mathbf T$ to standard normal samples.

Code:

# Generate datasets
import numpy as np


def dataset_fixed_cov():
    '''Generate 2 Gaussians samples with the same covariance matrix'''
    n, dim = 300, 2
    np.random.seed(0)
    C = np.array([[0., -0.23], [0.83, .23]])
    # Class 0: standard normal rows times C, so each row has covariance C.T @ C
    X = np.r_[np.dot(np.random.randn(n, dim), C),
              # Class 1: also multiplied by C, so the covariance is the same;
              # only the mean is shifted to (1, 1)
              np.dot(np.random.randn(n, dim), C) + np.array([1, 1])]
    y = np.hstack((np.zeros(n), np.ones(n)))
    return X, y


def dataset_cov():
    '''Generate 2 Gaussians samples with different covariance matrices'''
    n, dim = 300, 2
    np.random.seed(0)
    C = np.array([[0., -1.], [2.5, .7]]) * 2.
    X = np.r_[np.dot(np.random.randn(n, dim), C),
              # Class 1 is multiplied by C.T instead of C, so its covariance
              # (C @ C.T) differs from that of class 0 (C.T @ C)
              np.dot(np.random.randn(n, dim), C.T) + np.array([1, 4])]
    y = np.hstack((np.zeros(n), np.ones(n)))
    return X, y
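
One detail is worth spelling out. In these functions each sample is a row, so the transformation is `Y @ C` rather than $\mathbf T\mathbf y$. In terms of equation (3) this corresponds to $\mathbf T = C^T$, and each row has covariance $C^TC$. Multiplying by `C.T` instead gives $CC^T$, which is in general a different matrix. The sketch below checks this empirically with the `C` from `dataset_cov`; the sample size and the seed are arbitrary choices, and `np.random.default_rng` is used here instead of the `np.random.seed` call above.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed chosen arbitrarily
n, dim = 200_000, 2                     # large n so the estimates are tight
C = np.array([[0., -1.], [2.5, .7]]) * 2.

Y = rng.standard_normal((n, dim))       # rows are samples from N(0, I)

# rowvar=False tells np.cov that rows are samples and columns are dimensions
cov_with_C = np.cov(Y @ C, rowvar=False)      # estimates C.T @ C
cov_with_CT = np.cov(Y @ C.T, rowvar=False)   # estimates C @ C.T

print(np.round(cov_with_C, 2))
print(C.T @ C)
print(np.round(cov_with_CT, 2))
print(C @ C.T)
```

Each empirical matrix should land close to the exact product printed after it, and the two exact products differ from each other.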

Computing the covariance matrix with numpy.cov

>>> x = np.array([[0, 2], [1, 1], [2, 0]]).T
>>> x
array([[0, 1, 2],
       [2, 1, 0]])

>>> np.cov(x)
array([[ 1., -1.],
       [-1.,  1.]])

The formulas used here are the sample variance and the sample covariance.
Mean:

$\bar X=\frac{1}{N}\sum_{i=1}^NX_i$

Variance (population form, dividing by $N$):

$var=\frac{1}{N}\sum_{i=1}^N(X_i-\bar X)^2$

Sample variance (dividing by $N-1$):

$S=\frac{1}{N-1}\sum_{i=1}^N(X_i-\bar X)^2$

Sample covariance:

$cov = \frac{1}{N-1}\sum_{i=1}^N(X_i- \bar X)(Y_i -\bar Y)$
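
The formulas can be applied by hand to the two rows of `x` from the `np.cov` example above. This is a small sketch; the helper `sample_cov` and the variable names are illustrative and not part of NumPy.

```python
import numpy as np


def sample_cov(x, y):
    """Sample covariance of two equal-length 1-D arrays (divides by N - 1)."""
    n = len(x)
    return np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)


a = np.array([0.0, 1.0, 2.0])   # first row of x
b = np.array([2.0, 1.0, 0.0])   # second row of x

var_population = np.sum((a - a.mean()) ** 2) / len(a)      # divides by N
var_sample = np.sum((a - a.mean()) ** 2) / (len(a) - 1)    # divides by N - 1

print(var_population, np.var(a))        # np.var uses ddof=0 by default
print(var_sample, np.var(a, ddof=1))    # ddof=1 gives the sample variance
print(sample_cov(a, b))                 # off-diagonal entry of np.cov(x)
print(np.cov(np.vstack([a, b])))
```

`np.cov` divides by $N-1$ by default, so its diagonal holds sample variances, not population variances.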

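Note:
- The covariance matrix holds the covariances between different dimensions, not between different samples. Given a sample matrix, the first thing to establish is whether each row is a sample or a dimension.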

Example 2:

>>> X=np.array([[1 ,5 ,6] ,[4 ,3 ,9 ],[ 4 ,2 ,9],[ 4 ,7 ,2]])
>>> np.cov(X)
array([[ 7.        ,  4.5       ,  4.        , -0.5       ],
       [ 4.5       , 10.33333333, 11.5       , -7.16666667],
       [ 4.        , 11.5       , 13.        , -8.5       ],
       [-0.5       , -7.16666667, -8.5       ,  6.33333333]])
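
By default `np.cov` treats each row as a dimension (variable) and each column as an observation (`rowvar=True`). In Example 2 the 4×3 array is therefore read as 4 dimensions with 3 observations each, which is why the result is 4×4. If the rows are samples, as in the dataset functions above, `rowvar=False` gives the covariance between columns. A short sketch with illustrative variable names:

```python
import numpy as np

X = np.array([[1, 5, 6], [4, 3, 9], [4, 2, 9], [4, 7, 2]])

cov_rows_as_dims = np.cov(X)                  # default rowvar=True: 4 dimensions
cov_cols_as_dims = np.cov(X, rowvar=False)    # rows are samples: 3 dimensions

print(cov_rows_as_dims.shape)   # (4, 4)
print(cov_cols_as_dims.shape)   # (3, 3)

# rowvar=False is equivalent to transposing the input first
print(np.allclose(cov_cols_as_dims, np.cov(X.T)))
```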

References: https://blog.csdn.net/lilong117194/article/details/78399568
《统计机器学习导论》
wikipedia