1 最小二乘法求解矩阵形式推导
设训练样本集为
(
x
i
,
y
i
)
(x_i,y_i)
( x i , y i ) ,一元(向量)线性回归 可表示为:
f
(
x
i
)
=
w
T
x
i
⃗
+
b
f(x_i)=w^T\vec{x_i}+b
f ( x i ) = w T x i
+ b
若把样本输入
x
i
⃗
\vec{x_i}
x i
表示成矩阵形式(设有n个样本输入,每个输入有d个特性 ),有:
X
=
[
x
11
x
12
.
.
.
x
1
d
1
x
21
x
22
.
.
.
x
2
d
1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
x
n
1
x
n
2
.
.
.
x
n
d
1
]
=
[
x
1
T
1
x
2
T
1
.
.
.
.
.
.
x
n
T
1
]
\boldsymbol X=\begin{bmatrix}x_{11} &x_{12}&...&x_{1d}&1\\x_{21} &x_{22}&...&x_{2d}&1\\...&...&...&...&...\\x_{n1} &x_{n2}&...&x_{nd}&1\end{bmatrix}=\begin{bmatrix}{x_1^T}&&1\\{x_2^T}&&1\\...&&...\\{x_n^T}&&1\end{bmatrix}
X = ⎣ ⎢ ⎢ ⎡ x 1 1 x 2 1 . . . x n 1 x 1 2 x 2 2 . . . x n 2 . . . . . . . . . . . . x 1 d x 2 d . . . x n d 1 1 . . . 1 ⎦ ⎥ ⎥ ⎤ = ⎣ ⎢ ⎢ ⎡ x 1 T x 2 T . . . x n T 1 1 . . . 1 ⎦ ⎥ ⎥ ⎤ 其中1表示偏置
b
b
b ,
w
~
=
(
w
⃗
;
b
)
{\widetilde w}=(\vec w;b)
w
= ( w
; b )
则多元线性回归 可表示为:
y
⃗
=
X
w
~
\vec y=\boldsymbol X{\widetilde w}
y
= X w
其中
y
⃗
=
(
y
1
,
y
2
,
.
.
.
,
y
n
)
\vec y=(y_1,y_2,...,y_n)
y
= ( y 1 , y 2 , . . . , y n ) 表示样本标签
最小二乘法 可表示为
m
i
n
w
∣
∣
y
⃗
−
X
w
~
∣
∣
2
2
=
m
i
n
w
(
y
⃗
−
X
w
~
)
T
(
y
⃗
−
X
w
~
)
{\underset {w}{min}||\vec y - \boldsymbol X{\widetilde w}||_2}^2=\underset{w}{min}(\vec y - \boldsymbol X{\widetilde w})^T(\vec y - \boldsymbol X{\widetilde w})
w m i n ∣ ∣ y
− X w
∣ ∣ 2 2 = w m i n ( y
− X w
) T ( y
− X w
)
L
w
=
1
2
(
y
⃗
−
X
w
~
)
T
(
y
⃗
−
X
w
~
)
=
1
2
(
y
⃗
T
−
w
~
T
X
T
)
(
y
⃗
−
X
w
~
)
=
1
2
(
y
⃗
T
y
⃗
−
y
⃗
T
X
w
~
−
w
~
T
X
T
y
⃗
+
w
~
T
X
T
X
w
~
)
\begin{aligned} L_w & =\frac{1}{2}(\vec y - \boldsymbol X{\widetilde w})^T(\vec y - \boldsymbol X{\widetilde w})\\ &=\frac{1}{2}({\vec y} ^T-{\widetilde w}^T{\boldsymbol X}^T)(\vec y - \boldsymbol X{\widetilde w})\\ &=\frac{1}{2}({\vec y} ^T{\vec y} -{\vec y} ^T{\boldsymbol X}{\widetilde w}-{\widetilde w}^T{\boldsymbol X}^T{\vec y} + {\widetilde w}^T{\boldsymbol X}^T{\boldsymbol X}{\widetilde w}) \end{aligned}
L w = 2 1 ( y
− X w
) T ( y
− X w
) = 2 1 ( y
T − w
T X T ) ( y
− X w
) = 2 1 ( y
T y
− y
T X w
− w
T X T y
+ w
T X T X w
)
∂
L
w
∂
w
=
1
2
[
−
(
y
⃗
T
X
)
T
−
X
T
y
⃗
+
X
T
X
w
~
+
w
~
T
X
T
X
)
T
]
=
1
2
(
−
2
X
T
y
⃗
+
2
X
T
X
w
~
)
\begin{aligned} \frac {\partial L_w} {\partial w} &=\frac{1}{2}[-(\vec y ^ T\boldsymbol X )^T-\boldsymbol X ^ T \vec y + \boldsymbol X ^ T \boldsymbol X {\widetilde w} + {\widetilde w}^T\boldsymbol X ^ T \boldsymbol X )^T] \\ &=\frac{1}{2}(-2\boldsymbol X^T \vec y+2\boldsymbol X ^T \boldsymbol X {\widetilde w} ) \end{aligned}
∂ w ∂ L w = 2 1 [ − ( y
T X ) T − X T y
+ X T X w
+ w
T X T X ) T ] = 2 1 ( − 2 X T y
+ 2 X T X w
) 注:
∂
(
X
θ
)
∂
θ
=
X
T
\frac{\partial (\boldsymbol X \theta)}{\partial \theta}=\boldsymbol X ^ T
∂ θ ∂ ( X θ ) = X T ,
∂
(
θ
T
X
)
∂
θ
T
=
X
T
\frac{\partial (\theta ^ T\boldsymbol X )}{\partial \theta ^ T}=\boldsymbol X ^ T
∂ θ T ∂ ( θ T X ) = X T ,
∂
(
θ
T
X
)
∂
θ
=
X
\frac{\partial (\theta ^T \boldsymbol X)}{{\partial \theta }}=\boldsymbol X
∂ θ ∂ ( θ T X ) = X 即:若上下向量一样,则结果为矩阵的转置,若互为转置,则结果为原矩阵
∂
L
w
∂
w
=
0
\frac {\partial L_w} {\partial w}=0
∂ w ∂ L w = 0
⇒
X
T
y
⃗
=
2
X
T
X
w
~
\Rightarrow \boldsymbol X^T \vec y=2\boldsymbol X ^T \boldsymbol X {\widetilde w}
⇒ X T y
= 2 X T X w
⇒
w
~
=
(
X
T
X
)
−
1
X
T
y
⃗
\Rightarrow {\widetilde w} = (\boldsymbol X^T \boldsymbol X)^{-1}\boldsymbol X^T \vec y
⇒ w
= ( X T X ) − 1 X T y
2 岭回归求解公式
当数据特征较样本数多时,即样本数不足(d>n:未知数个数大于方程个数),输入数据不是满秩矩阵,这将导致非满秩矩阵
X
T
X
\boldsymbol X^T \boldsymbol X
X T X 在求逆时会发生问题。岭回归是在
X
T
X
\boldsymbol X^T \boldsymbol X
X T X 上加一个正则项
λ
I
\lambda \boldsymbol I
λ I 从而使矩阵非奇异 ,进而能对
X
T
X
+
λ
I
\boldsymbol X^T \boldsymbol X+\lambda \boldsymbol I
X T X + λ I 求逆,即:
⇒
w
~
=
(
X
T
X
+
λ
I
)
−
1
X
T
y
⃗
\Rightarrow {\widetilde w} = (\boldsymbol X^T \boldsymbol X+\lambda \boldsymbol I)^{-1}\boldsymbol X^T \vec y
⇒ w
= ( X T X + λ I ) − 1 X T y
λ
\lambda
λ 是正则化系数 ,可提供分类器泛化性,防止过拟合。当
λ
\lambda
λ 较小时,系数与普通矩阵一样,而较大时,使得求解的参数都接近于0 注:岭回归的最小二乘法 可表示为:
m
i
n
w
(
∣
∣
y
⃗
−
X
w
~
∣
∣
2
2
+
λ
∣
∣
w
~
∣
∣
2
2
)
{\underset {w}{min}(||\vec y - \boldsymbol X{\widetilde w}||_2}^2+{\lambda||\widetilde w||_2}^2)
w m i n ( ∣ ∣ y
− X w
∣ ∣ 2 2 + λ ∣ ∣ w
∣ ∣ 2 2 )
由于KCF算法是在傅里叶域内计算,牵涉到复数矩阵,所以我们将结果都统一写成复数域中形式
(1)
⇒
w
~
=
(
X
H
X
+
λ
I
)
−
1
X
H
y
⃗
\Rightarrow {\widetilde w} = (\boldsymbol X^H \boldsymbol X+\lambda \boldsymbol I)^{-1}\boldsymbol X^H \vec y \tag{1}
⇒ w
= ( X H X + λ I ) − 1 X H y
( 1 ) 其中
X
\boldsymbol X
X 表示由基样本生成的循环矩阵 ,
X
H
\boldsymbol X ^H
X H 表示其复数的共轭转置
3 循环矩阵在傅氏空间对角化
循环矩阵公式表达及直观表示
X
=
C
(
x
⃗
)
=
[
x
1
x
2
x
3
.
.
.
x
n
x
n
x
1
x
2
.
.
.
x
n
−
1
x
n
−
1
x
n
x
1
.
.
.
x
n
−
2
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
x
2
x
3
x
4
.
.
.
x
1
]
\boldsymbol X=\boldsymbol C(\vec x)=\begin{bmatrix} x_1& x_2 & x_3&...&x_n \\ x_n&x_1& x_2 &...&x_{n-1}\\ x_{n-1}&x_{n}& x_1 &...&x_{n-2}\\ ...&...&...&...&...&\\ x_2 & x_3&x_4&...&x_1\\ \end{bmatrix}
X = C ( x
) = ⎣ ⎢ ⎢ ⎢ ⎢ ⎡ x 1 x n x n − 1 . . . x 2 x 2 x 1 x n . . . x 3 x 3 x 2 x 1 . . . x 4 . . . . . . . . . . . . . . . x n x n − 1 x n − 2 . . . x 1 ⎦ ⎥ ⎥ ⎥ ⎥ ⎤
任何循环矩阵可以被傅里叶变换矩阵对角化 ,即
(2)
X
=
C
(
x
)
=
F
d
i
a
g
(
x
^
)
F
H
\boldsymbol X=\boldsymbol C(x)=\boldsymbol F diag(\widehat x)\boldsymbol F ^ H \tag{2}
X = C ( x ) = F d i a g ( x
) F H ( 2 ) 其中
x
^
由
X
\widehat x由\boldsymbol X
x
由 X 的第1行元素(即基样本)经傅里叶变换后得到,
F
\boldsymbol F
F 是离散傅里叶变换矩阵,是一个常量
F
=
1
n
[
1
1
.
.
.
1
1
1
ω
.
.
.
ω
n
−
2
ω
n
−
1
1
ω
2
.
.
.
ω
2
(
n
−
2
)
ω
2
(
n
−
1
)
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1
ω
n
−
1
.
.
.
ω
(
n
−
1
)
(
n
−
2
)
ω
(
n
−
1
)
2
]
\boldsymbol F=\frac {1}{\sqrt n}\begin{bmatrix}1&1&...&1&1\\1& \omega &...&\omega ^{n-2}&\omega ^{n-1}\\1&\omega ^ 2 &...&\omega ^{2(n-2)}&\omega ^{2(n-1)}\\...&...&...&...&...\\1&\omega ^{n-1}&...&\omega ^{(n-1)(n-2)}&\omega ^{(n-1)^2}\end{bmatrix}
F = n
1 ⎣ ⎢ ⎢ ⎢ ⎢ ⎡ 1 1 1 . . . 1 1 ω ω 2 . . . ω n − 1 . . . . . . . . . . . . . . . 1 ω n − 2 ω 2 ( n − 2 ) . . . ω ( n − 1 ) ( n − 2 ) 1 ω n − 1 ω 2 ( n − 1 ) . . . ω ( n − 1 ) 2 ⎦ ⎥ ⎥ ⎥ ⎥ ⎤
要计算
(
1
)
(1)
( 1 ) 式,先求
X
H
X
\boldsymbol X^H \boldsymbol X
X H X ,将
(
2
)
(2)
( 2 ) 式代可得
(3)
X
H
X
=
(
F
d
i
a
g
(
x
^
)
F
H
)
H
F
d
i
a
g
(
x
^
)
F
H
=
(
F
H
)
H
d
i
a
g
(
x
^
)
H
F
H
F
d
i
a
g
(
x
^
)
F
H
=
F
d
i
a
g
(
x
^
∗
)
F
H
F
d
i
a
g
(
x
^
)
F
H
=
F
d
i
a
g
(
x
^
∗
⊙
x
^
)
F
H
\begin{aligned} \boldsymbol X^H \boldsymbol X & =(\boldsymbol F diag(\widehat x)\boldsymbol F^H)^H \boldsymbol F diag(\widehat x)\boldsymbol F^H\\ &=(\boldsymbol F^H)^H diag(\widehat x)^H\boldsymbol F ^H \boldsymbol F diag(\widehat x)\boldsymbol F^H \\ &=\boldsymbol F diag(\widehat x ^ *)\boldsymbol F^H \boldsymbol F diag(\widehat x)\boldsymbol F^H\\ &= \boldsymbol F diag(\widehat x ^ * \odot \widehat x ) \boldsymbol F^H \tag{3} \end{aligned}
X H X = ( F d i a g ( x
) F H ) H F d i a g ( x
) F H = ( F H ) H d i a g ( x
) H F H F d i a g ( x
) F H = F d i a g ( x
∗ ) F H F d i a g ( x
) F H = F d i a g ( x
∗ ⊙ x
) F H ( 3 ) 1.其中
x
^
∗
\widehat x ^ *
x
∗ 与
x
^
\widehat x
x
是共轭关系 2.
x
^
\widehat x
x
表示
x
⃗
\vec x
x
的离散傅里叶变换,即
x
^
=
F
(
x
⃗
)
=
n
F
x
\widehat x=\mathcal F(\vec x)=\sqrt n \boldsymbol F x
x
= F ( x
) = n
F x 3.
(
A
B
C
)
H
=
C
H
B
H
A
H
(\boldsymbol A\boldsymbol B\boldsymbol C)^H=\boldsymbol C^H \boldsymbol B^H \boldsymbol A^H
( A B C ) H = C H B H A H 4.
F
H
F
=
F
F
H
=
I
\boldsymbol F^H \boldsymbol F = \boldsymbol F \boldsymbol F^H = \boldsymbol I
F H F = F F H = I 5.
d
i
a
g
{
(
x
^
)
H
}
=
d
i
a
g
{
(
x
^
∗
)
T
}
=
d
i
a
g
(
x
^
∗
)
diag\left\{(\widehat x)^H\right\}=diag\left\{(\widehat x ^ *)^T\right\}=diag(\widehat x ^ *)
d i a g { ( x
) H } = d i a g { ( x
∗ ) T } = d i a g ( x
∗ ) ----对角矩阵的转置不变 6.
d
i
a
g
(
A
)
d
i
a
g
(
B
)
=
d
i
a
g
(
A
⊙
B
)
diag(\boldsymbol A) diag(\boldsymbol B)=diag(\boldsymbol A \odot \boldsymbol B)
d i a g ( A ) d i a g ( B ) = d i a g ( A ⊙ B ) ----符号
⊙
\odot
⊙ 表示矩阵element-wise的乘法
将
(
3
)
(3)
( 3 ) 代入
(
1
)
(1)
( 1 ) 得:
(4)
w
~
=
(
X
H
X
+
λ
I
)
−
1
X
H
y
⃗
=
(
F
d
i
a
g
(
x
^
∗
⊙
x
^
)
F
H
+
λ
F
I
F
H
)
−
1
X
H
y
⃗
=
(
F
d
i
a
g
(
x
^
∗
⊙
x
^
)
F
H
+
F
d
i
a
g
(
λ
)
F
H
)
−
1
X
H
y
⃗
=
(
F
d
i
a
g
(
x
^
∗
⊙
x
^
+
λ
)
F
H
)
−
1
X
H
y
⃗
=
[
(
F
H
)
−
1
d
i
a
g
(
x
^
∗
⊙
x
^
+
λ
)
−
1
F
−
1
]
X
H
y
⃗
=
[
F
d
i
a
g
(
1
x
^
∗
⊙
x
^
+
λ
)
F
−
1
]
X
H
y
⃗
\begin{aligned} {\widetilde w} &= (\boldsymbol X^H \boldsymbol X+\lambda \boldsymbol I)^{-1}\boldsymbol X^H \vec y\\ &=(\boldsymbol F diag(\widehat x ^ * \odot \widehat x ) \boldsymbol F^H+\lambda \boldsymbol F \boldsymbol I \boldsymbol F^H)^{-1}\boldsymbol X^H \vec y\\ &=(\boldsymbol F diag(\widehat x ^ * \odot \widehat x ) \boldsymbol F^H+\boldsymbol F diag(\lambda)\boldsymbol F^H)^{-1}\boldsymbol X^H \vec y\\ &=(\boldsymbol F diag(\widehat x ^ * \odot \widehat x+\lambda ) \boldsymbol F^H)^{-1}\boldsymbol X^H \vec y\\ &=[(\boldsymbol F^H)^{-1}diag(\widehat x ^ * \odot \widehat x+\lambda ) ^{-1}\boldsymbol F^{-1}]\boldsymbol X^H \vec y\\ &=[\boldsymbol Fdiag(\frac {1}{\widehat x ^ * \odot \widehat x+\lambda} ) \boldsymbol F^{-1}]\boldsymbol X^H \vec y \tag{4} \end{aligned}
w
= ( X H X + λ I ) − 1 X H y
= ( F d i a g ( x
∗ ⊙ x
) F H + λ F I F H ) − 1 X H y
= ( F d i a g ( x
∗ ⊙ x
) F H + F d i a g ( λ ) F H ) − 1 X H y
= ( F d i a g ( x
∗ ⊙ x
+ λ ) F H ) − 1 X H y
= [ ( F H ) − 1 d i a g ( x
∗ ⊙ x
+ λ ) − 1 F − 1 ] X H y
= [ F d i a g ( x
∗ ⊙ x
+ λ 1 ) F − 1 ] X H y
( 4 ) 1.
λ
I
=
λ
F
F
H
=
λ
F
I
F
H
\lambda \boldsymbol I = \lambda \boldsymbol F \boldsymbol F^H=\lambda \boldsymbol F \boldsymbol I \boldsymbol F^H
λ I = λ F F H = λ F I F H 2.
A
B
C
+
A
D
C
=
A
(
B
+
D
)
C
\boldsymbol A \boldsymbol B \boldsymbol C+\boldsymbol A\boldsymbol D\boldsymbol C=\boldsymbol A(\boldsymbol B +\boldsymbol D)\boldsymbol C
A B C + A D C = A ( B + D ) C 3.
F
F
H
=
I
⇒
F
H
=
F
−
1
\boldsymbol F \boldsymbol F^H=\boldsymbol I\Rightarrow \boldsymbol F^H=\boldsymbol F^{-1}
F F H = I ⇒ F H = F − 1 4.
F
F
H
=
I
⇒
F
=
(
F
H
)
−
1
\boldsymbol F \boldsymbol F^H=\boldsymbol I\Rightarrow \boldsymbol F=(\boldsymbol F^H)^{-1}
F F H = I ⇒ F = ( F H ) − 1 5.
d
i
a
g
(
λ
i
)
−
1
=
d
i
a
g
(
1
λ
i
)
diag(\lambda_i)^{-1}=diag(\frac{1}{\lambda_i})
d i a g ( λ i ) − 1 = d i a g ( λ i 1 )
将
(
2
)
(2)
( 2 ) 式代入
(
4
)
(4)
( 4 ) 式得:
(5)
w
~
=
[
F
d
i
a
g
(
1
x
^
∗
⊙
x
^
+
λ
)
F
−
1
]
[
F
d
i
a
g
(
x
^
)
F
H
]
H
y
⃗
=
[
F
d
i
a
g
(
1
x
^
∗
⊙
x
^
+
λ
)
F
−
1
]
[
(
F
H
)
H
d
i
a
g
(
x
^
)
H
F
H
]
y
⃗
=
F
d
i
a
g
(
1
x
^
∗
⊙
x
^
+
λ
)
F
−
1
F
d
i
a
g
(
x
^
)
H
F
H
y
⃗
=
F
d
i
a
g
(
1
x
^
∗
⊙
x
^
+
λ
)
I
d
i
a
g
(
x
^
)
H
F
H
y
⃗
=
F
d
i
a
g
(
1
x
^
∗
⊙
x
^
+
λ
)
d
i
a
g
(
x
^
)
H
F
H
y
⃗
=
F
d
i
a
g
(
x
^
H
x
^
∗
⊙
x
^
+
λ
)
F
H
y
⃗
=
F
d
i
a
g
(
x
^
∗
x
^
∗
⊙
x
^
+
λ
)
F
H
y
⃗
\begin{aligned} {\widetilde w} &=[\boldsymbol Fdiag(\frac {1}{\widehat x ^ * \odot \widehat x+\lambda} ) \boldsymbol F^{-1}][\boldsymbol F diag(\widehat x)\boldsymbol F ^ H ]^H \vec y\\ &=[\boldsymbol Fdiag(\frac {1}{\widehat x ^ * \odot \widehat x+\lambda} ) \boldsymbol F^{-1}][(\boldsymbol F^H)^Hdiag(\widehat x)^H\boldsymbol F^H]\vec y\\ &=\boldsymbol Fdiag(\frac {1}{\widehat x ^ * \odot \widehat x+\lambda} ) \boldsymbol F^{-1}\boldsymbol Fdiag(\widehat x)^H\boldsymbol F^H \vec y\\ &=\boldsymbol Fdiag(\frac {1}{\widehat x ^ * \odot \widehat x+\lambda} ) \boldsymbol I diag(\widehat x)^H\boldsymbol F^H \vec y\\ &=\boldsymbol Fdiag(\frac {1}{\widehat x ^ * \odot \widehat x+\lambda} ) diag(\widehat x)^H\boldsymbol F^H \vec y\\ &=\boldsymbol Fdiag(\frac {\widehat x^H}{\widehat x ^ * \odot \widehat x+\lambda} ) \boldsymbol F^H \vec y\\ &=\boldsymbol Fdiag(\frac {\widehat x^*}{\widehat x ^ * \odot \widehat x+\lambda} ) \boldsymbol F^H \vec y\tag{5} \end{aligned}
w
= [ F d i a g ( x
∗ ⊙ x
+ λ 1 ) F − 1 ] [ F d i a g ( x
) F H ] H y
= [ F d i a g ( x
∗ ⊙ x
+ λ 1 ) F − 1 ] [ ( F H ) H d i a g ( x
) H F H ] y
= F d i a g ( x
∗ ⊙ x
+ λ 1 ) F − 1 F d i a g ( x
) H F H y
= F d i a g ( x
∗ ⊙ x
+ λ 1 ) I d i a g ( x
) H F H y
= F d i a g ( x
∗ ⊙ x
+ λ 1 ) d i a g ( x
) H F H y
= F d i a g ( x
∗ ⊙ x
+ λ x
H ) F H y
= F d i a g ( x
∗ ⊙ x
+ λ x
∗ ) F H y
( 5 )
继续推导
x
^
=
F
(
x
)
⇒
x
=
F
−
1
(
x
^
)
(
F
−
1
表
示
傅
里
叶
逆
变
换
)
\widehat x=\mathcal F(x)\Rightarrow x=\mathcal F^{-1}(\widehat x) (\mathcal F^{-1}表示傅里叶逆变换)
x
= F ( x ) ⇒ x = F − 1 ( x
) ( F − 1 表 示 傅 里 叶 逆 变 换 )
(6)
X
=
C
(
x
⃗
)
=
C
(
F
−
1
(
x
^
)
)
=
F
d
i
a
g
(
x
^
)
F
H
\begin{aligned}\boldsymbol X=\boldsymbol C(\vec x)&=\boldsymbol C(\mathcal F^{-1}(\widehat x) )\\ &=\boldsymbol F diag(\widehat x)\boldsymbol F ^ H \tag{6} \end{aligned}
X = C ( x
) = C ( F − 1 ( x
) ) = F d i a g ( x
) F H ( 6 )
结合
(
5
)
(5)
( 5 ) 式和
(
6
)
(6)
( 6 ) 式得:
(7)
w
~
=
C
[
F
−
1
(
x
^
∗
x
^
∗
⊙
x
^
+
λ
)
]
y
⃗
{\widetilde w} = \boldsymbol C[\mathcal F^{-1} (\frac {\widehat x^*}{\widehat x ^ * \odot \widehat x+\lambda})]\vec y\tag{7}
w
= C [ F − 1 ( x
∗ ⊙ x
+ λ x
∗ ) ] y
( 7 )
利用循环卷积性质:
(8)
F
(
X
y
⃗
)
=
F
[
C
(
x
⃗
)
y
⃗
]
=
x
^
∗
⊙
y
^
=
F
∗
(
x
⃗
)
⊙
F
(
y
⃗
)
\begin{aligned}\mathcal F(\boldsymbol X \vec y)&=\mathcal F[\boldsymbol C(\vec x)\vec y]\\ &=\widehat x ^ * \odot \widehat y\\ &=\mathcal F^*(\vec x)\odot\mathcal F(\vec y)\tag{8} \end{aligned}
F ( X y
) = F [ C ( x
) y
] = x
∗ ⊙ y
= F ∗ ( x
) ⊙ F ( y
) ( 8 )
结合
(
7
)
(7)
( 7 ) 式和
(
8
)
(8)
( 8 ) 式得:
F
(
w
~
)
=
F
(
C
[
F
−
1
(
x
^
∗
x
^
∗
⊙
x
^
+
λ
)
]
y
⃗
)
=
F
∗
[
F
−
1
(
x
^
∗
x
^
∗
⊙
x
^
+
λ
)
]
F
(
y
⃗
)
=
(
x
^
∗
x
^
∗
⊙
x
^
+
λ
)
∗
⊙
y
^
=
(
x
^
∗
)
∗
(
x
^
∗
⊙
x
^
+
λ
)
∗
⊙
y
^
=
x
^
⊙
y
^
x
^
∗
⊙
x
^
+
λ
\begin{aligned}\mathcal F({\widetilde w})&=\mathcal F(\boldsymbol C[\mathcal F^{-1} (\frac {\widehat x^*}{\widehat x ^ * \odot \widehat x+\lambda})]\vec y)\\ &=\mathcal F^*[\mathcal F^{-1} (\frac {\widehat x^*}{\widehat x ^ * \odot \widehat x+\lambda})]\mathcal F(\vec y)\\ &=(\frac {\widehat x^*}{\widehat x ^ * \odot \widehat x+\lambda})^*\odot \widehat y\\ &=\frac {(\widehat x^*)^*}{(\widehat x ^ * \odot \widehat x+\lambda)^*}\odot \widehat y\\ &=\frac {\widehat x\odot \widehat y}{\widehat x ^ * \odot \widehat x+\lambda} \end{aligned}
F ( w
) = F ( C [ F − 1 ( x
∗ ⊙ x
+ λ x
∗ ) ] y
) = F ∗ [ F − 1 ( x
∗ ⊙ x
+ λ x
∗ ) ] F ( y
) = ( x
∗ ⊙ x
+ λ x
∗ ) ∗ ⊙ y
= ( x
∗ ⊙ x
+ λ ) ∗ ( x
∗ ) ∗ ⊙ y
= x
∗ ⊙ x
+ λ x
⊙ y
(9)
⇒
F
(
w
~
)
=
w
^
=
x
^
⊙
y
^
x
^
∗
⊙
x
^
+
λ
\Rightarrow \mathcal F({\widetilde w})=\widehat w = \frac {\widehat x\odot \widehat y}{\widehat x ^ * \odot \widehat x+\lambda}\tag{9}
⇒ F ( w
) = w
= x
∗ ⊙ x
+ λ x
⊙ y
( 9 ) 1.
w
^
\widehat w
w
表示
w
~
\widetilde w
w
的傅里叶变换,
w
~
{\widetilde w}
w
表示
(
w
⃗
;
b
)
(\vec w;b)
( w
; b ) 即实域空间的分类器参数 2.由于
x
^
∗
\widehat x ^ *
x
∗ 是
x
^
\widehat x
x
的共轭,所以
x
^
∗
⊙
x
^
\widehat x ^ * \odot \widehat x
x
∗ ⊙ x
是实数,其共轭为本身 3.
⊙
\odot
⊙ 表示对应元素相乘,——
表示对应元素相除
由上述推导可得分类器参数
w
~
{\widetilde w}
w
:
(10)
w
~
=
F
−
1
(
w
^
)
=
F
−
1
(
x
^
⊙
y
^
x
^
∗
⊙
x
^
+
λ
)
{\widetilde w}={\mathcal F}^{-1}(\widehat w)={\mathcal F}^{-1}(\frac {\widehat x\odot \widehat y}{\widehat x ^ * \odot \widehat x+\lambda})\tag{10}
w
= F − 1 ( w
) = F − 1 ( x
∗ ⊙ x
+ λ x
⊙ y
) ( 1 0 ) 1.
x
^
,
y
^
\widehat x,\widehat y
x
, y
分别表示基样本、标签的傅里叶变换 2.将矩阵运算在傅里叶域转化为点积运算 ,成其是矩阵求逆运算,大大提高了计算速度 3.上式为线性回归下 利用循环矩阵其滤波器 的计算公式
4. 非线性回归滤波器求解
求解方式:找到一个非线性映射函数
φ
(
x
)
\varphi(x)
φ ( x ) ,使映射后的样本在新空间中线性可分 ,那么在新空间中就可以使用脊回归来寻找一个分类器
f
(
x
i
)
=
w
T
φ
(
x
i
)
f(\boldsymbol x_i)=\boldsymbol w^T\varphi(\boldsymbol x_i)
f ( x i ) = w T φ ( x i ) ,其中
φ
(
x
i
)
\varphi \boldsymbol {(x_i)}
φ ( x i ) 表示对样本
x
i
\boldsymbol x_i
x i 通过非线性映射函数
φ
\varphi
φ 进行变换。
将线性滤波器的解
w
\boldsymbol w
w 用样本的线性组合来表示:
w
=
∑
i
α
i
φ
(
x
i
)
\boldsymbol w=\sum_i \alpha_i {\varphi(\boldsymbol x_i)}
w = i ∑ α i φ ( x i ) 则最优化问题不再是求变量
w
w
w ,而是
α
\alpha
α 。该表达式是在对偶空间 中进行的,具体参考SVM 相关理论。
线性条件下的回归问题,经过非线性变换后为:
f
(
z
)
=
w
T
z
=
(
∑
i
n
α
i
φ
(
x
i
)
)
T
.
φ
(
z
)
=
∑
i
n
α
i
φ
T
(
x
i
)
φ
(
z
)
=
∑
i
n
α
i
K
(
x
i
,
z
)
\begin{aligned} f(z)&=\boldsymbol w^T\boldsymbol z\\ &=(\sum_i^n \alpha_i {\varphi(\boldsymbol x_i)})^T.\varphi(\boldsymbol z)\\ &=\sum_i^n {\alpha_i}\varphi^T(\boldsymbol x_i)\varphi(\boldsymbol z)\\ &=\sum_i^n\alpha_i\mathcal K(\boldsymbol x_i,\boldsymbol {z}) \end{aligned}
f ( z ) = w T z = ( i ∑ n α i φ ( x i ) ) T . φ ( z ) = i ∑ n α i φ T ( x i ) φ ( z ) = i ∑ n α i K ( x i , z ) 1.
K
(
x
,
x
′
)
=
φ
T
(
x
)
φ
(
x
′
)
\mathcal K(\boldsymbol x,\boldsymbol x')=\varphi^T{(\boldsymbol x)}{\varphi(\boldsymbol x')}
K ( x , x ′ ) = φ T ( x ) φ ( x ′ ) ,
K
\mathcal K
K 表示核函数 ,如高斯或多项式 2.
K
i
j
=
K
(
x
i
,
x
j
)
K_{ij}=\mathcal K(\boldsymbol x_i,\boldsymbol x_j)
K i j = K ( x i , x j ) ,
K
K
K 为
n
×
n
n \times n
n × n 的核矩阵,表示所有样本对的点乘 操作 3.
n
n
n 表示训练样本个数
核函数下岭回归的解 为:
(11)
α
=
(
K
+
λ
I
)
−
1
y
\boldsymbol \alpha=(K+\lambda I)^{-1}\boldsymbol y\tag{11}
α = ( K + λ I ) − 1 y ( 1 1 ) 1.
α
\boldsymbol \alpha
α 为线性组合系数
α
i
\alpha_i
α i 组成的向量 2.
K
K
K 为核矩阵,其各个元素
K
i
,
j
K_{i,j}
K i , j 如前所述 3.
λ
\lambda
λ 为正则化系数 4.推导过程参考Kernel ridge Regression
定理 1. 给定循环数据
C
(
x
)
C(\boldsymbol x)
C ( x ) ,对于任意的变换矩阵
M
M
M ,如果核函数
K
\mathcal K
K 满足
K
(
x
,
x
′
)
=
K
(
M
x
,
M
x
′
)
\mathcal K(\boldsymbol x,\boldsymbol x')=\mathcal K(M\boldsymbol x,M\boldsymbol x')
K ( x , x ′ ) = K ( M x , M x ′ ) ,则核矩阵
K
K
K 是循环矩阵,证明如下:
K
i
j
=
K
(
x
,
x
′
)
=
K
(
P
i
x
,
P
j
x
)
\begin{aligned}K_{ij}&=\mathcal K(\boldsymbol x,\boldsymbol x')\\ &=\mathcal K({P^i \boldsymbol x},{P^j \boldsymbol x})\end{aligned}
K i j = K ( x , x ′ ) = K ( P i x , P j x )
K
(
M
x
,
M
x
′
)
=
K
(
P
−
i
P
i
x
,
P
−
i
P
j
x
)
=
K
(
x
,
P
j
−
i
x
)
=
K
(
x
,
P
(
j
−
i
)
%
n
x
)
=
K
i
j
\begin{aligned}\mathcal K(M\boldsymbol x,M\boldsymbol x')&=\mathcal K(P^{-i}P^i \boldsymbol x,{P^{-i}P^j \boldsymbol x})\\ &=\mathcal K(\boldsymbol x,P^{j-i}\boldsymbol x)\\ &=\mathcal K(\boldsymbol x,P^{(j-i) \% n}\boldsymbol x)=K_{ij} \end{aligned}
K ( M x , M x ′ ) = K ( P − i P i x , P − i P j x ) = K ( x , P j − i x ) = K ( x , P ( j − i ) % n x ) = K i j 1.由上式可看出,
K
i
,
j
K_{i,j}
K i , j 只依赖于
(
j
−
i
)
(j-i)
( j − i ) 和
n
n
n ,所以
K
K
K 为循环矩阵 2.
P
P
P 为置换矩阵,如:
P
=
[
0
0
0
.
.
.
1
1
0
0
.
.
.
0
0
1
0
.
.
.
0
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
0
0
1
.
.
.
0
]
P=\begin{bmatrix} 0&0&0&...&1\\ 1&0&0&...&0\\ 0&1&0&...&0\\ ...&...&...&...&...\\ 0&0&1&...&0 \end{bmatrix}
P = ⎣ ⎢ ⎢ ⎢ ⎢ ⎡ 0 1 0 . . . 0 0 0 1 . . . 0 0 0 0 . . . 1 . . . . . . . . . . . . . . . 1 0 0 . . . 0 ⎦ ⎥ ⎥ ⎥ ⎥ ⎤
x
1
=
P
0
x
=
x
=
[
x
1
,
x
2
,
.
.
.
,
x
n
]
T
\boldsymbol x_1=P^0\boldsymbol x=\boldsymbol x=[x_1,x_2,...,x_n]^T
x 1 = P 0 x = x = [ x 1 , x 2 , . . . , x n ] T
x
2
=
P
1
x
=
[
x
n
,
x
1
,
.
.
.
,
x
n
−
1
]
T
\boldsymbol x_2=P^1\boldsymbol x=[x_n,x_1,...,x_{n-1}]^T
x 2 = P 1 x = [ x n , x 1 , . . . , x n − 1 ] T …
x
n
=
P
n
−
1
x
=
[
x
2
,
x
3
,
.
.
.
,
x
1
]
T
\boldsymbol x_n=P^{n-1}\boldsymbol x=[x_2,x_3,...,x_1]^T
x n = P n − 1 x = [ x 2 , x 3 , . . . , x 1 ] T
由
K
K
K 为循环矩阵,利用对角化性质,核滤波器表达式
(
11
)
(11)
( 1 1 ) 变换如下:
(12)
α
=
[
C
(
k
x
x
)
+
λ
I
]
−
1
y
=
[
F
d
i
a
g
(
k
^
x
x
)
F
H
+
λ
I
]
−
1
y
=
[
F
d
i
a
g
(
k
^
x
x
)
F
H
+
λ
F
I
F
H
]
−
1
y
=
[
F
d
i
a
g
(
k
^
x
x
)
F
H
+
F
d
i
a
g
(
λ
)
F
H
]
−
1
y
=
[
F
d
i
a
g
(
k
^
x
x
+
λ
)
F
H
]
−
1
y
=
(
F
H
)
−
1
(
d
i
a
g
(
k
^
x
x
+
λ
)
−
1
)
F
−
1
y
=
F
d
i
a
g
(
k
^
x
x
+
λ
)
−
1
F
H
y
\begin{aligned} \boldsymbol \alpha&=[C({\boldsymbol k ^{\boldsymbol x \boldsymbol x})}+\lambda I]^{-1} \boldsymbol y\\ &=[Fdiag({\widehat \boldsymbol k ^{\boldsymbol x \boldsymbol x}})F^H+\lambda I]^{-1} \boldsymbol y\\ &=[Fdiag({\widehat \boldsymbol k ^{\boldsymbol x \boldsymbol x}})F^H+\lambda FIF^H]^{-1} \boldsymbol y\\ &=[Fdiag({\widehat \boldsymbol k ^{\boldsymbol x \boldsymbol x}})F^H+ Fdiag(\lambda)F^H]^{-1} \boldsymbol y\\ &=[Fdiag({\widehat \boldsymbol k ^{\boldsymbol x \boldsymbol x}+\lambda})F^H]^{-1} \boldsymbol y\\ &=(F^H)^{-1}(diag({\widehat \boldsymbol k ^{\boldsymbol x \boldsymbol x}+\lambda})^{-1})F^{-1} \boldsymbol y\\ &=Fdiag({\widehat \boldsymbol k ^{\boldsymbol x \boldsymbol x}+\lambda)}^{-1}F^H\boldsymbol y \tag{12} \end{aligned}
α = [ C ( k x x ) + λ I ] − 1 y = [ F d i a g ( k
x x ) F H + λ I ] − 1 y = [ F d i a g ( k
x x ) F H + λ F I F H ] − 1 y = [ F d i a g ( k
x x ) F H + F d i a g ( λ ) F H ] − 1 y = [ F d i a g ( k
x x + λ ) F H ] − 1 y = ( F H ) − 1 ( d i a g ( k
x x + λ ) − 1 ) F − 1 y = F d i a g ( k
x x + λ ) − 1 F H y ( 1 2 ) 1.
K
=
C
(
k
x
x
)
K=C({\boldsymbol k ^{\boldsymbol x \boldsymbol x}})
K = C ( k x x ) ,
k
x
x
{\boldsymbol k ^{\boldsymbol x \boldsymbol x}}
k x x 为矩阵
K
K
K 的第一行 2.由
F
F
H
=
I
FF^H=I
F F H = I
⇒
\Rightarrow
⇒
F
H
=
F
−
1
F^H=F^{-1}
F H = F − 1
对
(
12
)
(12)
( 1 2 ) 两边同时左乘
F
H
F^H
F H 得
F
H
α
=
F
H
F
d
i
a
g
(
k
^
x
x
+
λ
)
−
1
F
H
y
=
d
i
a
g
(
k
^
x
x
+
λ
)
−
1
F
H
y
\begin{aligned} F^H \boldsymbol \alpha&=F^HFdiag({\widehat \boldsymbol k ^{\boldsymbol x \boldsymbol x}+\lambda)}^{-1}F^H\boldsymbol y\\ &=diag({\widehat \boldsymbol k ^{\boldsymbol x \boldsymbol x}+\lambda)}^{-1}F^H\boldsymbol y \end{aligned}
F H α = F H F d i a g ( k
x x + λ ) − 1 F H y = d i a g ( k
x x + λ ) − 1 F H y
F
H
α
F^H \boldsymbol \alpha
F H α 表示
α
{\boldsymbol \alpha}
α 经傅里叶变换后的共轭转置,则
F
H
α
=
[
α
^
∗
]
T
F^H\boldsymbol \alpha=[\widehat{\boldsymbol \alpha}^*]^T
F H α = [ α
∗ ] T ,则上式可转换为:
[
α
^
∗
]
T
=
d
i
a
g
(
k
^
x
x
+
λ
)
−
1
F
H
y
=
d
i
a
g
(
1
k
^
x
x
+
λ
)
[
y
^
∗
]
T
=
[
d
i
a
g
(
1
k
^
x
x
+
λ
)
y
^
∗
]
T
\begin{aligned} [\widehat{\boldsymbol \alpha}^*]^T &= diag({\widehat \boldsymbol k ^{\boldsymbol x \boldsymbol x}+\lambda)}^{-1}F^H\boldsymbol y\\ &=diag(\frac{1}{{\widehat \boldsymbol k ^{\boldsymbol x \boldsymbol x}+\lambda}})[{\widehat \boldsymbol y^*}]^T\\ &=[diag(\frac{1}{{\widehat \boldsymbol k ^{\boldsymbol x \boldsymbol x}+\lambda}}){{\widehat \boldsymbol y^*}}]^T \end{aligned}
[ α
∗ ] T = d i a g ( k
x x + λ ) − 1 F H y = d i a g ( k
x x + λ 1 ) [ y
∗ ] T = [ d i a g ( k
x x + λ 1 ) y
∗ ] T
⇒
α
^
∗
=
d
i
a
g
(
1
k
^
x
x
+
λ
)
y
^
∗
\Rightarrow \widehat{\boldsymbol \alpha}^*=diag(\frac{1}{{\widehat \boldsymbol k ^{\boldsymbol x \boldsymbol x}+\lambda}}){{\widehat \boldsymbol y^*}}
⇒ α
∗ = d i a g ( k
x x + λ 1 ) y
∗ 1.
d
i
a
g
(
λ
)
−
1
=
d
i
a
g
(
1
λ
)
diag(\lambda)^{-1}=diag(\frac{1}{\lambda})
d i a g ( λ ) − 1 = d i a g ( λ 1 ) 2.
F
H
y
=
y
^
∗
F^H\boldsymbol y=\widehat \boldsymbol y^*
F H y = y
∗ 由于一个对角矩阵与一个向量相乘,相当于元素级乘法,因此:
α
^
∗
=
y
^
∗
k
^
x
x
+
λ
\widehat{\boldsymbol \alpha}^*=\frac{{{\widehat \boldsymbol y^*}}}{{\widehat \boldsymbol k ^{\boldsymbol x \boldsymbol x}+\lambda}}
α
∗ = k
x x + λ y
∗ 等式两边同时取共轭,得:
(13)
α
^
=
y
^
k
^
x
x
+
λ
\widehat{\boldsymbol \alpha}=\frac{{{\widehat \boldsymbol y}}}{{\widehat \boldsymbol k ^{\boldsymbol x \boldsymbol x}+\lambda}}\tag{13}
α
= k
x x + λ y
( 1 3 ) 更一般地,矩阵
K
K
K 中每一行
K
i
x
x
′
=
K
(
x
′
,
P
i
−
1
x
)
K_i^{\boldsymbol x \boldsymbol x'}=\mathcal K(\boldsymbol x',P^{i-1}\boldsymbol x)
K i x x ′ = K ( x ′ , P i − 1 x ) 1.
i
i
i 为第
K
K
K 的第
i
i
i 行,即在基样本上进行
i
−
1
i-1
i − 1 次循环移位 2.
x
x
x 表示基样本,即
K
K
K 的第一行 3.
k
^
x
x
{\widehat \boldsymbol k ^{\boldsymbol x \boldsymbol x}}
k
x x 表示
x
x
x 在傅里叶域进行自相关
5 快速检测
由式
(
11
)
(11)
( 1 1 ) 可得:
(14)
f
(
z
)
=
K
α
f(\boldsymbol z) =K {\boldsymbol \alpha}\tag{14}
f ( z ) = K α ( 1 4 ) 1.正则化仅用来求解
α
\boldsymbol \alpha
α ,在回归时不需要正则项 2.
z
{\boldsymbol z}
z 表示待检测的图像块
在实用场景下,
(
14
)
(14)
( 1 4 ) 可表示为:
(15)
f
(
z
)
=
(
K
z
)
T
α
f(\boldsymbol z) =(K^{\boldsymbol z})^T \boldsymbol \alpha \tag{15}
f ( z ) = ( K z ) T α ( 1 5 ) 1.
α
\boldsymbol \alpha
α 为训练好的分类器参数 2.
K
z
=
C
(
k
x
z
)
=
K
(
P
i
−
1
z
,
P
j
−
1
x
)
K^{\boldsymbol z}=C({\boldsymbol k ^{\boldsymbol x \boldsymbol z}})=\mathcal K(P^{i-1} \boldsymbol z,P^{j-1} \boldsymbol x)
K z = C ( k x z ) = K ( P i − 1 z , P j − 1 x ) ,表示训练样本和待检测样本之间的核矩阵 ,是一个非对称矩阵 3.
x
\boldsymbol x
x 表示待训练基样本,
z
\boldsymbol z
z 表示待检测基样本
继续
(
15
)
(15)
( 1 5 ) 式推导
(16)
f
(
z
)
=
[
C
(
k
x
z
)
]
T
α
=
F
d
i
a
g
(
(
k
^
x
z
)
∗
)
F
H
α
=
C
(
(
k
x
z
)
∗
)
α
\begin{aligned} f(\boldsymbol z) &=[C({\boldsymbol k ^{\boldsymbol x \boldsymbol z}})]^T \boldsymbol \alpha\\ &=Fdiag({ (\widehat\boldsymbol k ^{\boldsymbol x \boldsymbol z}})^*)F^H\boldsymbol \alpha\\ &=C(({\boldsymbol k ^{\boldsymbol x \boldsymbol z}})^*)\boldsymbol \alpha \tag{16} \end{aligned}
f ( z ) = [ C ( k x z ) ] T α = F d i a g ( ( k
x z ) ∗ ) F H α = C ( ( k x z ) ∗ ) α ( 1 6 ) 注:上述推导利用了循环矩阵的转置性质,即转置后的特征值与原特征值互为共轭 ,用公式表达为:
X
T
=
F
d
i
a
g
(
(
x
^
)
∗
)
F
H
X^T=Fdiag((\widehat\boldsymbol x)^*)F^H
X T = F d i a g ( ( x
) ∗ ) F H 其中矩阵
d
i
a
g
(
(
x
^
)
∗
)
diag((\widehat\boldsymbol x)^*)
d i a g ( ( x
) ∗ ) 对角线上的值为矩阵
X
T
X^T
X T 的特征值
对
(
16
)
(16)
( 1 6 ) 式两边同时进行傅里叶变换,得:
F
(
f
(
z
)
)
=
F
(
C
(
(
k
x
z
)
∗
)
α
)
\mathcal F(f(\boldsymbol z))=\mathcal F(C(({\boldsymbol k ^{\boldsymbol x \boldsymbol z}})^*)\boldsymbol \alpha)
F ( f ( z ) ) = F ( C ( ( k x z ) ∗ ) α ) 利用循环卷积性质(如式
(
8
)
(8)
( 8 ) 所示),得:
F
(
f
(
z
)
)
=
F
∗
(
(
k
x
z
)
∗
)
⊙
F
(
α
)
=
F
(
k
x
z
)
⊙
F
(
α
)
\begin{aligned} \mathcal F(f(\boldsymbol z))&=\mathcal F ^ *(({\boldsymbol k ^{\boldsymbol x \boldsymbol z}})^*) \odot \mathcal F(\boldsymbol \alpha)\\ &=\mathcal F ({\boldsymbol k ^{\boldsymbol x \boldsymbol z}}) \odot \mathcal F(\boldsymbol \alpha) \end{aligned}
F ( f ( z ) ) = F ∗ ( ( k x z ) ∗ ) ⊙ F ( α ) = F ( k x z ) ⊙ F ( α ) 即:
(17)
f
^
(
z
)
=
k
^
x
z
⊙
α
^
\widehat f(\boldsymbol z)={\widehat \boldsymbol k ^{\boldsymbol x \boldsymbol z}} \odot \widehat \boldsymbol \alpha \tag{17}
f
( z ) = k
x z ⊙ α
( 1 7 ) 1.
k
x
z
{ \boldsymbol k ^{\boldsymbol x \boldsymbol z}}
k x z 为核矩阵
K
K
K 的第一行 2.
k
^
x
z
{\widehat \boldsymbol k ^{\boldsymbol x \boldsymbol z}}
k
x z 为训练样本
x
x
x 与待测样本
z
z
z 在傅里叶域的核相关
6 快速核相关
6.1 点积与多项式核
对于某种点积核函数
g
g
g ,其核函数可表示为:
K
(
x
,
x
′
)
=
g
(
x
T
x
′
)
\mathcal K(\boldsymbol x,\boldsymbol x')=g(\boldsymbol x^T\boldsymbol x')
K ( x , x ′ ) = g ( x T x ′ ) 注:
g
g
g 表示输入向量间的元素级操作
k
i
x
x
′
=
K
(
x
′
,
P
i
−
1
x
)
=
g
(
x
′
T
P
i
−
1
x
)
{k_i^{\boldsymbol x\boldsymbol x'}}=\mathcal K(\boldsymbol x',P^{i-1}\boldsymbol x)=g(\boldsymbol {x'}^TP^{i-1}\boldsymbol x)
k i x x ′ = K ( x ′ , P i − 1 x ) = g ( x ′ T P i − 1 x )
k
x
x
′
=
g
(
C
(
x
)
x
′
)
(
证
明
略
)
k^{\boldsymbol x \boldsymbol x'}=g(C(\boldsymbol x)\boldsymbol x') (证明略)
k x x ′ = g ( C ( x ) x ′ ) ( 证 明 略 ) 由循环矩阵性质可知:
F
(
C
(
x
)
x
′
)
=
x
^
∗
⊙
x
′
\mathcal F(C(\boldsymbol x)\boldsymbol x')=\widehat \boldsymbol x^* \odot \boldsymbol x'
F ( C ( x ) x ′ ) = x
∗ ⊙ x ′
⇒
C
(
x
)
x
′
=
F
−
1
(
x
^
∗
⊙
x
^
′
)
\Rightarrow C(\boldsymbol x)\boldsymbol x' = \mathcal F^{-1}(\widehat \boldsymbol x^* \odot \widehat \boldsymbol x')
⇒ C ( x ) x ′ = F − 1 ( x
∗ ⊙ x
′ )
⇒
k
x
x
′
=
g
(
F
−
1
(
x
^
∗
⊙
x
^
′
)
)
\Rightarrow k^{\boldsymbol x \boldsymbol x'}=g(\mathcal F^{-1}(\widehat \boldsymbol x^* \odot \widehat \boldsymbol x'))
⇒ k x x ′ = g ( F − 1 ( x
∗ ⊙ x
′ ) )
特殊地,对于多项式核
K
(
x
,
x
′
)
=
(
x
T
x
′
+
a
)
b
\mathcal K(\boldsymbol x,\boldsymbol x')=(\boldsymbol x^T\boldsymbol x'+a)^b
K ( x , x ′ ) = ( x T x ′ + a ) b
(18)
⇒
k
x
x
′
=
(
F
−
1
(
x
^
∗
⊙
x
^
′
)
+
a
)
b
\Rightarrow k^{\boldsymbol x \boldsymbol x'}=(\mathcal F^{-1}(\widehat \boldsymbol x^* \odot \widehat \boldsymbol x')+a)^b\tag{18}
⇒ k x x ′ = ( F − 1 ( x
∗ ⊙ x
′ ) + a ) b ( 1 8 )
6.2 径向基函数与高斯核
对于某种径向基函数
h
h
h ,其核函数可表示为:
K
(
x
,
x
′
)
=
h
(
∣
∣
x
−
x
′
∣
∣
2
)
\mathcal K(\boldsymbol x,\boldsymbol x')=h(||\boldsymbol x-\boldsymbol x'||^2)
K ( x , x ′ ) = h ( ∣ ∣ x − x ′ ∣ ∣ 2 )
k
i
x
x
′
=
K
(
x
′
,
P
i
−
1
x
)
=
h
(
∣
∣
x
′
−
P
i
−
1
x
∣
∣
2
)
=
h
(
∣
∣
x
′
∣
∣
2
+
∣
∣
P
i
−
1
x
∣
∣
2
−
2
x
′
T
P
i
−
1
x
)
=
h
(
∣
∣
x
′
∣
∣
2
+
∣
∣
x
∣
∣
2
−
2
x
′
T
P
i
−
1
x
)
=
h
(
∣
∣
x
′
∣
∣
2
+
∣
∣
x
∣
∣
2
−
2
F
−
1
(
x
^
∗
⊙
x
^
′
)
)
\begin{aligned} k_i^{\boldsymbol x \boldsymbol x'}=\mathcal K(\boldsymbol x',P^{i-1}\boldsymbol x)&=h(||\boldsymbol x'-P^{i-1}\boldsymbol x||^2)\\ &=h(||\boldsymbol x'||^2+||P^{i-1}\boldsymbol x||^2-2\boldsymbol x'^TP^{i-1}\boldsymbol x)\\ &=h(||\boldsymbol x'||^2+||\boldsymbol x||^2-2\boldsymbol x'^TP^{i-1}\boldsymbol x)\\ &=h(||\boldsymbol x'||^2+||\boldsymbol x||^2-2\mathcal F^{-1}(\widehat \boldsymbol x^*\odot\widehat\boldsymbol x')) \end{aligned}
k i x x ′ = K ( x ′ , P i − 1 x ) = h ( ∣ ∣ x ′ − P i − 1 x ∣ ∣ 2 ) = h ( ∣ ∣ x ′ ∣ ∣ 2 + ∣ ∣ P i − 1 x ∣ ∣ 2 − 2 x ′ T P i − 1 x ) = h ( ∣ ∣ x ′ ∣ ∣ 2 + ∣ ∣ x ∣ ∣ 2 − 2 x ′ T P i − 1 x ) = h ( ∣ ∣ x ′ ∣ ∣ 2 + ∣ ∣ x ∣ ∣ 2 − 2 F − 1 ( x
∗ ⊙ x
′ ) ) 注:上式转换中去除了转换矩阵
P
i
−
1
P^{i-1}
P i − 1 ,因为矩阵的循环移位不影响其范数
特殊地,对于高斯核
K
(
x
,
x
′
)
=
e
x
p
(
−
1
σ
2
∣
∣
x
−
x
′
∣
∣
2
)
\mathcal K(\boldsymbol x,\boldsymbol x')=exp(-\frac{1}{\sigma^2}||\boldsymbol x-\boldsymbol x'||^2)
K ( x , x ′ ) = e x p ( − σ 2 1 ∣ ∣ x − x ′ ∣ ∣ 2 )
(19)
⇒
k
x
x
′
=
e
x
p
(
−
1
σ
2
(
∣
∣
x
′
∣
∣
2
+
∣
∣
x
∣
∣
2
−
2
F
−
1
(
x
^
∗
⊙
x
^
′
)
)
)
\Rightarrow k^{\boldsymbol x \boldsymbol x'}=exp(-\frac{1}{\sigma^2}(||\boldsymbol x'||^2+||\boldsymbol x||^2-2\mathcal F^{-1}(\widehat \boldsymbol x^*\odot\widehat\boldsymbol x')))\tag{19}
⇒ k x x ′ = e x p ( − σ 2 1 ( ∣ ∣ x ′ ∣ ∣ 2 + ∣ ∣ x ∣ ∣ 2 − 2 F − 1 ( x
∗ ⊙ x
′ ) ) ) ( 1 9 )
7 伪代码
function alphaf = train ( x, y, sigma, lambda)
k = kernel_correlation ( x, x, sigma) ;
alphaf = fft2 ( y) . / ( fft2 ( k) + lambda) ;
end
function responses = detect ( alphaf, x, z, sigma)
k = kernel_correlation ( z, x, sigma) ;
responses = real ( ifft2 ( alphaf . * fft2 ( k) ) ) ;
end
function k = kernel_correlation ( x1, x2, sigma)
c = ifft2 ( sum ( conj ( fft2 ( x1) ) . * fft2 ( x2) , 3 ) ) ;
d = x1 ( : ) ’* x1 ( : ) + x2 ( : ) ’* x2 ( : ) - 2 * c;
k = exp ( - 1 / sigma^ 2 * abs ( d) / numel ( d) ) ;
end