CS131 Study Notes #1
These notes summarize what I learned from assignment 1 of CS131 (Fall 2020).
1 Convolutions
1.1 Definition of convolution
Mathematically, convolution is defined as:
$$(f*g)(n)=\int_{-\infty}^{\infty} f(\tau)\cdot g(n-\tau)\,d\tau$$
In image processing we usually work with two-dimensional matrices, so the data are discrete and convolution takes the following form:
$$(f*h)[m,n]=\sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} f[i,j]\cdot h[m-i,n-j]$$
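To make the discrete definition concrete, here is a minimal sketch that evaluates it directly for two small arrays. It assumes both arrays are zero outside their stored entries, so the infinite sums become finite. The function name conv2d_full and the sample values are made up for this illustration and are not part of the assignment.

```python
import numpy as np

def conv2d_full(f, h):
    """Full 2-D convolution of two finite arrays, straight from the definition.

    Each product f[i, j] * h[m - i, n - j] contributes to output position (m, n).
    Writing a = m - i and b = n - j, the product f[i, j] * h[a, b] lands at
    (i + a, j + b), which is what the slice update below does.
    """
    Hf, Wf = f.shape
    Hh, Wh = h.shape
    out = np.zeros((Hf + Hh - 1, Wf + Wh - 1))
    for i in range(Hf):
        for j in range(Wf):
            # add f[i, j] * h into the block whose top-left corner is (i, j)
            out[i:i + Hh, j:j + Wh] += f[i, j] * h
    return out

f = np.array([[1., 2.], [3., 4.]])
h = np.array([[1., 0.], [0., 1.]])
result = conv2d_full(f, h)
print(result)
# [[1. 2. 0.]
#  [3. 5. 2.]
#  [0. 3. 4.]]
```

Swapping the two arguments gives the same array, which reflects the fact that convolution is commutative.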
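1.2 Applications of convolution
For how and why convolution is used in image processing, see Teacher Gao's WeChat article; I will not repeat it here: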
https://mp.weixin.qq.com/s/oHwKTE3CF7RsHL6JOLgEaA
1.3 Implementing convolution in code
Next, let's implement it.
We start by loading a grayscale image:
import numpy as np
import matplotlib.pyplot as plt
from skimage import io

# Open image as grayscale
img = io.imread('dog.jpg', as_gray=True)
# Show image
plt.imshow(img)
plt.axis('off')
plt.title("Isn't he cute?")
plt.show()
1.3.1 Basic convolution function
Next we write a function, conv_nested, that performs the convolution:
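import numpy as np

def conv_nested(image, kernel):
    """Convolve image with kernel; the output must have the same size as the input.

    Args:
        image: numpy array of shape (Hi, Wi).
        kernel: numpy array of shape (Hk, Wk).
    Returns:
        out: numpy array of shape (Hi, Wi).
    """
    Hi, Wi = image.shape    # image height and width
    Hk, Wk = kernel.shape   # kernel height and width
    # The output must keep the input size, but a plain convolution shrinks the
    # image, so we need a strategy for the borders. Either strategy below is
    # enough on its own; for odd kernel sizes both give the same result.

    # Strategy 1: extend the image with a border of zeros so that the kernel
    # center can sit exactly on the original image edges.
    out = np.zeros((Hi, Wi))
    mtr = np.zeros((Hi + Hk - 1, Wi + Wk - 1))  # result including the zero border
    for i in range(Hi + Hk - 1):
        for j in range(Wi + Wk - 1):
            for m in range(Hk):
                for n in range(Wk):
                    if 0 <= i - m < Hi and 0 <= j - n < Wi:
                        # The image pixel and the kernel entry that are multiplied
                        # are point-symmetric about the center of the image patch.
                        mtr[i][j] += image[i - m][j - n] * kernel[m][n]
    for i in range(Hi):
        for j in range(Wi):
            # take out the central part that we need
            out[i][j] = mtr[i + (Hk - 1) // 2][j + (Wk - 1) // 2]

    # Strategy 2: do not extend the image; index the kernel relative to its center.
    out = np.zeros((Hi, Wi))  # start again from zeros: this is an alternative to strategy 1
    Hm = Hk // 2
    Wm = Wk // 2
    for i in range(Hi):
        for j in range(Wi):
            for m in range(-Hm, Hm + 1):
                for n in range(-Wm, Wm + 1):
                    if 0 <= i - m < Hi and 0 <= j - n < Wi:
                        # shift m and n by the half sizes to get valid kernel indices
                        out[i][j] += image[i - m][j - n] * kernel[m + Hm][n + Wm]
    return out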
Calling this function gives us the convolved image:
from filters import conv_nested
# Simple convolution kernel.
# Feel free to change the kernel to see different outputs.
kernel = np.array(
[
[1,0,-1],
[2,0,-2],
[1,0,-1]
])
out = conv_nested(img, kernel)  # call the convolution function
# Plot original image
plt.subplot(2,2,2)
plt.imshow(img)
plt.title('Original')
plt.axis('off')
# Plot convolved image
plt.subplot(2,2,1)
plt.imshow(out)
plt.title('Convolution')
plt.axis('off')
plt.show()
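1.3.2 Faster convolution function
The plain convolution function in 1.3.1 uses four nested for loops, which makes it slow in Python. To speed it up, we wrote a faster convolution function built around the idea of sliding the kernel over the image (a sliding window): the two inner loops are replaced by one element-wise multiplication and one sum per output pixel. The number of multiplications is unchanged; the gain comes from letting NumPy do the inner work.
First, the zero_pad function adds a border to the image: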
def zero_pad(image, pad_height, pad_width):
    """ Zero-pad an image.

    Ex: a 1x1 image [[1]] with pad_height = 1, pad_width = 2 becomes:
        [[0, 0, 0, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 0]]         of shape (3, 5)

    Args:
        image: numpy array of shape (H, W).
        pad_width: width of the zero padding (left and right padding).
        pad_height: height of the zero padding (bottom and top padding).
    Returns:
        out: numpy array of shape (H+2*pad_height, W+2*pad_width).
    """
    H, W = image.shape
    out = None
    ### YOUR CODE HERE
    out = np.zeros((H + 2 * pad_height, W + 2 * pad_width))
    out[pad_height:H + pad_height, pad_width:W + pad_width] = image
    ### END YOUR CODE
    return out
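As a cross-check, NumPy's built-in np.pad does the same job. The sketch below is not part of the assignment solution; it only reproduces the 1x1 example from the docstring, and the name zero_pad_np is chosen here for illustration.

```python
import numpy as np

def zero_pad_np(image, pad_height, pad_width):
    """Zero-pad a 2-D array with np.pad (constant mode fills with 0 by default)."""
    return np.pad(image,
                  ((pad_height, pad_height), (pad_width, pad_width)),
                  mode='constant')

padded = zero_pad_np(np.array([[1]]), 1, 2)
print(padded.shape)  # (3, 5)
```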
Then the conv_fast function performs the faster convolution:
def conv_fast(image, kernel):
    """ An efficient implementation of convolution filter.

    This function uses element-wise multiplication and np.sum()
    to efficiently compute weighted sum of neighborhood at each
    pixel.

    Hints:
        - Use the zero_pad function you implemented above
        - There should be two nested for-loops
        - You may find np.flip() and np.sum() useful

    Args:
        image: numpy array of shape (Hi, Wi).
        kernel: numpy array of shape (Hk, Wk). Dimensions will be odd.
    Returns:
        out: numpy array of shape (Hi, Wi).
    """
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    out = np.zeros((Hi, Wi))
    image = zero_pad(image, Hk // 2, Wk // 2)
    kernel = np.flip(kernel, 0)  # flip the kernel vertically
    kernel = np.flip(kernel, 1)  # flip the kernel horizontally
    for m in range(Hi):
        for n in range(Wi):
            out[m][n] = np.sum(image[m:m + Hk, n:n + Wk] * kernel)
            # Note how multiplication works in NumPy:
            # `*` multiplies element by element, so (a * b)[0, 0] = a[0, 0] x b[0, 0];
            # np.dot computes the inner product, which for 2-D arrays is the matrix product.
    return out
Now let's actually call it:
from time import time
from filters import conv_fast
t0 = time()
out_fast = conv_fast(img, kernel)
t1 = time()
out_nested = conv_nested(img, kernel)
t2 = time()
# Compare the running time of the two implementations
print("conv_nested: took %f seconds." % (t2 - t1))
print("conv_fast: took %f seconds." % (t1 - t0))
# Plot conv_nested output
plt.subplot(1,2,1)
plt.imshow(out_nested)
plt.title('conv_nested')
plt.axis('off')
# Plot conv_fast output
plt.subplot(1,2,2)
plt.imshow(out_fast)
plt.title('conv_fast')
plt.axis('off')
# Make sure that the two outputs are the same
if not (np.max(np.abs(out_fast - out_nested)) < 1e-10):
    print("Different outputs! Check your implementation.")
conv_nested: took 1.054362 seconds.
conv_fast: took 0.585668 seconds.
# The new convolution is indeed faster: about 1.8 times in this run
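The two remaining Python loops can also be removed. The following is a sketch, not the assignment's intended solution: it assumes NumPy 1.20 or newer (for sliding_window_view) and odd kernel dimensions, and the name conv_windows is made up here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_windows(image, kernel):
    """'Same'-size convolution with zero padding and no explicit Python loops."""
    Hk, Wk = kernel.shape
    # zero-pad so that every output pixel has a full Hk x Wk neighborhood
    padded = np.pad(image, ((Hk // 2, Hk // 2), (Wk // 2, Wk // 2)))
    # windows[m, n] is the Hk x Wk patch whose center is image pixel (m, n)
    windows = sliding_window_view(padded, (Hk, Wk))
    flipped = kernel[::-1, ::-1]  # convolution flips the kernel on both axes
    # multiply every patch by the flipped kernel and sum over the patch axes
    return np.einsum('ijkl,kl->ij', windows, flipped)

image = np.arange(12, dtype=float).reshape(3, 4)
identity = np.zeros((3, 3))
identity[1, 1] = 1.0
print(np.allclose(conv_windows(image, identity), image))  # True
```

Whether this is faster than conv_fast for a given image and kernel is something to measure rather than assume.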
2 Cross-correlation
The definition of cross-correlation is very similar to that of convolution:
$$(g ** f)[m,n]=\sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} g[i,j]\cdot f[m+i,n+j]$$
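Cross-correlation is essentially an inner product. An inner product measures the projection of one vector onto another, so for vectors of comparable length a larger inner product means a larger projection and therefore more similar vectors. This is why cross-correlation is used in image processing to measure how similar two images are.
2.1 A cross-correlation example
CS131 gives us a vivid example: finding a product on a shelf.
First we are given a shelf:
Then we want to find the target product on this shelf:
By eye, it is the second item from the top-left corner of the shelf. Next we write a cross-correlation function to see whether this approach works.
2.2 Implementing the cross-correlation function
Looking closely, cross-correlation is just convolution without the flip, so we can reuse the existing function: flip the kernel once beforehand, and the flip inside conv_fast turns it back: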
def cross_correlation(f, g):
    """
    Args:
        f: numpy array of shape (Hf, Wf).
        g: numpy array of shape (Hg, Wg).
    Returns:
        out: numpy array of shape (Hf, Wf).
    """
    out = None
    g = np.flip(g, 1)
    g = np.flip(g, 0)  # pre-flip the kernel so that the flip inside conv_fast cancels out
    out = conv_fast(f, g)
    return out
Then let's run it:
from filters import cross_correlation
# Load template and image in grayscale
img = io.imread('shelf.jpg')
img_gray = io.imread('shelf.jpg', as_gray=True)
temp = io.imread('template.jpg')
temp_gray = io.imread('template.jpg', as_gray=True)
# Perform cross-correlation between the image and the template
out = cross_correlation(img_gray, temp_gray)
# Find the location with maximum similarity
y,x = (np.unravel_index(out.argmax(), out.shape))
# np.unravel_index returns the (row, column) coordinates of the maximum value
# Display cross-correlation output
plt.subplot(3, 1, 2)
plt.imshow(out)
plt.title('Cross-correlation (white means more correlated)')
plt.axis('off')
# Display image
plt.subplot(3, 1, 3)
plt.imshow(img)
plt.title('Result (blue marker on the detected location)')
plt.axis('off')
# Draw marker at detected location
plt.plot(x, y, 'bx', ms=40, mew=10)
plt.show()
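The result looks like this:
Hmm, the cross landed in a strange place, not where we wanted it.
Why is that?
Go back to the basic criterion: we assume that a larger cross-correlation value means more similar. But our image matrices contain no negative values, which makes them behave differently from general vectors. For general vectors that are dissimilar, positive and negative products cancel and pull the cross-correlation down; here, any region with large pixel values gets a high score regardless of its content. In a grayscale image, larger values are whiter (0 to 255 for 8-bit images, or 0 to 1 for the floats returned by as_gray=True), so the cross ended up on a fairly white box. That explains the result shown.
2.3 Zero-mean cross-correlation
Zero-mean cross-correlation is how we fix the problem above. To bring our setting closer to the vector case we need negative values, and shifting the template so that its mean is 0 is a good way to introduce them: wherever the template is darker than its mean, it now contributes negative products that can cancel the positive ones.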
def zero_mean_cross_correlation(f, g):
    out = None
    g = g - np.mean(g)
    out = cross_correlation(f, g)
    return out
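The result is as shown above: the template is located accurately.
2.4 Normalized cross-correlation
However, the method above is not robust. For example, when the lighting is dim, the match lands in the wrong place, as in the figure below.
So we use normalized cross-correlation to solve this problem.
The normalized cross-correlation of $f$ and template $g$ is defined mathematically as: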
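The difference between plain and zero-mean cross-correlation is easy to reproduce in one dimension. The sketch below uses made-up numbers (they are not taken from the shelf image): a short signal contains one copy of the template and one uniformly bright stretch. np.correlate in 'valid' mode computes the sliding inner product.

```python
import numpy as np

template = np.array([0.2, 0.9, 0.2])
# one copy of the template starting at index 2, a bright flat region at indices 6..8
signal = np.array([0.1, 0.1, 0.2, 0.9, 0.2, 0.1, 1.0, 1.0, 1.0, 0.1])

# plain cross-correlation: large values win, whatever their shape
raw = np.correlate(signal, template, mode='valid')
# zero-mean cross-correlation: subtract the template mean first
zero_mean = np.correlate(signal, template - template.mean(), mode='valid')

raw_peak = int(raw.argmax())              # 6, the start of the bright region
zero_mean_peak = int(zero_mean.argmax())  # 2, the start of the true match
print(raw_peak, zero_mean_peak)
```

With the zero-mean template, the flat bright region scores roughly 0 because the positive and negative weights cancel, while the true match keeps a clearly positive score.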
$$(g \star f)[m,n]=\sum_{i,j} \frac{g[i,j]-\overline{g}}{\sigma_g}\cdot\frac{f[m+i,n+j]-\overline{f_{m,n}}}{\sigma_{f_{m,n}}}$$
where:
- $f_{m,n}$ is the patch image at position $(m,n)$
- $\overline{f_{m,n}}$ is the mean of the patch image $f_{m,n}$
- $\sigma_{f_{m,n}}$ is the standard deviation of the patch image $f_{m,n}$
- $\overline{g}$ is the mean of the template $g$
- $\sigma_g$ is the standard deviation of the template $g$
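Here the patch image is the small block of the original image that is cross-correlated with the kernel. The point of normalizing is that centering and scaling give both the template and each patch zero mean and unit standard deviation, so the score no longer depends on the brightness or contrast of the patch. This makes the cross-correlation result much more robust.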
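The notes do not include code for this step, so here is a minimal sketch of the formula above rather than the assignment's reference solution. It assumes an odd-sized template and zero padding at the borders, and it gives patches with zero standard deviation a score of 0 (an arbitrary choice made here to avoid dividing by zero). The test image is random data, not the shelf photo.

```python
import numpy as np

def normalized_cross_correlation(f, g):
    """Normalized cross-correlation of image f with an odd-sized template g."""
    Hf, Wf = f.shape
    Hg, Wg = g.shape
    out = np.zeros((Hf, Wf))
    # zero-pad so that a patch can be centered on every pixel of f
    padded = np.pad(f, ((Hg // 2, Hg // 2), (Wg // 2, Wg // 2)))
    g_norm = (g - g.mean()) / g.std()  # template with zero mean and unit std
    for m in range(Hf):
        for n in range(Wf):
            patch = padded[m:m + Hg, n:n + Wg]
            std = patch.std()
            if std > 0:  # flat patches keep the score 0 (assumption, see above)
                out[m, n] = np.sum(g_norm * (patch - patch.mean()) / std)
    return out

rng = np.random.default_rng(0)
g = rng.random((3, 3))
f = rng.random((10, 12))
f[4:7, 5:8] = 0.5 * g + 0.1  # a dimmer, lower-contrast copy centered at (5, 6)

out = normalized_cross_correlation(f, g)
peak = tuple(int(v) for v in np.unravel_index(out.argmax(), out.shape))
print(peak, out[peak])
```

Because both factors are standardized, the score at an exact match equals the number of template pixels (9 here) even though the copy is darker and has less contrast than the template.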
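3 Separable filters
A separable filter is a filter that can be split into a product of matrices, here a column vector $F_1$ times a row vector $F_2$.
If $F=F_1F_2$, that is $F[m,n]=F_1[m]\cdot F_2[n]$, then: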
$$
\begin{aligned}
(I*F)[m,n] &= \sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty I[i,j]\cdot F[m-i,n-j]\\
&= \sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty I[i,j]\cdot F_1[m-i]\cdot F_2[n-j]\\
&= \sum_{j=-\infty}^\infty F_2[n-j]\sum_{i=-\infty}^\infty I[i,j]\cdot F_1[m-i]\\
&= \sum_{j=-\infty}^\infty F_2[n-j]\cdot(I*F_1)[m,j]\\
&= ((I*F_1)*F_2)[m,n]
\end{aligned}
$$
After this rearrangement the complexity is clearly lower.
For an $M_1\times N_1$ image and an $M_2\times N_2$ kernel, direct 2D convolution costs $O(M_1N_1M_2N_2)$, while two successive 1D convolutions cost $O(M_1N_1(M_2+N_2))$. $M_2N_2$ is larger than $M_2+N_2$ for any kernel with $M_2,N_2\ge 3$, so two successive 1D convolutions are more efficient.
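As a quick numerical check of the derivation, the sketch below compares a direct 2D convolution with two 1D passes. The kernel is the one used in section 1.3.1, which factors as the outer product of [1, 2, 1] and [1, 0, -1]. The helper names and the random test image are assumptions made for this illustration.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Direct 2-D convolution, zero-padded, same output size (odd kernel sizes)."""
    Hk, Wk = kernel.shape
    padded = np.pad(image, ((Hk // 2, Hk // 2), (Wk // 2, Wk // 2)))
    flipped = kernel[::-1, ::-1]
    out = np.zeros(image.shape)
    for m in range(image.shape[0]):
        for n in range(image.shape[1]):
            out[m, n] = np.sum(padded[m:m + Hk, n:n + Wk] * flipped)
    return out

def conv_separable(image, col, row):
    """Convolve every column with col, then every row of the result with row."""
    tmp = np.apply_along_axis(np.convolve, 0, image, col, mode='same')
    return np.apply_along_axis(np.convolve, 1, tmp, row, mode='same')

col = np.array([1., 2., 1.])   # F_1, applied along the vertical axis
row = np.array([1., 0., -1.])  # F_2, applied along the horizontal axis
kernel = np.outer(col, row)    # [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

rng = np.random.default_rng(1)
image = rng.random((6, 7))

direct = conv2d_same(image, kernel)
separable = conv_separable(image, col, row)
print(np.allclose(direct, separable))  # True
```

For this 3x3 kernel the direct version does 9 multiplications per pixel and the separable version 3 + 3 = 6; the gap widens as the kernel grows.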