1 Basic image manipulation and processing
1.1 PIL: Python image processing library
PIL (Python Imaging Library) provides general image handling along with a large number of useful basic image operations. PIL ships with the Anaconda distribution, which is recommended because it is simple to set up and already bundles the most commonly used libraries.
- Read in an image:
from PIL import Image
from pylab import *
# Add Chinese font support
from matplotlib.font_manager import FontProperties
font = FontProperties(fname=r"c:\windows\fonts\SimSun.ttc", size=14)
figure()
pil_im = Image.open('E:/python/Python Computer Vision/Image data/empire.jpg')
gray()
subplot(121)
title(u'原图',fontproperties=font)
axis('off')
imshow(pil_im)
pil_im = Image.open('E:/python/Python Computer Vision/Image data/empire.jpg').convert('L')
subplot(122)
title(u'灰度图',fontproperties=font)
axis('off')
imshow(pil_im)
show()
1.1.1 Convert image format with the save() function
from PCV.tools.imtools import get_imlist # helper from the PCV module that accompanies the original book
from PIL import Image
import os
import pickle
# collect the image file names (including extensions) in the test folder
filelist = get_imlist('E:/python/Python Computer Vision/test jpg/')
# save the list of image file names to imlist.txt
with open('E:/python/Python Computer Vision/test jpg/imlist.txt','wb') as imlist:
    pickle.dump(filelist,imlist) # serialize
for infile in filelist:
    outfile = os.path.splitext(infile)[0] + ".png" # split off the extension and replace it with .png
    if infile != outfile:
        try:
            Image.open(infile).save(outfile)
        except IOError:
            print("cannot convert", infile)
Here, the test jpg folder is a folder created by the author to hold the test .jpg images. Some code has been added to the book's original source to save the list of image file names and to convert all images to .png format. The result after running the program is as follows:
<img src="https://img-blog.csdn.net/20180306084511595?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvamlhb3lhbmd3bQ==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70" width="47.5%" alt=""/>
The open() function in PIL creates a PIL image object, and the save() method writes the image to a file with the specified name. The process above changes the extension to .png while the base file name stays unchanged.
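The conversion step can be tried without any existing image files. The following is a minimal sketch, assuming a temporary folder and a generated stand-in image (the file name sample.jpg is hypothetical), that shows save() choosing the output format from the file extension:

```python
import os
import tempfile

from PIL import Image

folder = tempfile.mkdtemp()

# Create a small stand-in JPEG so the sketch does not depend on any existing file
infile = os.path.join(folder, "sample.jpg")
Image.new("RGB", (64, 48), color=(200, 30, 30)).save(infile)

# save() picks the output format from the file extension
outfile = os.path.splitext(infile)[0] + ".png"
Image.open(infile).save(outfile)

converted = Image.open(outfile)
print(converted.format, converted.size)
```

Only the extension changes; the base name and the image dimensions stay the same.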
1.1.2 Create thumbnails
Using PIL, you can easily create a thumbnail: store the desired size in a tuple and call the thumbnail() method, which shrinks the image in place so that it fits within that size while keeping the aspect ratio.
For example, to create a thumbnail whose longest side is 128 pixels, you can use:
pil_im.thumbnail((128,128))
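A short sketch of this behavior, assuming a blank stand-in image with the same dimensions as the empire.jpg example (569 pixels wide, 800 pixels high):

```python
from PIL import Image

# Stand-in image with the same size as the empire.jpg example
pil_im = Image.new("RGB", (569, 800))

thumb = pil_im.copy()        # thumbnail() modifies the image in place, so work on a copy
thumb.thumbnail((128, 128))  # fit inside a 128x128 box, keeping the aspect ratio

print(pil_im.size, thumb.size)
```

The longest side ends up at 128 pixels, while the shorter side is scaled by the same factor rather than being stretched to 128.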
1.1.3 Copy and paste the image area
Call the crop() method to copy a region from an image. After copying the region, you can rotate it or apply other transformations to it.
box=(100,100,400,400)
region=pil_im.crop(box)
The region is specified by a 4-tuple with the coordinates (left, upper, right, lower). PIL uses a coordinate system whose origin (0, 0) is the upper-left corner. The extracted region can be rotated and then put back with paste(), as follows:
region=region.transpose(Image.ROTATE_180)
pil_im.paste(region,box)
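To see exactly which pixels move, here is a small sketch on a synthetic 4x4 image (an assumption made so that every pixel value is identifiable):

```python
import numpy as np
from PIL import Image

# 4x4 grayscale image with pixel values 0..15, so every pixel is identifiable
pil_im = Image.fromarray(np.arange(16, dtype=np.uint8).reshape(4, 4))

box = (0, 0, 2, 2)                    # (left, upper, right, lower); right/lower are exclusive
region = pil_im.crop(box)             # 2x2 copy of the upper-left corner
region = region.transpose(Image.ROTATE_180)
pil_im.paste(region, box)             # write the rotated region back in place

result = np.array(pil_im)
print(result)
```

Only the 2x2 block inside the box is changed; the rest of the image is untouched.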
1.1.4 Resizing and Rotating
- Resize: call the resize() method; its parameter is a tuple specifying the size of the new image:
out=pil_im.resize((128,128))
- Rotation: call the rotate() method; its parameter is the rotation angle in degrees, measured counterclockwise:
out=pil_im.rotate(45)
The code for the above operation is as follows:
from PIL import Image
from pylab import *
# Add Chinese font support
from matplotlib.font_manager import FontProperties
font = FontProperties(fname=r"c:\windows\fonts\SimSun.ttc", size=14)
figure()
# show the original image
pil_im = Image.open('E:/python/Python Computer Vision/Image data/empire.jpg')
print(pil_im.mode, pil_im.size, pil_im.format)
subplot(231)
title(u'原图', fontproperties=font)
axis('off')
imshow(pil_im)
# show the grayscale image
pil_im = Image.open('E:/python/Python Computer Vision/Image data/empire.jpg').convert('L')
gray()
subplot(232)
title(u'灰度图', fontproperties=font)
axis('off')
imshow(pil_im)
# copy and paste a region
pil_im = Image.open('E:/python/Python Computer Vision/Image data/empire.jpg')
box = (100, 100, 400, 400)
region = pil_im.crop(box)
region = region.transpose(Image.ROTATE_180)
pil_im.paste(region, box)
subplot(233)
title(u'复制粘贴区域', fontproperties=font)
axis('off')
imshow(pil_im)
# thumbnail
pil_im = Image.open('E:/python/Python Computer Vision/Image data/empire.jpg')
size = 128, 128
pil_im.thumbnail(size)
print(pil_im.size)
subplot(234)
title(u'缩略图', fontproperties=font)
axis('off')
imshow(pil_im)
pil_im.save('E:/python/Python Computer Vision/Image data/empire thumbnail.jpg') # save the thumbnail
# resize the image
pil_im=Image.open('E:/python/Python Computer Vision/Image data/empire thumbnail.jpg')
pil_im=pil_im.resize(size)
print(pil_im.size)
subplot(235)
title(u'调整尺寸后的图像',fontproperties=font)
axis('off')
imshow(pil_im)
# rotate the image by 45°
pil_im=Image.open('E:/python/Python Computer Vision/Image data/empire thumbnail.jpg')
pil_im=pil_im.rotate(45)
subplot(236)
title(u'旋转45°后的图像',fontproperties=font)
axis('off')
imshow(pil_im)
show()
The result of the operation is as follows:
1.2 Matplotlib library
Matplotlib is a good graphics library for mathematical plotting and for drawing points, lines, and curves on images. Its plotting features are much more powerful than those available in PIL.
1.2.1 Drawing, Plotting Points and Lines
from PIL import Image
from pylab import *
# Add Chinese font support
from matplotlib.font_manager import FontProperties
font = FontProperties(fname=r"c:\windows\fonts\SimSun.ttc", size=14)
# read the image into an array
im = array(Image.open('E:/python/Python Computer Vision/Image data/empire.jpg'))
figure()
# plot with the axes shown
subplot(121)
imshow(im)
x = [100, 100, 400, 400]
y = [200, 500, 200, 500]
# plot the points with red star markers
plot(x, y, 'r*')
# plot a line connecting the first two points (blue by default)
plot(x[:2], y[:2])
title(u'绘制empire.jpg', fontproperties=font)
# plot without the axes
subplot(122)
imshow(im)
x = [100, 100, 400, 400]
y = [200, 500, 200, 500]
plot(x, y, 'r*')
plot(x[:2], y[:2])
axis('off')
title(u'绘制empire.jpg', fontproperties=font)
show()
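# show() opens the graphical user interface (GUI) and creates the figure window(s); the GUI loop then blocks the script
# until the last figure window is closed. Call show() only once per script, usually at the end.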
Many colors and styles are available when plotting, as shown in Tables 1-1, 1-2, and 1-3. Typical usage is as follows:
plot(x,y) # default: solid blue line
plot(x,y,'go-') # green line with circle markers
plot(x,y,'ks:') # black dotted line with square markers
symbol | color |
---|---|
'b' | blue |
'g' | green |
'r' | red |
'c' | cyan |
'm' | magenta |
'y' | yellow |
'k' | black |
'w' | white |
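A format string combines one color character from the table with optional marker and line-style characters. The sketch below, which assumes the non-interactive Agg backend so that it runs without a display, inspects what each format string sets on the resulting line object:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgb

x = [100, 100, 400, 400]
y = [200, 500, 200, 500]

fig, ax = plt.subplots()
(green_line,) = ax.plot(x, y, "go-")   # green, circle markers, solid line
(black_line,) = ax.plot(x, y, "ks:")   # black, square markers, dotted line

for line in (green_line, black_line):
    print(to_rgb(line.get_color()), line.get_marker(), line.get_linestyle())

plt.close(fig)
```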
1.2.2 Image contour and histogram
from PIL import Image
from pylab import *
# Add Chinese font support
from matplotlib.font_manager import FontProperties
font = FontProperties(fname=r"c:\windows\fonts\SimSun.ttc", size=14)
# open the image and convert it to grayscale
im = array(Image.open('E:/python/Python Computer Vision/Image data/empire.jpg').convert('L'))
# create a new figure
figure()
subplot(121)
# do not use color information
gray()
# show the contours with the origin at the upper-left corner
contour(im, origin='image')
axis('equal')
axis('off')
title(u'图像轮廓图', fontproperties=font)
subplot(122)
# use hist() to plot the histogram
# the first argument is a one-dimensional array: hist() only accepts 1D input,
# so flatten() is used to convert the array to 1D in row-major order
# the second argument specifies the number of bins
hist(im.flatten(), 128)
title(u'图像直方图', fontproperties=font)
# plt.xlim([0,250])
# plt.ylim([0,12000])
show()
1.2.3 Interactive annotation
Sometimes users need to interact with an application, for example to mark points on an image or to annotate training data. PyLab provides a simple function, ginput(), for this kind of interactive annotation.
from PIL import Image
from pylab import *
im = array(Image.open('E:/python/Python Computer Vision/Image data/empire.jpg'))
imshow(im)
print('Please click 3 points')
x = ginput(3)
print('you clicked:', x)
show()
output:
you clicked:
[(118.4632306896458, 177.58271393177051),
(118.4632306896458, 177.58271393177051),
(118.4632306896458, 177.58271393177051)]
The code above reads the empire.jpg image, displays it, and then calls ginput() to collect annotations interactively. The number of points is set to 3 here; after the user has clicked them, their coordinates are printed.
1.3 NumPy library
NumPy Online Documentation
NumPy is a popular Python package for scientific computing. It provides many useful objects, such as array objects for representing vectors, matrices, and images, together with linear algebra functions.
1.3.1 Image Array Representation
In the previous examples we converted images to NumPy array objects with the array() function, but did not explain what that means. An array is like a list, except that all of its elements must have the same type. Unless a type is specified when the array is created, it is determined automatically from the data.
Examples are as follows:
from PIL import Image
from pylab import *
im = array(Image.open('E:/python/Python Computer Vision/Image data/empire.jpg'))
print (im.shape, im.dtype)
im = array(Image.open('E:/python/Python Computer Vision/Image data/empire.jpg').convert('L'),'f')
print (im.shape, im.dtype)
output:
(800, 569, 3) uint8
(800, 569) float32
Explanation:
The first tuple on each line is the shape of the image array (rows, columns, color channels), and the string after it is the data type of the array elements.
- uint8: the default type, because images are usually encoded as 8-bit unsigned integers
- float32: the second image was converted to grayscale and created with the extra 'f' argument, so its elements are floating point
- To access a single array element, use index notation:
value=im[i,j,k]
- To access multiple array elements, use array slicing, which returns a view of the elements in the specified index ranges:
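im[i,:] = im[j,:] # set the values of row i to the values of row j
im[:,j] = 100 # set all values in column j to 100
im[:100,:50].sum() # sum of the values in the first 100 rows and first 50 columns
im[50:100,50:100] # rows 50-99 and columns 50-99 (row 100 and column 100 are excluded)
im[i].mean() # mean of the values in row i
im[:,-1] # last column
im[-2,:] # second-to-last row (im[-2] is equivalent)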
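The slicing expressions above can be checked on a small array. This is a sketch on a synthetic 6x8 array (a stand-in for a grayscale image, with indices chosen to fit its size):

```python
import numpy as np

# A small stand-in for a grayscale image: 6 rows, 8 columns, values 0..47
im = np.arange(48, dtype=np.float32).reshape(6, 8)

im[0, :] = im[1, :]            # copy row 1 into row 0
im[:, 2] = 100                 # set every value in column 2 to 100
block_sum = im[:2, :3].sum()   # sum of the first 2 rows and first 3 columns
patch = im[2:4, 2:4]           # rows 2-3, columns 2-3 (end index excluded)
row_mean = im[3].mean()        # mean of row 3
last_col = im[:, -1]           # last column
second_last_row = im[-2]       # second-to-last row, same as im[-2, :]

# Basic slices are views: writing through the slice changes the original array
patch[:] = 0
print(block_sum, row_mean, im[2:4, 2:4])
```

Because a slice is a view rather than a copy, use im[2:4, 2:4].copy() when an independent array is needed.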
1.3.2 Gray scale transformation
After reading images into NumPy arrays, we can perform arbitrary mathematical operations on them. A simple example is a grayscale transformation of an image: consider any function $f$ that maps the interval 0~255 to itself, so that the output range is the same as the input range.
Examples are as follows:
from PIL import Image
from numpy import *
from pylab import *
im=array(Image.open('E:/python/Python Computer Vision/Image data/empire.jpg').convert('L'))
print(int(im.min()),int(im.max()))
im2=255-im # invert the image
print(int(im2.min()),int(im2.max())) # check the minimum/maximum values
im3=(100.0/255)*im+100 # map the pixel values into the range 100...200
print(int(im3.min()),int(im3.max()))
im4=255.0*(im/255.0)**2 # square the normalized pixel values
print(int(im4.min()),int(im4.max()))
figure()
gray()
subplot(131)
imshow(im2)
axis('off')
title(r'$f(x)=255-x$')
subplot(132)
imshow(im3)
axis('off')
title(r'$f(x)=\frac{100}{255}x+100$')
subplot(133)
imshow(im4)
axis('off')
title(r'$f(x)=255(\frac{x}{255})^2$')
show()
output:
3 255
0 252
101 200
0 255
- The reverse of the array() conversion, from an array back to a PIL image, can be done with PIL's fromarray() function:
pil_im=Image.fromarray(im)
- If an earlier operation changed the data type from "uint8" to another type, convert it back before creating the PIL image:
pil_im=Image.fromarray(uint8(im))
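As an illustration of why the conversion matters, the sketch below applies one of the transformations above to a synthetic gradient (an assumption standing in for a real photo) and converts the floating-point result back to uint8 before building the PIL image:

```python
import numpy as np
from PIL import Image

# Synthetic 8-bit grayscale gradient standing in for a real photo
im = np.tile(np.arange(256, dtype=np.uint8), (4, 1))

# Multiplying by a Python float promotes the array to float64
im3 = (100.0 / 255) * im + 100
print(im3.dtype)

# Convert back to uint8 before handing the array to PIL
pil_im = Image.fromarray(np.uint8(im3))
back = np.array(pil_im)
print(pil_im.mode, pil_im.size, back.dtype)
```

Note that np.uint8() truncates the fractional part; round first if that matters for your data.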
1.3.3 Image scaling
NumPy arrays will be our primary tool for manipulating images and data, but there is no easy way to resize matrices. We can write a simple image resizing function using the PIL image object transformation:
def imresize(im,sz):
    """ Resize an image array using PIL. """
    pil_im = Image.fromarray(uint8(im))
    return array(pil_im.resize(sz))
The resizing function defined above can be found in imtools.py.
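One detail worth keeping in mind is the argument order: PIL sizes are (width, height), while NumPy shapes are (rows, columns). A self-contained sketch of the same function, using explicit np. prefixes instead of the star imports:

```python
import numpy as np
from PIL import Image

def imresize(im, sz):
    """Resize an image array using PIL; sz is (width, height)."""
    pil_im = Image.fromarray(np.uint8(im))
    return np.array(pil_im.resize(sz))

im = np.zeros((800, 569))          # 800 rows, 569 columns
small = imresize(im, (100, 50))    # ask PIL for width=100, height=50
print(small.shape, small.dtype)    # the array shape is reported as (rows, columns)
```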
1.3.4 Histogram equalization
Histogram equalization flattens the gray-level histogram of an image so that all gray values are (approximately) equally likely in the transformed image. It is a good way to normalize image intensity before further processing, and it also increases image contrast.
- Transformation function: the cumulative distribution function (cdf) of the pixel values in the image, normalized so that it maps the range of pixel values to the target range
The following function is a concrete implementation of histogram equalization:
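def histeq(im,nbr_bins=256):
    """ Histogram equalization of a grayscale image. """
    # compute the image histogram (density=True replaces the removed normed=True)
    imhist,bins = histogram(im.flatten(),nbr_bins,density=True)
    cdf = imhist.cumsum() # cumulative distribution function
    # normalize: dividing by the last element (index -1) scales the cdf to 0~1,
    # and multiplying by 255 maps it to the 0~255 pixel range
    cdf = 255 * cdf / cdf[-1]
    # use linear interpolation of the cdf to compute the new pixel values
    im2 = interp(im.flatten(),bins[:-1],cdf)
    return im2.reshape(im.shape), cdf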
Explanation:
- The function takes two parameters:
  - the grayscale image
  - the number of bins used in the histogram
- The function returns:
  - the equalized image
  - the cumulative distribution function used for the pixel value mapping
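The effect is easiest to see on a low-contrast image. The following sketch repeats the function with explicit np. prefixes and applies it to synthetic random data confined to the range 100...149 (an assumption standing in for a real low-contrast photo):

```python
import numpy as np

def histeq(im, nbr_bins=256):
    """Histogram equalization of a grayscale image (sketch)."""
    imhist, bins = np.histogram(im.flatten(), nbr_bins, density=True)
    cdf = imhist.cumsum()                     # cumulative distribution function
    cdf = 255 * cdf / cdf[-1]                 # scale so the last value is 255
    im2 = np.interp(im.flatten(), bins[:-1], cdf)
    return im2.reshape(im.shape), cdf

rng = np.random.default_rng(0)
im = rng.integers(100, 150, size=(64, 64)).astype(np.uint8)  # low-contrast input

im2, cdf = histeq(im)
print(im.min(), im.max(), im2.min(), im2.max())
```

The input occupies only a narrow band of gray values, while the equalized output is spread over most of the 0~255 range.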
Program implementation:
from PIL import Image
from pylab import *
from PCV.tools import imtools
# Add Chinese font support
from matplotlib.font_manager import FontProperties
font = FontProperties(fname=r"c:\windows\fonts\SimSun.ttc", size=14)
# open the image and convert it to grayscale
im = array(Image.open('E:/python/Python Computer Vision/Image data/empire.jpg').convert('L'))
#im = array(Image.open('../data/AquaTermi_lowcontrast.JPG').convert('L'))
im2, cdf = imtools.histeq(im)
figure()
subplot(2, 2, 1)
axis('off')
gray()
title(u'原始图像', fontproperties=font)
imshow(im)
subplot(2, 2, 2)
axis('off')
title(u'直方图均衡化后的图像', fontproperties=font)
imshow(im2)
subplot(2, 2, 3)
axis('off')
title(u'原始直方图', fontproperties=font)
#hist(im.flatten(), 128, cumulative=True, density=True)
hist(im.flatten(), 128, density=True)
subplot(2, 2, 4)
axis('off')
title(u'均衡化后的直方图', fontproperties=font)
#hist(im2.flatten(), 128, cumulative=True, density=True)
hist(im2.flatten(), 128, density=True)
show()
result:
1.3.5 Image Averaging
Averaging images is a simple way to reduce image noise and is also often used for artistic effects. Assuming that all images have the same size, we can compute the average by summing the pixels at each position and dividing by the number of images. The following function computes the average image:
def compute_average(imlist):
    """ Compute the average of a list of images. """
    # open the first image and store it in a float array
    averageim = array(Image.open(imlist[0]), 'f')
    for imname in imlist[1:]:
        try:
            averageim += array(Image.open(imname))
        except Exception:
            print(imname + '...skipped')
    averageim /= len(imlist)
    # return the average as a uint8 array
    return array(averageim, 'uint8')
Note: Images that cannot be opened are skipped, but the sum is still divided by len(imlist). If many images fail to open, the result is therefore not a true average and may reflect only one or two images.
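The noise-reduction effect of averaging can be illustrated without image files. This sketch assumes a hypothetical constant image observed 50 times with independent Gaussian noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical clean image plus 50 noisy observations of it
clean = np.full((32, 32), 128.0)
noisy_stack = clean + rng.normal(0.0, 20.0, size=(50, 32, 32))

# Accumulate in floating point, then divide by the number of images
average = noisy_stack.sum(axis=0) / len(noisy_stack)

noise_single = (noisy_stack[0] - clean).std()
noise_average = (average - clean).std()
print(noise_single, noise_average)
```

For independent noise, the standard deviation of the average shrinks roughly with the square root of the number of images.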
1.3.6 Perform principal component analysis on images
PCA (Principal Component Analysis) is a very useful dimensionality reduction technique. It is optimal in the sense that it preserves as much information from the training data as possible while using as few dimensions as possible. Even a small 100×100 pixel grayscale image has 10,000 dimensions and can be regarded as a point in a 10,000-dimensional space; a megapixel image has millions of dimensions. Because of this high dimensionality, dimensionality reduction is used in many computer vision applications. The projection matrix produced by PCA can be seen as a change of coordinates to a new coordinate system whose coordinates are arranged in descending order of importance.
To apply PCA to image data, the images need to be converted to one-dimensional vectors, for example with NumPy's flatten() method.
Stacking the flattened images gives a matrix with one row per image. The rows are centered by subtracting the mean image before the principal directions are computed. The principal components are usually computed with SVD (Singular Value Decomposition), but when the dimensionality is high the SVD computation is very slow, so in that case a different approach is used instead.
Here is the code for the PCA operation:
from PIL import Image
from numpy import *
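def pca(X):
    """ Principal component analysis.
        input: matrix X holding the training data, one flattened sample per row
        return: projection matrix (most important dimensions first), variance and mean """
    # get the dimensions
    num_data,dim = X.shape
    # center the data
    mean_X = X.mean(axis=0)
    X = X - mean_X
    if dim>num_data:
        # PCA - compact trick used
        M = dot(X,X.T) # covariance matrix
        e,EV = linalg.eigh(M) # eigenvalues and eigenvectors
        tmp = dot(X.T,EV).T # this is the compact trick
        V = tmp[::-1] # reverse, since the last eigenvectors are the ones we want
        S = sqrt(e)[::-1] # reverse, since the eigenvalues are in increasing order
        for i in range(V.shape[1]):
            V[:,i] /= S
    else:
        # PCA - SVD used
        U,S,V = linalg.svd(X)
        V = V[:num_data] # only makes sense to return the first num_data rows
    # return the projection matrix, the variance and the mean
    return V,S,mean_X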
The function first centers the data by subtracting the mean in each dimension, and then computes the eigenvectors corresponding to the largest eigenvalues of the covariance matrix, using either the compact trick or SVD. Here we use the range() function, which takes an integer n and yields the integers 0...(n-1). You can also use arange(), which returns an array. (In Python 2, range() returned a list and xrange() was the lazy alternative; in Python 3, range() is already lazy and xrange() no longer exists.) We use the range() function throughout this book.
If the number of data points is smaller than the dimension of the vectors, we skip the SVD and instead compute the eigenvectors of the smaller matrix $XX^T$. The PCA computation can also be made faster by computing only the eigenvectors corresponding to the k largest eigenvalues (k being the number of dimensions to keep). Due to space limitations, this is left for interested readers to explore. The rows of the matrix V are orthogonal and contain the coordinate directions in order of decreasing variance of the training data.
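To check that the compact trick and the SVD give the same directions, here is a small sketch on hypothetical random data with fewer samples than dimensions. It is not the pca() function itself: it keeps only the k largest components, with k chosen arbitrarily for the comparison.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 6 samples in 40 dimensions (fewer samples than dimensions)
X = rng.normal(size=(6, 40)) * np.linspace(5.0, 0.5, 40)
Xc = X - X.mean(axis=0)
k = 3  # number of principal directions to compare

# Compact trick: eigen-decompose the small 6x6 matrix X X^T instead of the 40x40 one
e, EV = np.linalg.eigh(Xc @ Xc.T)
order = np.argsort(e)[::-1][:k]                     # indices of the k largest eigenvalues
V_compact = (Xc.T @ EV[:, order]).T / np.sqrt(e[order])[:, None]

# Reference: right singular vectors from the SVD
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)
V_svd = Vt[:k]

# The directions agree up to sign, so compare absolute dot products
agreement = np.abs(np.sum(V_compact * V_svd, axis=1))
print(agreement)
```

The square roots of the eigenvalues of $XX^T$ are the singular values of the centered data, which is why dividing by them yields unit-length directions.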
Next we apply PCA to a set of font images. The fontimages.zip file contains thumbnail images of the character "a" in different fonts. All 2359 fonts can be downloaded for free. Assuming that the names of these images are stored in the list imlist, and that the previous code is saved in the pca.py file, we can use the following script to compute the principal components of the images:
import pickle
from PIL import Image
from numpy import *
from pylab import *
from PCV.tools import imtools,pca
# Uses sparse pca codepath
# get the list of images and their size
imlist=imtools.get_imlist('E:/python/Python Computer Vision/Image data/fontimages/a_thumbs')
# open one image to get the size
im=array(Image.open(imlist[0]))
# get the size of the images
m,n=im.shape[:2]
# get the number of images
imnbr=len(imlist)
print("The number of images is %d" % imnbr)
# create matrix to store all flattened images
immatrix = array([array(Image.open(imname)).flatten() for imname in imlist],'f')
# dimensionality reduction with PCA
V,S,immean=pca.pca(immatrix)
# save the mean and principal components
#f = open('../ch01/font_pca_modes.pkl', 'wb')
#pickle.dump(immean,f)
#pickle.dump(V,f)
#f.close()
# Show the images (mean and 7 first modes)
# This gives figure 1-8 (p15) in the book.
figure()
gray()
subplot(241)
axis('off')
imshow(immean.reshape(m,n))
for i in range(7):
    subplot(2,4,i+2)
    imshow(V[i].reshape(m,n))
    axis('off')
show()
Note that the images were flattened into one-dimensional vectors, so they must be converted back with the reshape() function before they are displayed. Running the code above reproduces Figure 1-8 on page 15 of the original book:
1.3.7 Pickle module
Python's pickle module is very useful if you want to save some results or data for later use. The pickle module can take almost any Python object and convert it to a string representation, a process called pickling. Reconstructing the object from its string representation is called unpickling. These string representations can be conveniently stored and transmitted.
Let's look at an example. Suppose you want to save the mean image and principal components of the font images from the previous section. This can be done as follows:
# save the mean and principal components
f = open('font_pca_modes.pkl','wb')
pickle.dump(immean,f)
pickle.dump(V,f)
f.close()
In the example above, several objects are saved to the same file. The pickle module supports several different protocols for generating .pkl files; if you are not sure which one is in use, it is best to read and write the files in binary mode. To load the data in another Python session, just use the load() method as follows:
# load the mean and principal components
f = open('font_pca_modes.pkl','rb')
immean = pickle.load(f)
V = pickle.load(f)
f.close()
Note that objects must be loaded in the same order in which they were saved. In Python 2 there is also an optimized version written in C, the cPickle module, which is fully compatible with the standard pickle module (in Python 3, pickle uses the C implementation automatically). For more information about the pickle module, see the pickle module documentation page at http://docs.python.org/library/pickle.html.
In the rest of this book, we'll use the with statement to handle file reads and writes. This construct was introduced in Python 2.5 and takes care of opening and closing files automatically (even if errors occur while the file is open). The following example uses with to implement the save and load operations:
# open the file and save
with open('font_pca_modes.pkl', 'wb') as f:
    pickle.dump(immean,f)
    pickle.dump(V,f)
and
# open the file and load
with open('font_pca_modes.pkl', 'rb') as f:
    immean = pickle.load(f)
    V = pickle.load(f)
The example above may look strange at first, but with is a very useful construct. If you don't like it, you can keep using the open() and close() calls shown earlier.
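A complete round trip can be sketched with small stand-in arrays (the values below are hypothetical, not real PCA output) written to a temporary folder:

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical stand-ins for the mean image and the principal components
immean = np.linspace(0.0, 1.0, 12)
V = np.arange(24, dtype=np.float32).reshape(2, 12)

path = os.path.join(tempfile.mkdtemp(), "font_pca_modes.pkl")

# Dump two objects into the same file, one after the other
with open(path, "wb") as f:
    pickle.dump(immean, f)
    pickle.dump(V, f)

# Load them back in the same order in which they were written
with open(path, "rb") as f:
    immean_loaded = pickle.load(f)
    V_loaded = pickle.load(f)

print(f.closed)  # the with block has already closed the file
```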
As an alternative to pickle, NumPy has simple functions for reading and writing text files. These are useful if the data does not contain complex data structures, for example a list of points clicked on an image. To save an array x to a file, use:
savetxt('test.txt',x,'%i')
The last parameter indicates that integer format should be used. Similarly, reading can be done using:
x = loadtxt('test.txt')
You can learn more from the online documentation.
Finally, NumPy has dedicated functions for saving and loading arrays; see the online documentation for more about save() and load().
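The difference between the text functions and the dedicated array functions shows up in the data type that comes back. A minimal sketch, assuming three hypothetical clicked points and a temporary folder:

```python
import os
import tempfile

import numpy as np

x = np.array([[118, 177], [240, 95], [310, 402]])  # e.g. three clicked points
folder = tempfile.mkdtemp()

# Text format: human readable; '%i' writes the values as integers
txt_path = os.path.join(folder, "test.txt")
np.savetxt(txt_path, x, "%i")
x_txt = np.loadtxt(txt_path)          # loadtxt returns floats by default

# Binary .npy format: keeps dtype and shape exactly
npy_path = os.path.join(folder, "test.npy")
np.save(npy_path, x)
x_npy = np.load(npy_path)

print(x_txt.dtype, x_npy.dtype)
```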
1.4 SciPy
SciPy (http://scipy.org/) is an open source toolkit for numerical operations based on NumPy. SciPy provides many efficient operations for numerical integration, optimization, statistics, signal processing, and most importantly for us, image processing.
1.4.1 Blurred image
Gaussian blurring of images is a classic example of image convolution. Essentially, a (grayscale) image $I$ is blurred by convolving it with a Gaussian kernel:
$$I_\delta = I * G_\delta$$
where $*$ denotes convolution and $G_\delta$ is a two-dimensional Gaussian kernel with standard deviation $\delta$.
- Filter module: scipy.ndimage.filters (newer SciPy releases expose the same functions directly in scipy.ndimage)
This module computes the convolution with fast one-dimensional separable filters, and it is used as follows:
from PIL import Image
from numpy import *
from pylab import *
from scipy.ndimage import filters
# Add Chinese font support
from matplotlib.font_manager import FontProperties
font=FontProperties(fname=r"c:\windows\fonts\SimSun.ttc",size=14)
im=array(Image.open('E:/python/Python Computer Vision/Image data/empire.jpg').convert('L'))
figure()
gray()
axis('off')
subplot(141)
axis('off')
title(u'原图',fontproperties=font)
imshow(im)
for bi,blur in enumerate([2,5,10]):
    im2=filters.gaussian_filter(im,blur)
    im2=uint8(im2)
    imNum=str(blur)
    subplot(1,4,2+bi)
    axis('off')
    title(u'标准差为'+imNum,fontproperties=font)
    imshow(im2)
# for a color image, blur each of the three channels separately
#for bi, blur in enumerate([2, 5, 10]):
# im2 = zeros(im.shape)
# for i in range(3):
# im2[:, :, i] = filters.gaussian_filter(im[:, :, i], blur)
# im2 = np.uint8(im2)
# subplot(1, 4, 2 + bi)
# axis('off')
# imshow(im2)
show()
In the figure above, the first panel is the original image, and the remaining panels show it blurred with a Gaussian of standard deviation 2, 5, and 10, respectively. For more details on this module and its parameters, see the SciPy scipy.ndimage documentation.
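The separable one-dimensional filtering mentioned above can be verified on a synthetic impulse image (an assumption chosen because blurring a single bright pixel reveals the kernel itself). The sketch imports from scipy.ndimage directly:

```python
import numpy as np
from scipy import ndimage

# A single bright pixel: blurring it reveals the shape of the kernel
im = np.zeros((41, 41))
im[20, 20] = 1.0

sigma = 3
blurred = ndimage.gaussian_filter(im, sigma)

# Separability: filter along the rows axis first, then the columns axis, with 1D Gaussians
rows = ndimage.gaussian_filter1d(im, sigma, axis=0)
separable = ndimage.gaussian_filter1d(rows, sigma, axis=1)

print(np.abs(blurred - separable).max(), blurred.sum())
```

Two one-dimensional passes give the same result as the two-dimensional filter, and the total intensity of the image is preserved because the kernel sums to one.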
1.4.2 Image Derivatives
The way image intensity changes across an image is important information in many applications. Intensity changes are described by the derivatives $I_x$ and $I_y$ of the grayscale image $I$ in the $x$ and $y$ directions (for color images, the derivatives are usually computed separately for each color channel).
- The gradient vector of the image is $\nabla I = [I_x, I_y]^T$. It describes, at each pixel, the direction in which the image intensity changes the most.
- The gradient has two important properties:
- Gradient magnitude: $|\nabla I| = \sqrt{I_x^2 + I_y^2}$
- Gradient direction: $\alpha = \operatorname{arctan2}(I_y, I_x)$
NumPy's arctan2() function returns the signed angle in radians, in the range $[-\pi, \pi]$.
We can compute the derivatives of the image using discrete approximations, most of which can be implemented simply as convolutions:
$I_x = I * D_x$, $I_y = I * D_y$
For $D_x$ and $D_y$, Prewitt filters or Sobel filters are usually chosen.
These derivative filters can be implemented simply with the standard convolution operations of the scipy.ndimage.filters module:
from PIL import Image
from pylab import *
from scipy.ndimage import filters
import numpy
# Add Chinese font support
from matplotlib.font_manager import FontProperties
font=FontProperties(fname=r"c:\windows\fonts\SimSun.ttc",size=14)
im=array(Image.open('E:/python/Python Computer Vision/Image data/empire.jpg').convert('L'))
gray()
subplot(141)
axis('off')
title(u'(a)原图',fontproperties=font)
imshow(im)
# sobel derivative filters
imx=zeros(im.shape)
filters.sobel(im,1,imx)
subplot(142)
axis('off')
title(u'(b)x方向差分',fontproperties=font)
imshow(imx)
imy=zeros(im.shape)
filters.sobel(im,0,imy)
subplot(143)
axis('off')
title(u'(c)y方向差分',fontproperties=font)
imshow(imy)
mag=255-numpy.sqrt(imx**2+imy**2)
subplot(144)
title(u'(d)梯度幅值',fontproperties=font)
axis('off')
imshow(mag)
show()
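To make the sign and axis conventions concrete, the following sketch applies the Sobel filter to a synthetic ramp image whose intensity increases along x only. It imports from scipy.ndimage directly and passes the y derivative first to arctan2(), as NumPy's arctan2(y, x) signature expects:

```python
import numpy as np
from scipy import ndimage

# Synthetic image whose intensity grows by 1 per pixel along x (columns)
im = np.tile(np.arange(16, dtype=float), (16, 1))

imx = ndimage.sobel(im, axis=1)   # derivative along x (columns)
imy = ndimage.sobel(im, axis=0)   # derivative along y (rows)

magnitude = np.sqrt(imx**2 + imy**2)
direction = np.arctan2(imy, imx)  # note the (y, x) argument order

inner = (slice(2, -2), slice(2, -2))  # ignore border effects
print(imx[inner].max(), imy[inner].max(), direction[inner].max())
```

Away from the border, the y derivative vanishes and the gradient points along the positive x axis. The Sobel response is a scaled derivative, so its value is a constant multiple of the true slope.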
Gaussian derivative filters:
from PIL import Image
from pylab import *
from scipy.ndimage import filters
import numpy
# Add Chinese font support
#from matplotlib.font_manager import FontProperties
#font = FontProperties(fname=r"c:\windows\fonts\SimSun.ttc", size=14)
def imx(im, sigma):
    imgx = zeros(im.shape)
    filters.gaussian_filter(im, sigma, (0, 1), imgx)
    return imgx
def imy(im, sigma):
    imgy = zeros(im.shape)
    filters.gaussian_filter(im, sigma, (1, 0), imgy)
    return imgy
def mag(im, sigma):
    # there's also gaussian_gradient_magnitude()
    # compute the derivatives here instead of relying on global variables
    imgx = imx(im, sigma)
    imgy = imy(im, sigma)
    # inverted (255 - magnitude) so that strong edges appear dark
    imgmag = 255 - numpy.sqrt(imgx ** 2 + imgy ** 2)
    return imgmag
im = array(Image.open('E:/python/Python Computer Vision/Image data/empire.jpg').convert('L'))
figure()
gray()
sigma = [2, 5, 10]
for i in sigma:
    subplot(3, 4, 4*(sigma.index(i))+1)
    axis('off')
    imshow(im)
    imgx=imx(im, i)
    subplot(3, 4, 4*(sigma.index(i))+2)
    axis('off')
    imshow(imgx)
    imgy=imy(im, i)
    subplot(3, 4, 4*(sigma.index(i))+3)
    axis('off')
    imshow(imgy)
    imgmag=mag(im, i)
    subplot(3, 4, 4*(sigma.index(i))+4)
    axis('off')
    imshow(imgmag)
show()
1.4.3 Morphology: object counting
Morphology (or mathematical morphology) is the basic framework and collection of image processing methods for measuring and analyzing basic shapes. Morphology is typically used for binary images, but can also be used for grayscale images. A binary image means that each pixel of the image can only take two values, usually 0 and 1. Binary images are usually the result of thresholding an image when counting objects, or measuring their size. An overview of morphology and how it handles images can be read at http://en.wikipedia.org/wiki/Mathematical_morphology.
The morphology module in scipy.ndimage implements the morphological operations, and the measurements module in scipy.ndimage implements counting and measurement functions for binary images.
Here is a simple example of how to use them:
from PIL import Image
from numpy import *
from scipy.ndimage import measurements,morphology
# load the image and threshold it to make sure it is binary
im = array(Image.open('houses.png').convert('L'))
im = 1*(im<128)
labels, nbr_objects = measurements.label(im)
print("Number of objects:", nbr_objects)
- The above script first loads the image and thresholds it to ensure that it is a binary image. The script converts the boolean array into a binary representation by multiplying by 1.
- We then use the label() function to find individual objects and assign integer labels to pixels according to which object they belong to.
- Figure 1-12b is an image of the labels array. The grayscale value of each pixel represents the label of the object it belongs to. As you can see, there are small connections between some of the objects. Using a binary opening, we can remove them:
# morphological opening to separate the objects better
im_open = morphology.binary_opening(im,ones((9,5)),iterations=2)
labels_open, nbr_objects_open = measurements.label(im_open)
print("Number of objects:", nbr_objects_open)
- The second parameter of binary_opening() specifies the structuring element, an array that indicates which neighboring pixels to use when centered on a pixel.
- In this case we use 9 pixels in the y direction (4 pixels above, the pixel itself, 4 pixels below) and 5 pixels in the x direction. You can specify any array as the structuring element; its non-zero elements determine which neighboring pixels are used.
- The parameter iterations determines how many times the operation is applied. You can experiment with different iterations values and see how the number of objects changes.
- The opened image and the corresponding label image can be viewed in Figure 1-12c and Figure 1-12d.
- The binary_closing() function does the opposite.
- We leave the use of this function and the other functions in the morphology and measurements modules as an exercise. You can learn more about these functions from the scipy.ndimage module documentation.
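The effect of opening on the object count can be reproduced on a tiny synthetic binary image: two squares joined by a one-pixel-thick bridge. The image and the 3x3 structuring element are assumptions chosen for illustration, and the functions are imported from scipy.ndimage directly:

```python
import numpy as np
from scipy import ndimage

# Two 10x9 squares joined by a one-pixel-thick bridge (synthetic binary image)
im = np.zeros((20, 30), dtype=int)
im[5:15, 3:12] = 1
im[5:15, 18:27] = 1
im[9, 12:18] = 1

labels, nbr_objects = ndimage.label(im)

# Opening with a 3x3 structuring element removes structures thinner than 3 pixels
im_open = ndimage.binary_opening(im, structure=np.ones((3, 3)))
labels_open, nbr_objects_open = ndimage.label(im_open)

print(nbr_objects, nbr_objects_open)
```

Before opening, the bridge makes label() treat both squares as one object; after opening, the bridge is gone and the squares are counted separately.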
from PIL import Image
from numpy import *
from scipy.ndimage import measurements, morphology
from pylab import *
""" This is the morphology counting objects example in Section 1.4. """
# Add Chinese font support
from matplotlib.font_manager import FontProperties
font = FontProperties(fname=r"c:\windows\fonts\SimSun.ttc", size=14)
# load image and threshold to make sure it is binary
figure()
gray()
im = array(Image.open('E:/python/Python Computer Vision/Image data/houses.png').convert('L'))
subplot(221)
imshow(im)
axis('off')
title(u'原图', fontproperties=font)
im = (im < 128)
labels, nbr_objects = measurements.label(im)
print ("Number of objects:", nbr_objects)
subplot(222)
imshow(labels)
axis('off')
title(u'标记后的图', fontproperties=font)
# morphology - opening to separate objects better
im_open = morphology.binary_opening(im, ones((9, 5)), iterations=2)
subplot(223)
imshow(im_open)
axis('off')
title(u'开运算后的图像', fontproperties=font)
labels_open, nbr_objects_open = measurements.label(im_open)
print ("Number of objects:", nbr_objects_open)
subplot(224)
imshow(labels_open)
axis('off')
title(u'开运算后进行标记后的图像', fontproperties=font)
show()
output:
Number of objects: 45
Number of objects: 48
1.4.4 Useful SciPy modules
SciPy contains some useful modules for input and output. Two of these modules are described below: io and misc.
1. Read and write .mat files
If you have data, or have downloaded an interesting data set, stored in Matlab's .mat file format, you can read it with the scipy.io module:
import scipy.io
data = scipy.io.loadmat('test.mat')
In the code above, data is a dictionary whose keys correspond to the variable names stored in the original .mat file. The variables are in array format. Saving to a .mat file is just as simple: create a dictionary with all the variables you want to save and use the savemat() function:
data = {}
data['x'] = x
scipy.io.savemat('test.mat',data)
Because the above script saves the array x, when it is read into Matlab, the name of the variable is still x. For more information about the scipy.io module, see the online documentation.
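The save and load steps above can be combined into a small round trip. This is only a sketch: the array contents and the temporary file location are made up for illustration.

```python
import os
import tempfile

import numpy as np
import scipy.io

# Illustrative 2-D array to store under the variable name 'x'.
x = np.arange(6, dtype=float).reshape(2, 3)

# Write to a temporary directory so no existing file is overwritten.
path = os.path.join(tempfile.mkdtemp(), 'test.mat')
scipy.io.savemat(path, {'x': x})

# loadmat() returns a dictionary; besides the saved variables it also
# contains metadata entries whose keys start with a double underscore.
loaded = scipy.io.loadmat(path)
variable_names = [key for key in loaded if not key.startswith('__')]

print(variable_names)
print(loaded['x'])
```

Filtering out the keys that start with a double underscore leaves only the variable names that were stored in the file.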
2. Save the array as an image
Because we operate on images as array objects, it is very useful to be able to save an array directly as an image file. Many of the images in this book were created this way.
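The imsave() function is loaded from the scipy.misc module. To save an array im to a file, use the following command (note that imsave() was removed from scipy.misc in SciPy 1.2, so this import only works with older SciPy releases):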
from scipy.misc import imsave
imsave('test.jpg',im)
The scipy.misc module also contains the famous Lena test image (in older SciPy releases only; lena() was removed in SciPy 0.17):
lena = scipy.misc.lena()
This call returns a 512x512 grayscale array version of the image.
All Pylab plots can be saved in various image formats by clicking the "Save" button in the image window.
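Since scipy.misc.imsave() is not available in current SciPy releases, one alternative is to save arrays with PIL, which this article already uses for reading images. The following is a minimal sketch under that assumption; the test pattern, the helper name save_array_as_image, and the temporary output path are illustrative only.

```python
import os
import tempfile

import numpy as np
from PIL import Image


def save_array_as_image(arr, filename):
    """Clip an array to [0, 255], convert it to 8-bit, and save it with PIL."""
    arr8 = np.clip(arr, 0, 255).astype(np.uint8)
    Image.fromarray(arr8).save(filename)


# Illustrative grayscale test pattern: a white square on a black background.
im = np.zeros((64, 64))
im[16:48, 16:48] = 255

# PNG is lossless, so the saved pixels can be read back unchanged.
path = os.path.join(tempfile.mkdtemp(), 'test.png')
save_array_as_image(im, path)

reloaded = np.array(Image.open(path))
print(reloaded.shape, reloaded.dtype)
```

The clipping step matters for float arrays such as filter outputs, whose values may fall outside the 8-bit range.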
1.5 Advanced Example: Image Denoising
We end this chapter with a very practical example: image denoising. Image denoising is a processing technique that removes image noise while preserving image details and structures as much as possible. We use the ROF (Rudin-Osher-Fatemi) denoising model here. This model first appeared in the literature [28]. Image denoising is important for many applications, from making your vacation photos look better to improving the quality of satellite imagery. The ROF model has the nice property of making the processed image smoother while maintaining image edge and structural information.
The mathematical foundations and processing techniques of the ROF model are too advanced to be covered in this book. Before describing how to implement the ROF solver based on the algorithm proposed by Chambolle [5], this book first briefly introduces the ROF model.
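To give a rough idea of what such a solver looks like, here is a minimal sketch of a Chambolle-style dual iteration for ROF denoising. It is written for illustration and is not necessarily identical to rof.denoise in PCV; the function name rof_denoise, the default parameter values, the max_iter cap, and the total_variation helper are all assumptions.

```python
import numpy as np


def rof_denoise(im, tolerance=0.1, tau=0.125, tv_weight=100, max_iter=500):
    """Sketch of ROF denoising via a projected dual update.

    Returns (U, T): the denoised image and the residual im - U.
    """
    im = np.asarray(im, dtype=float)
    m, n = im.shape
    U = im.copy()
    Px = np.zeros_like(im)  # x-component of the dual field
    Py = np.zeros_like(im)  # y-component of the dual field
    for _ in range(max_iter):
        U_old = U
        # Forward differences of U (np.roll gives periodic boundaries).
        grad_x = np.roll(U, -1, axis=1) - U
        grad_y = np.roll(U, -1, axis=0) - U
        # Gradient step on the dual field, then projection onto the unit ball.
        Px_new = Px + (tau / tv_weight) * grad_x
        Py_new = Py + (tau / tv_weight) * grad_y
        norm = np.maximum(1, np.sqrt(Px_new ** 2 + Py_new ** 2))
        Px = Px_new / norm
        Py = Py_new / norm
        # Divergence of the dual field via backward differences.
        div_p = (Px - np.roll(Px, 1, axis=1)) + (Py - np.roll(Py, 1, axis=0))
        U = im + tv_weight * div_p
        # Stop when the RMS change per pixel drops below the tolerance.
        error = np.linalg.norm(U - U_old) / np.sqrt(m * n)
        if error < tolerance:
            break
    return U, im - U


def total_variation(a):
    """Sum of gradient magnitudes, using the same forward differences."""
    dx = np.roll(a, -1, axis=1) - a
    dy = np.roll(a, -1, axis=0) - a
    return np.sum(np.sqrt(dx ** 2 + dy ** 2))


# Small synthetic test: a bright square plus Gaussian noise.
rng = np.random.default_rng(0)
clean = np.zeros((60, 60))
clean[15:45, 15:45] = 128
noisy = clean + 30 * rng.standard_normal(clean.shape)

U, T = rof_denoise(noisy)
print(total_variation(noisy), total_variation(U))
```

The tv_weight parameter controls how strongly total variation is penalized: larger values give a smoother result at the cost of some contrast.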
Denoising example on a synthetic image:
from pylab import *
from numpy import *
from numpy import random
from scipy.ndimage import filters
# from scipy.misc import imsave  # removed in SciPy 1.2; only needed for the commented-out imsave() calls below
from PCV.tools import rof
""" This is the de-noising example using ROF in Section 1.5. """
# Add Chinese font support
from matplotlib.font_manager import FontProperties
font = FontProperties(fname=r"c:\windows\fonts\SimSun.ttc", size=14)
# create synthetic image with noise
im = zeros((500,500))
im[100:400,100:400] = 128
im[200:300,200:300] = 255
im = im + 30*random.standard_normal((500,500))
U,T = rof.denoise(im,im)
G = filters.gaussian_filter(im,10)
# save the result
#imsave('synth_original.pdf',im)
#imsave('synth_rof.pdf',U)
#imsave('synth_gaussian.pdf',G)
# plot
figure()
gray()
subplot(1,3,1)
imshow(im)
#axis('equal')
axis('off')
title(u'原噪声图像', fontproperties=font)  # "Original noisy image"
subplot(1,3,2)
imshow(G)
#axis('equal')
axis('off')
title(u'高斯模糊后的图像', fontproperties=font)  # "Image after Gaussian blur"
subplot(1,3,3)
imshow(U)
#axis('equal')
axis('off')
title(u'ROF降噪后的图像', fontproperties=font)  # "Image after ROF denoising"
show()
The left picture shows the original noisy image, the middle picture shows the result of Gaussian blurring with a standard deviation of 10, and the right picture shows the image after ROF denoising. The noisy image above is synthetic; now we test on a real image:
from PIL import Image
from pylab import *
from numpy import *
from numpy import random
from scipy.ndimage import filters
# from scipy.misc import imsave  # removed in SciPy 1.2; only needed for the commented-out imsave() calls below
from PCV.tools import rof
""" This is the de-noising example using ROF in Section 1.5. """
# Add Chinese font support
from matplotlib.font_manager import FontProperties
font = FontProperties(fname=r"c:\windows\fonts\SimSun.ttc", size=14)
im = array(Image.open('E:/python/Python Computer Vision/Image data/empire.jpg').convert('L'))
U,T = rof.denoise(im,im)
G = filters.gaussian_filter(im,10)
# save the result
#imsave('synth_original.pdf',im)
#imsave('synth_rof.pdf',U)
#imsave('synth_gaussian.pdf',G)
# plot
figure()
gray()
subplot(1,3,1)
imshow(im)
#axis('equal')
axis('off')
title(u'原噪声图像', fontproperties=font)  # "Original noisy image"
subplot(1,3,2)
imshow(G)
#axis('equal')
axis('off')
title(u'高斯模糊后的图像', fontproperties=font)  # "Image after Gaussian blur"
subplot(1,3,3)
imshow(U)
#axis('equal')
axis('off')
title(u'ROF降噪后的图像', fontproperties=font)  # "Image after ROF denoising"
show()
ROF denoising preserves edges and image structure.
1.6 Installation of PCV package
- Download the PCV library files from: https://github.com/jesolem/PCV
- Extract the downloaded files to: C:\Users\Administrator\Desktop\PCV
- Open cmd and execute the following commands:
(1) cd C:\Users\Administrator\Desktop\PCV
(2) python setup.py install
- In PyCharm, enter
import PCV
to test whether the installation was successful.
If error 1 is reported: NameError: name 'file' is not defined, change fp = file(filename, 'wb') to fp = open(filename, 'wb').
If error 2 is reported: TypeError: write() argument must be str, not bytes, there is a problem with the file opening mode. Just modify the preceding open statement so that the file is opened in binary mode:
filelist = get_imlist('E:/python/Python Computer Vision/test jpg/') # get the image file names (including extensions) in the convert_images_format_test folder
imlist = open('E:/python/Python Computer Vision/test jpg/imlist.txt','wb+')
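The second error can be reproduced in isolation. The sketch below assumes the file is written with pickle, as in the format-conversion example earlier in this article; the file names in the list and the temporary path are hypothetical.

```python
import os
import pickle
import tempfile

names = ['a.jpg', 'b.jpg']  # hypothetical image file names
path = os.path.join(tempfile.mkdtemp(), 'imlist.txt')

# Text mode: pickle.dump() writes bytes, so Python 3 raises a TypeError.
try:
    with open(path, 'w') as f:
        pickle.dump(names, f)
    text_mode_error = None
except TypeError as exc:
    text_mode_error = str(exc)

# Binary mode: writing and reading the pickled list works as expected.
with open(path, 'wb') as f:
    pickle.dump(names, f)
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(text_mode_error)
print(restored)
```

Using with blocks also makes sure the file is closed after writing, which the open() call above leaves to the caller.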