Original text: Hands-On Image Processing with Python
License: CC BY-NC-SA 4.0
Translator: Feilong
This article comes from [ApacheCN Computer Vision Translation Collection] , using the post-translation editing (MTPE) process to improve efficiency as much as possible.
When others say you have no bottom line, you'd better not; when others say you've done something, you'd better really do it.
1. Introduction to image processing
As the name suggests, image processing can be simply defined as the processing (analysis and manipulation) of images using algorithms in a computer (through code). It has several different aspects such as image storage, representation, information extraction, manipulation, enhancement, recovery and interpretation. In this chapter, we will provide a basic introduction to all these different aspects of image processing and introduce practical image processing using Python libraries. All code examples in this book will use Python 3.
We'll start by defining what image processing is and its applications. Then we will find out. . .
What is image processing and some applications
Let's first define what an image is, how it is stored on the computer, and how to process it using Python.
What is an image and how is it stored on the computer
Conceptually, an image in its simplest form (single channel; e.g., a binary or monochrome, grayscale, or black-and-white image) is a 2D function f(x,y) that maps a pair of coordinates to an integer or real value related to the intensity/color of the point. Each point is called a pixel or pel (picture element). An image can also have multiple channels (such as a color RGB image, where a color can be represented by three channels: red, green, and blue). For a color RGB image, each pixel at the (x, y) coordinate can be represented by the triplet (r_{x,y}, g_{x,y}, b_{x,y}).
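As a minimal sketch of this idea (the array sizes and pixel values below are made up for illustration), a single-channel image can be held in a 2D NumPy array, and an RGB image in a 3D array whose last axis stores the (r, g, b) triplet:

```python
import numpy as np

# A tiny 2 x 3 grayscale image: one intensity value (0-255) per pixel.
gray = np.array([[0, 128, 255],
                 [64, 192, 32]], dtype=np.uint8)

# A 2 x 3 RGB image: each pixel holds an (r, g, b) triplet.
rgb = np.zeros((2, 3, 3), dtype=np.uint8)
rgb[0, 0] = (255, 0, 0)   # top-left pixel is pure red
rgb[1, 2] = (0, 0, 255)   # bottom-right pixel is pure blue

print(gray.shape, rgb.shape)              # (2, 3) (2, 3, 3)
print(gray[1, 1])                         # 192
print(tuple(int(v) for v in rgb[0, 0]))   # (255, 0, 0)
```

Note that NumPy indexes arrays as [row, column], so the pixel at image coordinate (x, y) is read as array[y, x].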
To be able to process it on a computer, an image f(x,y) needs to be digitized both spatially and in amplitude. . .
What is image processing?
Image processing refers to the automatic processing, manipulation, analysis, and interpretation of images using algorithms and code on a computer. It has applications in many disciplines and fields of science and technology, such as television, photography, robotics, remote sensing, medical diagnosis, and industrial inspection. Social networking sites like Facebook and Instagram, which we have become accustomed to using in our daily lives and where we upload huge numbers of images every day, are typical examples of industries that need to use and innovate many image processing algorithms to process the uploaded images.
In this book, we will use some Python packages to process images. First, we will use a set of libraries for classic image processing: starting from extracting image data, transforming the data using some algorithms, using library functions for preprocessing, enhancement, restoration, representation (using descriptors), segmentation, classification, detection and identification (objects) to analyze, understand, and better interpret data. Next, we will use another set of libraries for deep learning based image processing, a technique that has become very popular in the past few years.
Some applications of image processing
Some typical applications of image processing include medical/biological fields (e.g., X-ray and CT scans), computational photography (Photoshop), fingerprint authentication, face recognition, etc.
Image processing pipeline
The following steps describe the basic steps in the image processing pipeline:
- Capture and storage: An image needs to be captured (e.g., using a camera) and stored as a file (e.g., a JPEG file) on some device (e.g., a hard drive).
- Load to memory and save to disk: The image needs to be read from disk into memory and stored using some data structure (e.g., a numpy ndarray), and the data structure needs to be serialized into an image file later, probably after running some algorithms on the image.
- Manipulation, enhancement, and restoration: We need to run some preprocessing algorithms to do the following:
  - Run some transformations on the image (sampling and manipulation; e.g., grayscale conversion)
  - Enhance image quality (filtering; e.g., deblurring)
  - Restore images degraded by noise
- Segmentation: In order to extract objects of interest, the image needs to be segmented.
- Information extraction/representation: The image needs to be represented in some alternative form; for example, one of the following:
  - Some hand-crafted feature descriptors can be computed from the image (e.g., HOG descriptors, using classical image processing)
  - Some features can be learned automatically from the image (e.g., the weights and bias values learned in the hidden layers of a neural network, using deep learning)
  The image will then be represented using this alternative representation.
- Image understanding/interpretation: This representation will be used to better understand the image through:
  - Image classification (e.g., whether an image contains a human object or not)
  - Object recognition (e.g., finding the location of the car objects in an image with a bounding box)
The following figure describes the different steps in image processing:
The diagram below represents the different modules we will use for different image processing tasks:
In addition to these libraries, we will use the following libraries:
- scipy.ndimage and opencv for different image processing tasks
- scikit-learn for classic machine learning
- tensorflow and keras for deep learning
Setting up different image processing libraries in Python
The next few paragraphs describe how to install different image processing libraries and set up the environment to write code to process images using classic image processing techniques in Python. In the final chapters of the book, when we use deep learning-based methods, we will need to use different settings.
Install pip
We will be using the pip or pip3 tool to install the libraries, so we need to install pip first if it is not already installed. As mentioned here (https://pip.pypa.io/en/stable/installing/#do-i-need-to-install-pip), pip is already installed if we use Python 3 >= 3.4 downloaded from python.org, or if we are working in a virtual environment (https://packaging.python.org/tutorials/installing-packages/#creating-using) created by virtualenv (https://packaging.python.org/key_projects/#virtualenv) or pyvenv (https://packaging.python.org/key_projects/#venv). We just need to make sure to upgrade pip (https://pip.pypa.io/en/stable/installing/#upgrading-pip). How to install pip on different operating systems or platforms can be found here: https://stackoverflow.com/questions/6587507/how-to-install-pip-with-python-3.
Install some image processing libraries in Python
In Python, there are many libraries available for image processing. The ones we will be using are: NumPy, SciPy, scikit-image, PIL (Pillow), OpenCV, scikit-learn, SimpleITK, and Matplotlib.
The matplotlib library is primarily used for display, and numpy will be used for storing images. The scikit-learn library will be used to build machine learning models for image processing, and the scipy library will be mainly used for image enhancement. The scikit-image, mahotas, and opencv libraries will be used for different image processing algorithms.
The following code block shows how the libraries we are going to use can be installed with pip from P. . .
Install the Anaconda distribution
We also recommend downloading and installing the latest version of the Anaconda distribution; this will eliminate the need to explicitly install many Python packages.
More about installing Anaconda for different OSes can be found at https://conda.io/docs/user-guide/install/index.html.
Install Jupyter Notebook
We will use Jupyter Notebook to write Python code. Therefore, we first need to install the jupyter package from a command prompt with pip install jupyter, and then launch the Jupyter Notebook application in the browser using jupyter notebook.
From there, we can create a new Python notebook and select a kernel. If we use Anaconda, we don't need to install Jupyter explicitly; the latest Anaconda distribution comes with Jupyter.
More about running Jupyter notebooks can be found at http://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/execute.html.
We can even install Python packages from within a notebook cell; for example, we can install scipy using the command !pip install scipy.
For more information …
Image I/O and display using Python
Images are stored on disk as files, so reading and writing images from files are disk I/O operations. These can be achieved in a variety of ways using different libraries; some of them are shown in this section. Let's first import all required packages:
# for inline image display inside notebook
# % matplotlib inline
import numpy as np
from PIL import Image, ImageFont, ImageDraw
from PIL.ImageChops import add, subtract, multiply, difference, screen
import PIL.ImageStat as stat
from skimage.io import imread, imsave, imshow, show, imread_collection, imshow_collection
from skimage import color, exposure, img_as_float, data # skimage.viewer is omitted: it was removed in scikit-image 0.20
from skimage.transform import SimilarityTransform, warp, swirl
from skimage.util import invert, random_noise, montage
import matplotlib.image as mpimg
import matplotlib.pylab as plt
from scipy.ndimage import affine_transform, zoom
from scipy import misc
Read, save and display images using PIL
The PIL function open() reads an image from disk into an Image object, as shown in the following code. The image is loaded as an object of the PIL.PngImagePlugin.PngImageFile class, and we can use properties such as width, height, and mode to find the size (width x height in pixels, or the image resolution) and the mode of the image:
im = Image.open("../images/parrot.png") # read the image, provide the correct path
print(im.width, im.height, im.mode, im.format, type(im))
# 453 340 RGB PNG <class 'PIL.PngImagePlugin.PngImageFile'>
im.show() # display the image
Here is the output of the previous code:
The following code block shows how to use the PIL function convert() to convert. . .
Provide the correct path to the image on disk
We recommend creating a folder (subdirectory) to store the images you want to process (for example, for the Python code examples, we used images stored in a folder named images) and then providing the path to that folder when accessing the images, to avoid file-not-found exceptions.
Read, save and display images using Matplotlib
The next code block shows how to read an image as a floating-point numpy ndarray using the imread() function in matplotlib.image. Pixel values are represented as real values between 0 and 1:
im = mpimg.imread("../images/hill.png") # read the image from disk as a numpy ndarray
print(im.shape, im.dtype, type(im)) # this image contains an α channel, hence num_channels = 4
# (960, 1280, 4) float32 <class 'numpy.ndarray'>
plt.figure(figsize=(10,10))
plt.imshow(im) # display the image
plt.axis('off')
plt.show()
The following image shows the output of the previous code:
Next code snippet. . .
Interpolation when displaying using Matplotlib imshow()
Matplotlib's imshow() function provides many different interpolation methods for plotting images. These are particularly useful when the image to be displayed is small. Let's use the small 50 x 50 lena image shown below to see the effect of plotting with different interpolation methods:
The next code block demonstrates how to use different interpolation methods in imshow():
im = mpimg.imread("../images/lena_small.jpg") # read the image from disk as a numpy ndarray
methods = ['none', 'nearest', 'bilinear', 'bicubic', 'spline16', 'lanczos']
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(15, 30),
                         subplot_kw={'xticks': [], 'yticks': []})
fig.subplots_adjust(hspace=0.05, wspace=0.05)
for ax, interp_method in zip(axes.flat, methods):
    ax.imshow(im, interpolation=interp_method)
    ax.set_title(str(interp_method), size=20)
plt.tight_layout()
plt.show()
The following image shows the output of the previous code:
Read, save and display images using scikit-image
The next block of code uses scikit-image's imread() function to read an image into a numpy ndarray of type uint8 (8-bit unsigned integer), so the pixel values will be between 0 and 255. It then uses the rgb2hsv() function of the color module to convert the color RGB image to an HSV image (changing the image type or mode, discussed later). Next, it changes the saturation (chroma) of all pixels to a constant value while keeping the hue and value channels unchanged. Finally, it uses the hsv2rgb() function to convert the image back to RGB mode to create a new image, which is then saved and displayed:
im = imread("../images/parrot.png") # read ...
Using scikit-image's astronaut dataset
The code block below shows how to use the data module to load the astronaut image from the scikit-image library's image datasets. This module contains some other popular datasets, such as cameraman, which can be loaded similarly:
im = data.astronaut()
imshow(im), show()
The following image shows the output of the previous code:
Read and display multiple images at once
We can use the imread_collection() function of scikit-image's io module to load all the images in a folder whose file names match a specific pattern, and display them simultaneously with the imshow_collection() function. The code is left as an exercise for the reader.
Read, save and display images using scipy misc
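The misc module of scipy can also be used for image I/O and display. The following sections demonstrate how to use the misc module functions.
Using scipy.misc's face dataset
The next code block shows how to display the face dataset of the misc module:
im = misc.face() # load the raccoon's face image
misc.imsave('face.png', im) # uses the Image module (PIL)
plt.imshow(im), plt.axis('off'), plt.show()
The following image shows the output of the previous code, the misc module's face image: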
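We can read an image from disk using misc.imread(). The next code block shows an example:
im = misc.imread('../images/pepper.jpg')
print(type(im), im.shape, im.dtype)
# <class 'numpy.ndarray'> (225, 225, 3) uint8
In SciPy 1.0.0, the I/O function imread() has been deprecated and will be removed. . .
Dealing with different image types and file formats and performing basic image manipulations
In this section, we will discuss different image manipulation functions (using point transformations and geometric transformations) and how to deal with different types of images. Let's start with that.
Dealing with different image types and file formats
Images can be saved in different file formats and in different modes (types). Let's discuss how to handle images of different file formats and types using Python libraries.
File formats
Image files can be in different formats. Some popular ones include BMP (8-bit, 24-bit, 32-bit), PNG, JPG (JPEG), GIF, PPM, PNM, and TIFF. We do not need to worry about the specific format of an image file (or how its metadata is stored) to extract data from it. Python image processing libraries will read the image and extract the data, along with some other information that is useful to us (for example, the image size, type/mode, and data type).
Converting from one file format to another
Using PIL, we can read an image in one file format and save it in another; for example, from PNG to JPG, as shown in the following code:
im = Image.open("../images/parrot.png")
print(im.mode) # RGB
im.save("../images/parrot.jpg")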
But if the PNG file is in RGBA mode, we need to convert it to RGB mode first and then save it as JPG; otherwise, an error will be raised. The next code block shows how to first convert and then save:
im = Image.open("../images/hill.png")
print(im.mode) # RGBA
im.convert('RGB').save("../images/hill.jpg") # first convert to RGB mode
Image type (mode)
Images can be of different types:
- Single-channel images, where each pixel is represented by a single value:
  - Binary (monochrome) images (each pixel is represented by a single 0-1 bit)
  - Grayscale images (each pixel can be represented with 8 bits, with values typically in the range 0-255)
- Multi-channel images, where each pixel is represented by a tuple of values:
  - Three-channel images; for example, the following:
    - RGB image: each pixel is represented by a 3-tuple (r, g, b), giving the red, green, and blue channel color values of the pixel.
    - HSV image: each pixel is represented by a 3-tuple (h, s, v), giving the hue (color), saturation (how much the color is mixed with white), and value (brightness, or how much the color is mixed with black) channel values of the pixel. The HSV model describes colors in a way similar to how the human eye perceives them.
  - Four-channel images; for example, an RGBA image, where each pixel is represented by a 4-tuple (r, g, b, α), with the last channel representing transparency.
Convert from one image mode to another
We can convert the RGB image to grayscale while reading the image itself. The code below does exactly that:
im = imread("images/parrot.png", as_gray=True)
print(im.shape)
# (362, 486)
Note that for some color images, some information may be lost when converting to grayscale. The following code shows an example with Ishihara plates, which are used to detect color blindness. This time, the rgb2gray() function of the color module is used, and the color and grayscale images are displayed side by side. As shown in the image below, the number 8 is barely visible in the grayscale version:
im = imread("../images/Ishihara.png")
im_g = color.rgb2gray(im)
plt.subplot(121), plt.imshow(im, ...
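To see why information can be lost, here is a minimal NumPy sketch of a weighted-sum grayscale conversion. The luminance weights and the two sample colors are assumptions chosen for illustration; the point is only that two visibly different colors can map to the same gray level:

```python
import numpy as np

# Luminance weights of the kind used for RGB -> grayscale conversion
# (these particular values are an assumption for illustration).
WEIGHTS = np.array([0.2125, 0.7154, 0.0721])

def to_gray(rgb):
    """Weighted sum over the last (channel) axis of a float RGB array."""
    return np.asarray(rgb, dtype=float) @ WEIGHTS

red = [1.0, 0.0, 0.0]
gray = [0.2125, 0.2125, 0.2125]     # a neutral gray of matching luminance

print(to_gray(red), to_gray(gray))  # both evaluate to about 0.2125
```

Once both colors collapse to the same intensity, any pattern drawn in one color on a background of the other disappears.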
Some color spaces (channels)
Here are a few common channels/color spaces for images: RGB, HSV, XYZ, YUV, YIQ, YPbPr, YCbCr, and YDbDr. We can use affine mapping to go from one color space to another. The following matrix represents the linear mapping from RGB to YIQ color space:
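As a rough sketch of such a linear mapping, the snippet below applies a 3 x 3 matrix to every pixel with NumPy. The coefficients are the commonly quoted, rounded NTSC values and are an assumption here, since the matrix itself is not reproduced in the text; library implementations use slightly different precision:

```python
import numpy as np

# Commonly quoted (rounded) NTSC RGB -> YIQ coefficients; an assumption,
# not necessarily the exact matrix the original figure showed.
RGB_TO_YIQ = np.array([[0.299,  0.587,  0.114],
                       [0.596, -0.274, -0.322],
                       [0.211, -0.523,  0.312]])

def rgb_to_yiq(im):
    """Apply the 3 x 3 linear map to every pixel of an (H, W, 3) float image."""
    return im @ RGB_TO_YIQ.T

im = np.array([[[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]]])  # one white, one red pixel
yiq = rgb_to_yiq(im)
print(np.round(yiq, 3))
```

White maps to Y = 1 with zero chrominance (I = Q = 0), while pure red simply picks out the first column of the matrix.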
Convert from one color space to another
We can use library functions to convert from one color space to another; for example, the following code converts an image from the RGB color space to the HSV color space:
im = imread("../images/parrot.png")
im_hsv = color.rgb2hsv(im)
plt.gray()
plt.figure(figsize=(10,8))
plt.subplot(221), plt.imshow(im_hsv[...,0]), plt.title('h', size=20), plt.axis('off')
plt.subplot(222), plt.imshow(im_hsv[...,1]), plt.title('s', size=20), plt.axis('off')
plt.subplot(223), plt.imshow(im_hsv[...,2]), plt.title('v', size=20), plt.axis('off')
plt.subplot(224), plt.axis('off')
plt.show()
The image below shows the h (hue or color: the dominant wavelength of the reflected light), s (saturation or chroma) and v (value or brightness/… )
Data structure used to store images
As we have already discussed, PIL uses Image objects to store images, while scikit-image uses the numpy ndarray data structure to store image data. The next section describes how to convert between these two data structures.
Convert image data structure
The following code block shows how to convert a PIL Image object to a numpy ndarray (used by scikit-image):
im = Image.open('../images/flowers.png') # read image into an Image object with PIL
im = np.array(im) # create a numpy ndarray from the Image object
imshow(im) # use skimage imshow to display the image
plt.axis('off'), show()
The next image shows the output of the previous code, which is an image of a flower:
The following code block shows how to convert a numpy ndarray to a PIL Image object. When run, the code displays the same output as above:
im = imread('../images/flowers.png') ...
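Since the listing above is cut off, here is a small self-contained sketch of the round trip between the two data structures. It uses a synthetic array in place of the flowers image (an assumption, so that it runs without any file on disk):

```python
import numpy as np
from PIL import Image

# Synthetic 4 x 6 RGB array standing in for an image read from disk.
arr = np.zeros((4, 6, 3), dtype=np.uint8)
arr[..., 0] = 200                      # fill the red channel

pil_im = Image.fromarray(arr)          # numpy ndarray -> PIL Image
back = np.array(pil_im)                # PIL Image -> numpy ndarray

print(pil_im.size, pil_im.mode)        # (6, 4) RGB  (PIL reports width, height)
print(back.shape, back.dtype)          # (4, 6, 3) uint8
```

Note the ordering difference: a PIL Image reports its size as (width, height), whereas the ndarray shape is (height, width, channels).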
Basic image processing
Different Python libraries are available for basic image processing. Almost all of them store images in a numpy ndarray (for example, a two-dimensional array for a grayscale image and a three-dimensional array for an RGB image). The following figure shows the positive x and y directions of the color lena image (the origin is the upper left corner of the image's 2D array):
Image processing based on numpy array slices
The next code block shows how to create a circular mask on the lena image using numpy array slices and masks:
lena = mpimg.imread("../images/lena.jpg").copy() # read the image from disk as a numpy ndarray (copied so that it is writable)
print(lena[0, 40])
# [180 76 83]
# print(lena[10:13, 20:23, 0:1]) # slicing
lx, ly, _ = lena.shape
X, Y = np.ogrid[0:lx, 0:ly]
mask = (X - lx / 2) ** 2 + (Y - ly / 2) ** 2 > lx * ly / 4
lena[mask, :] = 0 # masks
plt.figure(figsize=(10,10))
plt.imshow(lena), plt.axis('off'), plt.show()
The image below shows the output of the code:
Simple image morphing - alpha blending of two images using cross-dissolve
The code block below shows how to start with one face image (Image1, Messi's face) and end up with another (Image2, Ronaldo's face) by using a linear combination of the two images, stored as numpy ndarrays. The blended image is given by:
(1 - α) * Image1 + α * Image2
We achieve this by iteratively increasing α from 0 to 1:
im1 = mpimg.imread("../images/messi.jpg") / 255 # scale RGB values in [0,1]
im2 = mpimg.imread("../images/ronaldo.jpg") / 255
i = 1
plt.figure(figsize=(18,15))
for alpha in np.linspace(0,1,20):
    plt.subplot(4,5,i)
    plt.imshow((1-alpha)*im1 + alpha*im2)
    plt.axis('off')
    i += 1
plt.subplots_adjust(wspace=0.05, hspace=0.05)
plt.show()
The next image shows a sequence of alpha-blended images created using the previous code, which cross-dissolves the image of Messi's face into Ronaldo's. As can be seen from the middle of the image sequence in the figure, face morphing with simple blending is not very smooth. In the following chapters, we will see more advanced image morphing techniques:
PIL-based image processing
PIL provides us with many functions for manipulating images; for example, changing pixel values using point transformations or performing geometric transformations on the image. Let us first load the parrot PNG image as shown in the following code:
im = Image.open("../images/parrot.png") # open the image, provide the correct path
print(im.width, im.height, im.mode, im.format) # print image size, mode and format
# 486 362 RGB PNG
The next few sections will describe how to use PIL for different types of image processing.
Crop image
We can use the crop() function with the desired rectangle argument to crop the corresponding area from the image, as shown in the following code:
im_c = im.crop((175,75,320,200)) # crop the rectangle given by (left, top, right, bottom) from the image
im_c.show()
The image below shows the cropped image created using the previous code:
Resize image
In order to increase or decrease the size of an image, we can use the resize() function, which internally upsamples or downsamples the image, respectively. This will be discussed in detail in the next chapter.
Resize to larger image
Let's start with a small clock image sized 149 x 97 and then create a larger image. The code snippet below shows the small clock image we will start with:
im = Image.open("../images/clock.jpg")
print(im.width, im.height)
# 107 105
im.show()
The output of the previous code, the small clock image, looks like this:
The next line of code shows how to use the resize() function. . .
Negating an image
We can use the point() function to transform each pixel value with a single-argument function. We can use it to negate an image, as shown in the next code block. Pixel values are represented using 1-byte unsigned integers, which is why subtracting each value from the maximum possible value (255) is exactly the point operation required on each pixel to obtain the inverted image:
im = Image.open("../images/parrot.png")
im_t = im.point(lambda x: 255 - x)
im_t.show()
The image below shows the negative image, the output of the previous code:
Convert image to grayscale
We can convert an RGB color image to a grayscale image using the convert() function with the 'L' parameter, as shown in the following code:
im_g = im.convert('L') # convert the RGB color image to a grayscale image
We will use this image in the next few grayscale transformations.
Some grayscale transformations
Here we explore several transformations in which a function converts each individual pixel value of the input image into the corresponding pixel value of the output image. The point() function is available for this. The value of each pixel is between 0 and 255 (inclusive).
Log transformation
A logarithmic transformation can be used to effectively compress the dynamic range of the pixel values of an image. The following code uses a point transformation to perform a logarithmic transformation. With this particular scaling, the output range shrinks from [0, 255] to roughly [0, 177] (the maximum is 255 * ln 2), so brighter pixels are darkened the most while darker pixels change relatively little:
im_g.point(lambda x: 255*np.log(1+x/255)).show()
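The numbers below give a feel for how this mapping compresses the range. The sketch evaluates the same formula on a few sample intensities with NumPy and, as an assumption for comparison only, also shows a commonly used alternative scaling that maps 255 back to 255:

```python
import numpy as np

x = np.array([0, 64, 128, 255], dtype=float)

# The mapping used in the text: s = 255 * ln(1 + x / 255).
s = 255 * np.log(1 + x / 255)
print(np.round(s, 1))          # the maximum is 255 * ln(2), about 176.8

# A commonly used alternative scaling (an assumption, not the text's formula):
# s = c * ln(1 + x) with c = 255 / ln(256), which maps 255 back to 255.
c = 255 / np.log(256)
s_alt = c * np.log(1 + x)
print(np.round(s_alt, 1))
```

With the first form no output exceeds its input, whereas the alternative scaling lifts dark and mid-tone values while keeping the full 0 to 255 output range.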
The following image shows the output log transformation image produced by running the previous line of code:
Power law transformation
This transformation is used for gamma correction of an image. The next line of code shows how to use the point() function to do a power law transformation, where γ = 0.6:
im_g.point(lambda x: 255*(x/255)**0.6).show()
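To illustrate how the exponent shapes the result, the following sketch applies the same formula to a few sample intensities with NumPy. The helper name power_law and the second exponent (2.0) are illustrative choices, not part of the original code:

```python
import numpy as np

def power_law(x, gamma):
    """Gamma-correct 8-bit intensities: s = 255 * (x / 255) ** gamma."""
    return 255 * (np.asarray(x, dtype=float) / 255) ** gamma

x = [0, 64, 128, 255]
print(np.round(power_law(x, 0.6), 1))  # gamma < 1 brightens mid-tones
print(np.round(power_law(x, 2.0), 1))  # gamma > 1 darkens mid-tones
```

In both cases the endpoints 0 and 255 are left unchanged; only the values in between are redistributed.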
The image below shows the output power law transformed image produced by running the previous line of code:
Some geometric transformations
In this section, we discuss another set of transformations, accomplished by multiplying appropriate matrices (often expressed in homogeneous coordinates) with the image matrix. These transformations change the geometric orientation of an image, hence the name.
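As a minimal sketch of what homogeneous coordinates buy us, the snippet below builds 3 x 3 rotation and translation matrices with NumPy and applies their product to a single point. The helper names are hypothetical, and a library would apply such a matrix to every pixel coordinate rather than to one point:

```python
import numpy as np

def rotation_about_origin(theta):
    """3 x 3 homogeneous matrix for a counter-clockwise rotation by theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def translation(tx, ty):
    """3 x 3 homogeneous matrix for a shift by (tx, ty)."""
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

# Rotate by 90 degrees, then shift by (10, 0); matrices compose right to left.
M = translation(10, 0) @ rotation_about_origin(np.pi / 2)

p = np.array([1.0, 0.0, 1.0])      # the point (1, 0) in homogeneous form
print(np.round(M @ p, 6))          # the point (10, 1) in homogeneous form
```

Because translation becomes a matrix product in this representation, any chain of rotations, scalings, and shifts collapses into a single matrix.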
Reflecting an image
We can use the transpose() function to reflect an image about the horizontal or vertical axis:
im.transpose(Image.FLIP_LEFT_RIGHT).show() # reflect about the vertical axis
The image below shows the output image produced by running the previous line of code:
Rotate image
We can use the rotate() function to rotate. . .
Change the pixel values of an image
We can use the putpixel() function to change pixel values in an image. Next, let's discuss a popular application of this function: adding noise to an image.
Add salt and pepper noise to images
We can add some salt and pepper noise to the image by randomly selecting a few pixels from the image and setting half of those pixel values to black and the other half to white . The next code snippet shows how to add noise:
# choose 5000 random locations inside image
im1 = im.copy() # keep the original image, create a copy
n = 5000
xs, ys = np.random.randint(0, im.width, n), np.random.randint(0, im.height, n)
for (x, y) in zip(xs, ys):
    # salt-and-pepper noise: black or white with equal probability
    im1.putpixel((int(x), int(y)), ((0,0,0) if np.random.rand() < 0.5 else (255,255,255)))
im1.show()
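The same idea can be written without a Python loop when the image is held in a numpy ndarray. The sketch below is an alternative to the putpixel() approach, not the original method, and it uses a synthetic mid-gray image (an assumption) so that it runs without any file on disk:

```python
import numpy as np

rng = np.random.default_rng(0)                     # seeded for reproducibility

im = np.full((100, 150, 3), 128, dtype=np.uint8)   # synthetic mid-gray RGB image
noisy = im.copy()                                  # keep the original intact

n = 5000
rows = rng.integers(0, im.shape[0], n)             # random pixel locations
cols = rng.integers(0, im.shape[1], n)
is_salt = rng.random(n) < 0.5                      # about half white, half black

noisy[rows[is_salt], cols[is_salt]] = 255          # salt
noisy[rows[~is_salt], cols[~is_salt]] = 0          # pepper

changed = np.any(noisy != im, axis=-1).sum()
print(changed, "pixels altered")                   # at most n (locations may repeat)
```

Fancy indexing assigns all selected pixels in one step, which is typically much faster than calling putpixel() thousands of times.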
The image below shows the output noise image generated by running the previous code:
Drawing on an image
We can draw lines or other geometric shapes on an image using functions from the PIL.ImageDraw module (for example, the ellipse() function draws an ellipse), as shown in the next Python code snippet:
im = Image.open("../images/parrot.png")
draw = ImageDraw.Draw(im)
draw.ellipse((125, 125, 200, 250), fill=(255,255,255,128))
del draw
im.show()
The following image shows the output image generated by running the previous code:
Draw text on image
We can add text to an image using the text() function in the PIL.ImageDraw module, as shown in the next Python code snippet:
draw = ImageDraw.Draw(im)
font = ImageFont.truetype("arial.ttf", 23) # use a truetype font
draw.text((10, 5), "Welcome to image processing with python", font=font)
del draw
im.show()
The following image shows the output image generated by running the previous code:
Create thumbnail
We can create a thumbnail from an image using the thumbnail() function, like this:
im_thumbnail = im.copy() # need to copy the original image first
im_thumbnail.thumbnail((100,100))
# now paste the thumbnail on the image
im.paste(im_thumbnail, (10,10))
im.save("../images/parrot_thumb.jpg")
im.show()
This figure shows the output image produced by running the previous code snippet:
Calculate basic statistics of an image
We can use the stat module to calculate basic statistics of an image (the mean, median, standard deviation, and so on, of the pixel values of the different channels), as follows:
s = stat.Stat(im)
print(s.extrema) # maximum and minimum pixel values for each channel R, G, B
# [(4, 255), (0, 255), (0, 253)]
print(s.count)
# [154020, 154020, 154020]
print(s.mean)
# [125.41305674587716, 124.43517724970783, 68.38463186599142]
print(s.median)
# [117, 128, 63]
print(s.stddev)
# [47.56564506512579, 51.08397900881395, 39.067418896260094]
Plotting a histogram of pixel values for the RGB channels of an image
The histogram() function can be used to calculate a histogram (a table of pixel values versus frequencies) of the pixels of each channel and return the concatenated output (for example, for an RGB image, the output contains 3 x 256 = 768 values):
pl = im.histogram()
plt.bar(range(256), pl[:256], color='r', alpha=0.5)
plt.bar(range(256), pl[256:2*256], color='g', alpha=0.4)
plt.bar(range(256), pl[2*256:], color='b', alpha=0.3)
plt.show()
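For intuition about the layout of the concatenated output, here is a rough NumPy equivalent built on a synthetic random image (an assumption, so that no image file is needed). It is a sketch of the same idea, not the PIL implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
im = rng.integers(0, 256, size=(40, 60, 3), dtype=np.uint8)  # synthetic RGB image

# One 256-bin histogram per channel, concatenated as R, G, B.
hist = np.concatenate([np.bincount(im[..., ch].ravel(), minlength=256)
                       for ch in range(3)])

print(hist.shape)            # (768,)
print(hist[:256].sum())      # 2400, the number of pixels in the red channel
```

Each 256-value slice sums to the number of pixels in the image, which is a quick sanity check on any per-channel histogram.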
The following image shows the R, G, and B color histogram plotted by running the previous code:
Separate RGB channels of an image
We can separate the channels of a multi-channel image using the split() function, as shown in the following code for an RGB image:
ch_r, ch_g, ch_b = im.split() # split the RGB image into 3 channels: R, G and B
# we shall use matplotlib to display the channels
plt.figure(figsize=(18,6))
plt.subplot(1,3,1); plt.imshow(ch_r, cmap=plt.cm.Reds); plt.axis('off')
plt.subplot(1,3,2); plt.imshow(ch_g, cmap=plt.cm.Greens); plt.axis('off')
plt.subplot(1,3,3); plt.imshow(ch_b, cmap=plt.cm.Blues); plt.axis('off')
plt.tight_layout()
plt.show() # show the R, G, B channels
The image below shows the three output images created for each of the R (red), G (green), and B (blue) channels by running the previous code:
Combine multiple channels of an image
We can combine the channels of a multi-channel image using the merge() function, as shown in the following code, where the color channels obtained by splitting the parrot RGB image are merged after swapping the red and blue channels:
im = Image.merge('RGB', (ch_b, ch_g, ch_r)) # swap the red and blue channels obtained last time with split()
im.show()
The following image shows the RGB output image created by running the previous code snippet to merge the B, G, and R channels:
Alpha-blending two images
The blend() function can be used to create a new image by interpolating two given images using a constant α. Both images must have the same size and mode. The output image is given by:
out = image1 * (1.0 - α) + image2 * α
If α is 0.0, a copy of the first image is returned. If α is 1.0, a copy of the second image is returned. The next code snippet shows an example:
im1 = Image.open("../images/parrot.png")
im2 = Image.open("../images/hill.png")
# 453 340 1280 960 RGB RGBA
im1 = im1.convert('RGBA') # two images have different modes, must be converted to the same mode
im2 = im2.resize((im1.width, im1.height), Image.BILINEAR) # two images have different sizes, must be converted to the same size
im = Image.blend(im1, im2, alpha=0.5) # keep the blended image (show() itself returns None)
im.show()
The image below shows the output image generated by blending the first two images:
Overlay two images
One image can be superimposed onto another by multiplying two input images (of the same size) pixel by pixel. The next code snippet shows an example:
im1 = Image.open("../images/parrot.png")
im2 = Image.open("../images/hill.png").convert('RGB').resize((im1.width, im1.height))
multiply(im1, im2).show()
The following image shows the output image produced by running the previous code snippet to overlay two images:
Add two images
The next code snippet shows how to generate an image by adding two input images (of the same size) pixel by pixel:
add(im1, im2).show()
The image below shows the output image produced by running the previous code snippet:
Calculate the difference between two images
The following code returns the absolute value of the pixel-by-pixel difference between images. Image difference can be used to detect changes between two images. For example, the next code block shows how to calculate a difference image from two consecutive frames of a video recording of a 2018 FIFA World Cup match (from YouTube):
from PIL.ImageChops import subtract, multiply, screen, difference, add
im1 = Image.open("../images/goal1.png") # load two consecutive frame images from the video
im2 = Image.open("../images/goal2.png")
im = difference(im1, im2)
im.save("../images/goal_diff.png")
plt.subplot(311)
plt.imshow(im1)
plt.axis('off')
plt.subplot(312)
plt.imshow(im2)
plt.axis('off')
plt.subplot(313) ...
Subtract two images and superimpose two image negatives
The subtract() function can be used to first subtract two images, then divide the result by scale (default 1.0) and add an offset (default 0.0). Similarly, the screen() function can be used to superimpose two inverted images on top of each other.
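The two formulas can be restated in NumPy to see what they do to pixel values. This is only a sketch of the arithmetic as documented for PIL's ImageChops (out = (image1 - image2) / scale + offset, and out = MAX - (MAX - image1) * (MAX - image2) / MAX); the helper names subtract_sketch and screen_sketch are hypothetical, and PIL's own rounding may differ slightly.

```python
import numpy as np

def subtract_sketch(a, b, scale=1.0, offset=0.0):
    """(a - b) / scale + offset, clipped to the 8-bit range."""
    out = (a.astype(float) - b.astype(float)) / scale + offset
    return np.clip(out, 0, 255).astype(np.uint8)

def screen_sketch(a, b):
    """Invert both images, multiply them, and invert the product."""
    out = 255.0 - (255.0 - a) * (255.0 - b) / 255.0
    return np.clip(np.round(out), 0, 255).astype(np.uint8)

a = np.full((2, 2), 200, dtype=np.uint8)
b = np.full((2, 2), 100, dtype=np.uint8)
diff_ab = subtract_sketch(a, b)   # 200 - 100 everywhere
diff_ba = subtract_sketch(b, a)   # negative results are clipped to 0
screened = screen_sketch(a, b)    # never darker than either input
```

Because screen multiplies the negatives, its result is at least as bright as the brighter input, which is why it is often used to lighten images.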
Image processing using scikit-image
As we did earlier with the PIL library, we can also use scikit-image library functions for image processing. The following sections show some examples.
Use the warp() function to perform reverse warping and geometric transformations
The warp() function of the scikit-image transform module can be used to inverse warp an image for geometric transformations (discussed in the previous section), as shown in the following example.
Apply an affine transformation to an image
We can use the SimilarityTransform() function to calculate the transformation matrix and then use the warp() function to apply the transform, as shown in the next code block:
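import numpy as np
import matplotlib.pyplot as plt
from skimage.io import imread
from skimage.transform import SimilarityTransform, warp
im = imread("../images/parrot.png")
tform = SimilarityTransform(scale=0.9, rotation=np.pi/4, translation=(im.shape[0]/2, -100))
warped = warp(im, tform) # warp() treats tform as the inverse (output-to-input) coordinate map
plt.imshow(warped), plt.axis('off'), plt.show()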
The image below shows the output image produced by running the previous code snippet:
Apply swirl transform
This is a nonlinear transformation defined in the scikit-image documentation. The next code snippet shows how to implement the transformation using the swirl() function, where strength is a parameter for the amount of swirl, radius indicates the swirl extent in pixels, and rotation adds a rotation angle. The radius is converted into an internal value r so that the swirl decays to ≈ 1/1000th within the specified radius:
from skimage.transform import swirl
im = imread("../images/parrot.png")
swirled = swirl(im, rotation=0, strength=15, radius=200)
plt.imshow(swirled)
plt.axis('off')
plt.show()
The next image shows the output image generated by the swirl transformation by running the previous code snippet:
Add random Gaussian noise to image
We can use the random_noise() function to add different types of noise to images. The next code example shows how to add Gaussian noise with different variances to an image:
im = img_as_float(imread("../images/parrot.png"))
plt.figure(figsize=(15,12))
sigmas = [0.1, 0.25, 0.5, 1]
for i in range(4):
    noisy = random_noise(im, var=sigmas[i]**2)
    plt.subplot(2,2,i+1)
    plt.imshow(noisy)
    plt.axis('off')
    plt.title('Gaussian noise with sigma=' + str(sigmas[i]), size=20)
plt.tight_layout()
plt.show()
The next figure shows the output image produced by running the previous code snippet adding Gaussian noise with different variances. It can be seen that the greater the standard deviation of Gaussian noise, the greater the noise of the output image:
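To make the role of the variance concrete, additive Gaussian noise can be sketched directly in NumPy. This is an illustration of the idea, not scikit-image's implementation; the helper name add_gaussian_noise, the fixed seed, and the flat gray test image are assumptions made for the example.

```python
import numpy as np

def add_gaussian_noise(image, sigma, seed=0):
    """Add zero-mean Gaussian noise (std = sigma) to a float image in [0, 1]."""
    rng = np.random.default_rng(seed)
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)   # keep the result a valid float image

flat = np.full((200, 200), 0.5)       # hypothetical mid-gray test image
noisy = add_gaussian_noise(flat, sigma=0.1)
measured_sigma = (noisy - flat).std() # close to the requested sigma
```

For large sigma values the clipping step discards part of the noise, so the measured spread of a clipped image is smaller than the requested one.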
Calculate the cumulative distribution function of an image
We can use the cumulative_distribution() function to compute the cumulative distribution function (CDF) of a given image, as we will see in the image enhancement chapter. For now, readers are encouraged to find and use this function to compute the CDF.
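As a rough idea of what such a function computes, the CDF of an image is the normalized running sum of its histogram. The sketch below uses plain NumPy on a random test image; image_cdf is a hypothetical helper, not the scikit-image function itself.

```python
import numpy as np

def image_cdf(image, nbins=256):
    """Return the empirical CDF of pixel values and the bin centers."""
    hist, bin_edges = np.histogram(image.ravel(), bins=nbins)
    cdf = np.cumsum(hist) / hist.sum()          # running fraction of pixels
    bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
    return cdf, bin_centers

rng = np.random.default_rng(0)
im = rng.random((64, 64))                       # hypothetical test image
cdf, centers = image_cdf(im)
```

The CDF is non-decreasing and ends at 1, which is the property histogram equalization relies on.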
Image processing with Matplotlib
We can use the pylab module in the matplotlib library for image processing. An example is shown in the next section.
Draw contour lines for the image
The contour of an image is a curve that connects all pixels that have the same specific value. The following code block shows how to draw outlines and filled outlines of an Einstein grayscale image:
im = rgb2gray(imread("../images/einstein.jpg")) # read the image from disk as a numpy ndarray
plt.figure(figsize=(20,8))
plt.subplot(131), plt.imshow(im, cmap='gray'), plt.title('Original Image', size=20)
plt.subplot(132), plt.contour(np.flipud(im), colors='k', levels=np.logspace(-15, 15, 100))
plt.title('Image Contour Lines', size=20)
plt.subplot(133), plt.title('Image Filled Contour', size=20), plt.contourf(np.flipud(im), cmap='inferno')
plt.show()
The next figure shows this. . .
Image processing using scipy.misc and scipy.ndimage modules
We can also use the misc and ndimage modules from the scipy library for image processing; it is left as an exercise for the reader to find the relevant functions and become familiar with their usage.
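Summary
In this chapter, we first provided a basic introduction to image processing and the basic concepts of the problems we try to solve in image processing. We then discussed the different tasks and steps of image processing, and the main image processing libraries in Python that we will use for coding in this book. Next, we discussed how to install the different libraries for image processing in Python, and how to import them and call functions from their modules. We also covered basic concepts about image types, file formats, and the data structures used to store image data with different Python libraries. Then, we discussed how to perform image I/O and display in Python using different libraries. Finally, we discussed how to...
Questions
- Use the functions of the scikit-image library to read a collection of images and display them as a montage.
- Use the functions of the scipy ndimage and misc modules to zoom, crop, resize, and apply affine transformations to an image.
- Create a Python remake of the Gotham Instagram filter (https://github.com/lukexyz/CV-Instagram-Filters) (hint: manipulate the image with PIL's split() and merge() and numpy's interp() functions to create channel interpolations; see https://www.youtube.com/watch?v=otLGDpBglEA&feature=player_embedded).
- Use scikit-image's warp() function to implement the swirl transform. Note that the swirl transform can also be expressed with the following equations:
- Perform the wave transform given below (hint: use scikit-image's warp()):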
- Use PIL to load an RGB .png file with a color palette and convert it to a grayscale image. This question is taken from this post: https://stackoverflow.com/questions/51676447/python-use-pil-to-load-png-file-gives-strange-results/51678271#51678271. Convert the following RGB image (from the VOC2012 dataset) to grayscale by indexing the palette:
- Plot a 3D plot for each color channel of the parrot image used in this chapter (hint: use the plot_surface() function of the mpl_toolkits.mplot3d module and NumPy's meshgrid() function).
- Use scikit-image's transform module to estimate the homography matrix from the source to the destination image, and use it to embed the image in a blank canvas (as shown below):
(Figure: input image and output image.)
First, try to solve the problem yourself. The solution can be found here for your reference: https://sandipanweb.wordpress.com/2018/07/30/some-image-processing-problems/ .
Further reading
- Digital Image Processing, a book on image processing concepts by Rafael C. Gonzalez and Richard E. Woods
- Lecture notes/handouts from the Stanford University course (https://web.stanford.edu/class/ee368/handouts.html) and from the MIT course (https://ocw.mit.edu/resources/res-2-006-girls-who-build-cameras-summer-2016/)
- https://docs.scipy.org/doc/scipy-1.1.0/reference/ndimage.html
- https://web.cs.wpi.edu/~emmanuel/courses/cs545/S14/slides/instructor09.pdf
2. Sampling, Fourier transform and convolution
In this chapter, we will discuss two-dimensional signals in the time and frequency domains. We'll first discuss spatial sampling, an important concept used when resizing images, and the challenges in sampling. We will try to solve these problems using functions from Python libraries. We will also introduce intensity quantization in images; intensity quantization determines how many bits are used to store each pixel of the image, and what impact that has on image quality. You'll definitely want to know about the Discrete Fourier Transform (DFT), which can be used to convert an image from the spatial (time) domain to the frequency domain. You will learn to implement the DFT with the Fast Fourier Transform (FFT) algorithm using numpy and scipy functions, and will be able to apply this implementation to images!
You will also be interested in learning about 2D convolution and how to increase the speed of convolution. We will also look at the basic concepts of the convolution theorem. We will try to clarify the age-old confusion between correlation and convolution with an example. Additionally, we will describe an example from SciPy that shows how to use a template to find the location of a specific pattern in an image by applying cross-correlation.
We'll also introduce some filtering techniques and see how to implement them using Python libraries. You will be interested to see the results we get when we use these filters to denoise an image.
The topics we will cover in this chapter are as follows:
- Image formation – sampling and quantization
- Discrete Fourier Transform
- Understanding convolution
Image formation – sampling and quantization
In this section, we will describe two important concepts of image formation, namely sampling and quantization, and see how to resize images with sampling and quantize colors using the PIL and scikit-image libraries. We're going to use a hands-on approach here, and we'll define these concepts while seeing them in action. Ready?
Let's start by importing all required packages:
%matplotlib inline
# for inline image display inside notebook
from PIL import Image
from skimage.io import imread, imshow, show
import scipy.fftpack as fp
from scipy import ndimage, misc, signal
from scipy.stats import signaltonoise
from skimage import data, img_as_float
from skimage.color import rgb2gray
from skimage.transform import ...
Sampling
Sampling refers to selecting/rejecting image pixels, which means it is a spatial operation. We can use sampling to increase or decrease the size of an image, using upsampling and downsampling respectively. In the next few sections we will discuss different sampling techniques with examples.
Upsampling
As discussed briefly in Chapter 1, in order to increase the size of an image, we need to upsample it. The challenge is that the new, larger image will have some pixels with no corresponding pixel in the original, smaller image, and we need to guess these unknown pixel values. We can guess the value of an unknown pixel using one of the following:
- An aggregate, e.g., the mean of the values of its nearest one or more known pixel neighbors
- An interpolated value computed from pixel neighbors, e.g., with bilinear or cubic interpolation
Nearest neighbor based upsampling may result in poor output image quality. Let's write code to verify this:
im = Image.open("../images/clock.jpg") # the original small ...
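The blocky look of nearest-neighbor upsampling is easy to see on a tiny array: every known pixel is simply copied into a factor x factor block. The sketch below is a NumPy illustration of that idea (upsample_nearest is a hypothetical helper, not the PIL call used above).

```python
import numpy as np

def upsample_nearest(image, factor):
    """Enlarge a 2D array by repeating every pixel factor times per axis."""
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)

small = np.array([[1, 2],
                  [3, 4]])
big = upsample_nearest(small, 3)   # 6 x 6 array made of four constant 3 x 3 blocks
```

No new intensity values are created, so edges stay sharp but staircase-shaped, which is the quality problem interpolation tries to reduce.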
Upsampling and interpolation
In order to improve the image quality of the upsampled output, some interpolation methods can be used, such as bilinear interpolation or bicubic interpolation. Let's see how.
Bilinear interpolation
Let's consider a grayscale image, which is basically a two-dimensional matrix of pixel values at integer grid positions. To interpolate the pixel value at any point P on the grid, we can use the 2D analog of linear interpolation: bilinear interpolation. In this case, for each possible point P (that we want to interpolate), the intensity values of its four neighboring points (Q11, Q12, Q22, and Q21) are combined to compute the interpolated intensity at P, as shown in the figure below:
Let's use the PIL resize() function for bilinear interpolation:
im1 = im.resize((im.width*5, im.height*5), Image.BILINEAR) # up-sample with bi-linear interpolation
pylab.figure(figsize=(10,10)), pylab.imshow(im1), pylab.show()
This is the resized image. Note how the quality improves when bilinear interpolation is used with upsampling:
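The combination of the four neighbors can be written out explicitly: interpolate linearly along x on the two neighboring rows, then linearly along y between those two results. The following is a minimal sketch of that arithmetic (bilinear_at is a hypothetical helper; PIL's resize() handles sampling positions and borders in its own way).

```python
import numpy as np

def bilinear_at(image, x, y):
    """Interpolate a 2D array at real-valued column x and row y."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, image.shape[1] - 1)   # clamp at the right border
    y1 = min(y0 + 1, image.shape[0] - 1)   # clamp at the bottom border
    dx, dy = x - x0, y - y0
    top = (1 - dx) * image[y0, x0] + dx * image[y0, x1]
    bottom = (1 - dx) * image[y1, x0] + dx * image[y1, x1]
    return (1 - dy) * top + dy * bottom

grid = np.array([[10.0, 20.0],
                 [30.0, 40.0]])
center = bilinear_at(grid, 0.5, 0.5)       # the mean of the four corners
```

At integer coordinates the function returns the original pixel value, so the known pixels are preserved.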
Bicubic interpolation
Bicubic interpolation is an extension of cubic interpolation used to interpolate data points on a two-dimensional regular grid. The interpolated surface is smoother than the corresponding surfaces obtained by bilinear or nearest-neighbor interpolation.
Bicubic interpolation can be done using Lagrange polynomials, cubic splines, or the cubic convolution algorithm. PIL uses cubic spline interpolation in a 4x4 neighborhood.
Let's use the PIL resize() function for bicubic interpolation:
im1 = im.resize((im.width*10, im.height*10), Image.BICUBIC) # bi-cubic interpolation; keep the result so the next line plots it
pylab.figure(figsize=(10,10)), pylab.imshow(im1), pylab.show()
See how the quality of the resized image improves when we use bicubic interpolation:
Downsampling
In order to reduce the size of the image, we need to downsample the image. For every pixel in the new smaller image, there will be multiple pixels in the original larger image. We can calculate the value of a pixel in the new image by doing the following:
- Remove some pixels from the larger image in a systematic way (e.g., if we want the image to be a quarter of the size of the original, remove every other row and every other column)
- Calculate the new pixel value as the aggregate value of the corresponding multiple pixels in the original image
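Both options can be sketched on a small array: dropping rows and columns versus replacing each block of pixels by its mean. The helper names below are hypothetical, and the block-mean version assumes that trailing rows/columns that do not fill a whole block may be cropped.

```python
import numpy as np

def downsample_decimate(image, factor):
    """Keep every factor-th row and column, discarding the rest."""
    return image[::factor, ::factor]

def downsample_block_mean(image, factor):
    """Replace every factor x factor block by its mean value."""
    h = image.shape[0] - image.shape[0] % factor   # crop to a multiple of factor
    w = image.shape[1] - image.shape[1] % factor
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

im = np.arange(16, dtype=float).reshape(4, 4)
decimated = downsample_decimate(im, 2)     # uses 1 of every 4 pixels
averaged = downsample_block_mean(im, 2)    # uses all 4 pixels of each block
```

Decimation ignores three quarters of the data here, which is the root of the aliasing discussed next; averaging uses every input pixel.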
Let's take the tajmahal.jpg image and use the resize() function, again from the PIL library, to resize it to an output image that is 25 times smaller than the input image:
im = Image.open("../images/tajmahal.jpg") ...
Downsampling and anti-aliasing
As we can see, plain downsampling is not great for shrinking images because it creates an aliasing effect. For example, if we resize the original image by reducing the width and height by a factor of 5 (downsampling), we get a patchy, poor-quality output.
Anti-aliasing
The problem here is that a single pixel in the output image corresponds to 25 pixels in the input image, but we sample the value of only one of them. We should instead average over a small area of the input image. This can be achieved using ANTIALIAS (a high-quality downsampling filter; recent Pillow releases expose the same filter as Image.LANCZOS); this is how you can do it:
im = im.resize((im.width//5, im.height//5), Image.ANTIALIAS)
pylab.figure(figsize=(15,10)), pylab.imshow(im), pylab.show()
The image created by PIL with anti-aliasing is the same size as the previous one but of much better quality (hardly any artifacts/aliasing effects):
Anti-aliasing is usually done by smoothing the image before downsampling (by convolving the image with a low-pass filter such as a Gaussian filter).
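A worst-case pattern shows why the smoothing step matters. In the sketch below (synthetic one-pixel-wide stripes, and sigma=1.0 chosen arbitrarily for illustration), keeping every other column without smoothing sees only the black stripes, while smoothing first with scipy.ndimage.gaussian_filter() yields the expected mid-gray.

```python
import numpy as np
from scipy import ndimage

# Hypothetical test image: alternating black (0) and white (1) columns
stripes = np.tile([0.0, 1.0], (8, 32))           # shape (8, 64)

naive = stripes[::2, ::2]                         # picks only the black columns
smoothed = ndimage.gaussian_filter(stripes, sigma=1.0)
antialiased = smoothed[::2, ::2]                  # low-pass first, then decimate
```

The naive result is uniformly black even though the image is 50% white on average; the pre-filtered result stays close to 0.5 away from the borders.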
Now let us use the anti-aliasing functionality of the rescale() function from the scikit-image transform module to overcome the aliasing problem for another image, umbc.png:
im = imread('../images/umbc.png')
im1 = im.copy()
pylab.figure(figsize=(20,15))
for i in range(4):
    pylab.subplot(2,2,i+1), pylab.imshow(im1, cmap='gray'), pylab.axis('off')
    pylab.title('image size = ' + str(im1.shape[1]) + 'x' + str(im1.shape[0]))
    im1 = rescale(im1, scale = 0.5, multichannel=True, anti_aliasing=False)
pylab.subplots_adjust(wspace=0.1, hspace=0.1)
pylab.show()
The next screenshot shows the output of the previous code. As shown, the image is downsampled to create smaller and smaller outputs. When anti-aliasing technology is not used, the aliasing effect becomes more prominent:
Let's change the line of code to use anti-aliasing:
im1 = rescale(im1, scale = 0.5, multichannel=True, anti_aliasing=True)
This produces better quality images:
To learn more about interpolation and anti-aliasing, please visit my blog: https://sandipanweb.wordpress.com/2018/01/21/recursive-graphics-bilinear-interpolation-and-image-transformation-in-Python/.
Quantization
Quantization is related to the intensity of the image and can be defined by the number of bits used per pixel. Digital images are usually quantized to 256 gray levels. Here we will see that as the number of bits used for pixel storage decreases, the quantization error increases, leading to artificial borders or contours and pixelation, and resulting in poor image quality.
PIL quantization
Let's use the PIL Image module's convert() function for color quantization, with the P mode and the colors parameter as the maximum possible number of colors. We will also use the signaltonoise() function from the SciPy stats module to find the signal-to-noise ratio (SNR) of an image (parrot.jpg), which is defined as the mean of the image array divided by its standard deviation:
im = Image.open('../images/parrot.jpg')
pylab.figure(figsize=(20,30))
num_colors_list = [1 << n for n in range(8,0,-1)]
snr_list = []
i = 1
for num_colors in num_colors_list:
    im1 = im.convert('P', palette=Image.ADAPTIVE, colors=num_colors)
    pylab.subplot(4,2,i), pylab.imshow(im1), pylab.axis('off')
    snr_list.append(signaltonoise(im1, axis=None))
    pylab.title('Image with # colors = ' + str(num_colors) + ' SNR = ' +
                str(np.round(snr_list[i-1],3)), size=20)
    i += 1
pylab.subplots_adjust(wspace=0.2, hspace=0)
pylab.show()
This shows how image quality degrades with color quantization when the number of bits to store pixels decreases:
Now we will plot the effect of color quantization on image signal-to-noise ratio. Signal-to-noise ratio is usually a measure of image quality. The higher the signal-to-noise ratio, the better the quality:
pylab.plot(num_colors_list, snr_list, 'r.-')
pylab.xlabel('# colors in the image')
pylab.ylabel('SNR')
pylab.title('Change in SNR w.r.t. # colors')
pylab.xscale('log', basex=2)
pylab.gca().invert_xaxis()
pylab.show()
It can be seen that although color quantization reduces the image size (because the number of bits/pixels is reduced), it also makes the image quality worse, as measured by SNR:
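Since signaltonoise() is not available in recent SciPy releases, the same mean-over-standard-deviation ratio can be sketched in NumPy. The example also includes a simple uniform gray-level quantizer to show the loss of levels; note that this is not the adaptive palette method used by PIL above, and both helper names are hypothetical.

```python
import numpy as np

def snr_mean_over_std(image):
    """Mean of the pixel values divided by their standard deviation."""
    a = np.asarray(image, dtype=float)
    sd = a.std()
    return 0.0 if sd == 0 else a.mean() / sd

def quantize_gray(image, bits):
    """Keep only the top `bits` bits of an 8-bit grayscale image."""
    shift = 8 - bits
    return (np.asarray(image, dtype=np.uint8) >> shift) << shift

ramp = np.tile(np.arange(256, dtype=np.uint8), (4, 1))  # all 256 gray levels
coarse = quantize_gray(ramp, 2)                          # 2 bits per pixel
levels = np.unique(coarse)                               # the surviving levels
snr = snr_mean_over_std(np.array([1, 2, 3, 4]))
```

With 2 bits per pixel only four gray levels remain, which is what produces the false contours described earlier.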
Discrete Fourier Transform
The Fourier transform has a long mathematical history and we are not going to discuss it here (it can be found in any digital signal processing or digital image processing theory book). As far as image processing is concerned, we will focus only on the 2D Discrete Fourier Transform (DFT). The basic idea behind the Fourier transform is that an image can be thought of as a 2D function, f, that can be expressed as a weighted sum of sines and cosines (the Fourier basis functions) along two dimensions.
We can use DFT to convert from a set of grayscale pixel values in an image (space/time domain) to a set of Fourier coefficients (frequency domain), and it is discrete because of the spatial and temporal transformations. . .
Why do we need DFT?
First, transforming to the frequency domain allows for a better understanding of the image. As we will see in the next few sections, low frequencies in the frequency domain correspond to the average overall level of information in the image, while high frequencies correspond to edges, noise, and more detailed information.
Typically, images are smooth in nature, which is why most images can be represented using a small number of DFT coefficients, while all remaining higher coefficients are almost negligible/zero.
This is very useful in image compression, especially for Fourier-sparse images, where only a few Fourier coefficients are needed to reconstruct the image, so only those frequencies need to be stored and the others can be discarded, allowing high compression (e.g., the JPEG image compression algorithm uses a similar transform, the discrete cosine transform (DCT)). Furthermore, as we will see later in this chapter, filtering with a DFT in the frequency domain can be much faster than filtering in the spatial domain.
Fast Fourier Transform algorithm for calculating DFT
The Fast Fourier Transform (FFT) is a divide-and-conquer algorithm that computes the DFT recursively and is much faster (O(N log₂ N) time complexity) than the naive O(N²) computation for an n x n image (with N = n² pixels). In Python, both the numpy and scipy libraries provide functions for calculating the 2D DFT/IDFT using the FFT algorithm. Let's look at a few examples.
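To check that the FFT really computes the DFT, the definition can be evaluated directly with DFT matrices and compared with numpy.fft.fft2(). The sketch below assumes the usual unnormalized forward DFT convention (which is the one NumPy uses); dft2_naive is a hypothetical helper and is only practical for small arrays.

```python
import numpy as np

def dft2_naive(image):
    """2D DFT through explicit DFT matrices (no FFT speed-up)."""
    rows, cols = image.shape
    u = np.arange(rows)
    v = np.arange(cols)
    # w_rows[u, x] = exp(-2*pi*i*u*x/rows), and similarly for the columns
    w_rows = np.exp(-2j * np.pi * np.outer(u, u) / rows)
    w_cols = np.exp(-2j * np.pi * np.outer(v, v) / cols)
    return w_rows @ image @ w_cols

rng = np.random.default_rng(0)
im = rng.random((8, 6))          # hypothetical small test image
naive = dft2_naive(im)
fast = np.fft.fft2(im)           # same coefficients, computed by the FFT
```

Both arrays agree up to floating-point round-off; only the cost of computing them differs.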
FFT with the scipy.fftpack module
We will use the fft2()/ifft2() functions of the scipy.fftpack module to calculate the DFT/IDFT of a grayscale image, rhino.jpg, using the FFT algorithm:
im = np.array(Image.open('../images/rhino.jpg').convert('L')) # we shall work with grayscale image
snr = signaltonoise(im, axis=None)
print('SNR for the original image = ' + str(snr))
# SNR for the original image = 2.023722773801701
# now call FFT and IFFT
freq = fp.fft2(im)
im1 = fp.ifft2(freq).real
snr = signaltonoise(im1, axis=None)
print('SNR for the image obtained after reconstruction = ' + str(snr))
# SNR for the image obtained after reconstruction = 2.0237227738013224
assert(np.allclose(im, im1)) # make sure the forward and inverse FFT are close to each other
pylab.figure(figsize=(20,10))
pylab.subplot(121), pylab.imshow(im, cmap='gray'), pylab.axis('off')
pylab.title('Original Image', size=20)
pylab.subplot(122), pylab.imshow(im1, cmap='gray'), pylab.axis('off')
pylab.title('Image obtained after reconstruction', size=20)
pylab.show()
Here is the output:
It can be seen from the SNR values in the inline output and from the visual comparison of the input and reconstructed images that the reconstructed image loses very little information: the difference is negligible when all the coefficients obtained are used for reconstruction.
Plot a spectrogram
Since the Fourier coefficients are complex numbers, we can view their magnitudes directly. Displaying the magnitudes of the Fourier transform gives the spectrum of the transform. The value F[0,0] of the DFT is called the DC coefficient.
The DC coefficient is too large for the other coefficient values to be seen, which is why we need to stretch the transform values by displaying the logarithm of the transform. Furthermore, for ease of display, the transform coefficients are shifted (using fftshift()) so that the DC component is at the center. Excited about creating the Fourier spectrum of the rhino image? The code is as follows:
# the quadrants are needed to be shifted around in order that the low spatial frequencies are in the center of the 2D fourier-transformed ...
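The two display tricks (log stretching and fftshift()) can be tried on a synthetic image without any image file. The sketch below uses numpy.fft and a made-up cosine pattern; the 20*log10(0.01 + |F|) scaling follows the expression used later in this chapter.

```python
import numpy as np

# Hypothetical test image: constant brightness plus a horizontal cosine wave
yy, xx = np.mgrid[0:64, 0:64]
im = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * xx / 64)

freq = np.fft.fft2(im)
shifted = np.fft.fftshift(freq)                       # move DC to the center
log_spectrum = 20 * np.log10(0.01 + np.abs(shifted))  # compress the dynamic range
peak = np.unravel_index(np.argmax(log_spectrum), log_spectrum.shape)
```

After the shift, the largest value (the DC coefficient) sits at the center of the array, and the cosine shows up as two smaller peaks on either side of it along the horizontal axis.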
FFT with the numpy.fft module
The DFT of an image can be computed using a similar set of functions from the numpy.fft module. We'll see some examples.
Calculate the amplitude and phase of the DFT
We will use the house.png image as input to fft2() to get the real and imaginary parts of the Fourier coefficients; after that, we will compute the magnitude/spectrum and phase, and finally reconstruct the image using ifft2():
import numpy.fft as fp
im1 = rgb2gray(imread('../images/house.png'))
pylab.figure(figsize=(12,10))
freq1 = fp.fft2(im1)
im1_ = fp.ifft2(freq1).real
pylab.subplot(2,2,1), pylab.imshow(im1, cmap='gray'), pylab.title('Original Image', size=20)
pylab.subplot(2,2,2), pylab.imshow(20*np.log10( 0.01 + np.abs(fp.fftshift(freq1))), cmap='gray')
pylab.title('FFT Spectrum Magnitude', size=20)
pylab.subplot(2,2,3), pylab.imshow(np.angle(fp.fftshift(freq1)), cmap='gray')
pylab.title('FFT ...
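Magnitude and phase together carry the same information as the complex coefficients, which can be verified on a small random array (used here as a stand-in for an image; a sketch, not part of the original listing):

```python
import numpy as np

rng = np.random.default_rng(0)
im = rng.random((16, 16))                 # hypothetical test image

freq = np.fft.fft2(im)
magnitude = np.abs(freq)                  # the spectrum
phase = np.angle(freq)                    # the phase, in radians

# Recombine the polar form into complex coefficients and invert the transform
rebuilt = np.fft.ifft2(magnitude * np.exp(1j * phase)).real
```

The rebuilt array matches the input up to floating-point round-off, so no information is lost by splitting the coefficients into magnitude and phase.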
Understanding convolution
Convolution is an operation on two images: an input image and a mask (also called a kernel) that acts as a filter on the input image to produce an output image.
Convolutional filtering is used to modify the spatial frequency characteristics of the image. It works by calculating the new value of a pixel in the output image by adding the weighted values of all adjacent pixels to determine the value of the center pixel. The pixel values in the output image are calculated by traversing the kernel window in the input image, as shown in the next screenshot (for convolution in valid mode; we will see convolution mode later in this chapter):
As you can see, the kernel window (marked by the arrow in the input image) traverses the image and after convolution gets the values mapped onto the output image.
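The sliding-window computation can be written out as two loops. The following is a deliberately slow, illustrative sketch of valid-mode convolution in NumPy (convolve2d_valid is a hypothetical helper; library routines are far more efficient). Note the kernel flip, which is what distinguishes convolution from correlation.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Valid-mode 2D convolution: flip the kernel, then slide and sum."""
    flipped = kernel[::-1, ::-1]
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # weighted sum of the window currently under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

im = np.arange(9, dtype=float).reshape(3, 3)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])
result = convolve2d_valid(im, kernel)   # 2 x 2 output; each entry is a diagonal difference
```

For this ramp image, every output value equals the difference between a pixel and its upper-left diagonal neighbor, which is 4 everywhere.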
Why convolve images?
Convolution applies a general filtering effect to the input image. This is done to achieve various effects on the image using appropriate kernels, such as smoothing, sharpening, and embossing, as well as in operations such as edge detection.
Convolution with SciPy signal's convolve2d
The convolve2d() function of the SciPy signal module can be used for convolution. We will use this function to convolve an image with a kernel.
Apply convolution to grayscale image
Let's first detect edges in the grayscale cameraman.jpg image using convolution with the Laplace kernel, and blur the image using the box kernel:
im = rgb2gray(imread('../images/cameraman.jpg')).astype(float)
print(np.max(im))
# 1.0
print(im.shape)
# (225, 225)
blur_box_kernel = np.ones((3,3)) / 9
edge_laplace_kernel = np.array([[0,1,0],[1,-4,1],[0,1,0]])
im_blurred = signal.convolve2d(im, blur_box_kernel)
im_edges = np.clip(signal.convolve2d(im, edge_laplace_kernel), 0, 1)
fig, axes = pylab.subplots(ncols=3, sharex=True, sharey=True, figsize=(18, 6))
axes[0].imshow(im, cmap=pylab.cm.gray)
axes[0].set_title('Original Image', size=20)
axes[1].imshow(im_blurred, cmap=pylab.cm.gray)
axes[1].set_title('Box Blur', ...
Convolution mode, pad values and boundary conditions
Depending on what you want to do with the edge pixels, there are three arguments, mode, boundary, and fillvalue, that can be passed to the SciPy convolve2d() function. Here we will briefly discuss the mode argument:
- mode='full': The default mode, in which the output is the full discrete linear convolution of the input.
- mode='valid': Ignores edge pixels and only computes values for pixels that have all their neighbors (pixels that do not need zero-padding). The output image size is smaller than the input image size for all kernels (except 1 x 1).
- mode='same': The output image is the same size as the input image; it is centered with respect to the 'full' output.
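The effect of the three modes on the output size can be checked on a small array. The sketch below uses a 5 x 5 test array and a 3 x 3 box kernel (both chosen only for illustration) with scipy.signal.convolve2d():

```python
import numpy as np
from scipy import signal

im = np.arange(25, dtype=float).reshape(5, 5)    # hypothetical 5 x 5 image
kernel = np.ones((3, 3)) / 9                     # 3 x 3 box kernel

full = signal.convolve2d(im, kernel, mode='full')     # (5+3-1) x (5+3-1)
valid = signal.convolve2d(im, kernel, mode='valid')   # (5-3+1) x (5-3+1)
same = signal.convolve2d(im, kernel, mode='same')     # same size as the input

# boundary='symm' mirrors the image at the border instead of padding with zeros
same_symm = signal.convolve2d(im, kernel, mode='same', boundary='symm')
```

With the default zero padding, the 'same' output is the central part of the 'full' output, and the 'valid' output is in turn the central part of the 'same' output; the boundary argument only changes values near the border.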
Apply convolution to color (RGB) images
Using scipy.signal.convolve2d(), we can also sharpen RGB images. We have to apply the convolution to each image channel separately.
Let's use the tajmahal.jpg image with the emboss kernel and the complex-valued Scharr edge detection kernel:
im = misc.imread('../images/tajmahal.jpg')/255 # scale each pixel value in [0,1]
print(np.max(im))
print(im.shape)
emboss_kernel = np.array([[-2,-1,0],[-1,1,1],[0,1,2]])
edge_schar_kernel = np.array([[ -3-3j, 0-10j, +3 -3j], [-10+0j, 0+ 0j, +10+0j], [ -3+3j, 0+10j, +3 +3j]])
im_embossed = np.ones(im.shape)
im_edges = np.ones(im.shape)
for i in range(3):
    im_embossed[...,i] = np.clip(signal.convolve2d(im[...,i], emboss_kernel, mode='same', boundary="symm"),0,1)
for i in range(3): ...
Convolution with SciPy ndimage.convolve
Using scipy.ndimage.convolve(), we can directly sharpen the RGB image (we don't have to apply convolution to each image channel separately).
Use the victoria_memorial.png image with the sharpen kernel and the emboss kernel:
im = misc.imread('../images/victoria_memorial.png').astype(float) # read as float (the np.float alias was removed from newer NumPy)
print(np.max(im))
sharpen_kernel = np.array([0, -1, 0, -1, 5, -1, 0, -1, 0]).reshape((3, 3, 1))
emboss_kernel = np.array(np.array([[-2,-1,0],[-1,1,1],[0,1,2]])).reshape((3, 3, 1))
im_sharp = ndimage.convolve(im, sharpen_kernel, mode='nearest')
im_sharp = np.clip(im_sharp, 0, 255).astype(np.uint8) # clip (0 to 255) and convert to unsigned int
im_emboss = ndimage.convolve(im, emboss_kernel, mode='nearest')
im_emboss = np.clip(im_emboss, 0, 255).astype(np.uint8)
pylab.figure(figsize=(10,15))
pylab.subplot(311), pylab.imshow(im.astype(np.uint8)), pylab.axis('off')
pylab.title('Original Image', size=25)
pylab.subplot(312), pylab.imshow(im_sharp), pylab.axis('off')
pylab.title('Sharpened Image', size=25)
pylab.subplot(313), pylab.imshow(im_emboss), pylab.axis('off')
pylab.title('Embossed Image', size=25)
pylab.tight_layout()
pylab.show()
You will get these convolved images:
The sharpened image looks like this:
The embossed image looks like this:
Correlation and convolution
Correlation is very similar to the convolution operation in that it also takes an input image and another kernel and traverses the kernel window through the input by computing a weighted combination of pixel neighborhood values and kernel values, and produces an output image.
The only difference is that, unlike correlation, convolution flips the kernel twice (with respect to the horizontal and vertical axes) before calculating the weighted combination.
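The flip relationship can be verified numerically with scipy.signal. In the sketch below (random test image and a deliberately asymmetric kernel, both made up for illustration), correlating with a kernel gives the same result as convolving with that kernel flipped along both axes:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
im = rng.random((6, 6))                          # hypothetical test image
kernel = np.array([[1.0, 2.0, 0.0],
                   [0.0, 1.0, -1.0],
                   [0.0, 0.0, -2.0]])            # deliberately asymmetric

corr = signal.correlate2d(im, kernel, mode='valid')
conv = signal.convolve2d(im, kernel, mode='valid')
conv_flipped = signal.convolve2d(im, kernel[::-1, ::-1], mode='valid')
```

For a symmetric kernel (such as a box or Gaussian kernel) the flip changes nothing, which is why correlation and convolution are often used interchangeably for such filters.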
The next diagram mathematically describes the difference between correlation and convolution on an image:
SciPy signal module's correlate2d() ...
Template matching based on cross-correlation between images and templates
In this example, we will use cross-correlation with the eye template image (using the template as the kernel for the cross-correlation) to find the location of the eye in the raccoon-face image, as follows:
face_image = misc.face(gray=True) - misc.face(gray=True).mean()
template_image = np.copy(face_image[300:365, 670:750]) # right eye
template_image -= template_image.mean()
face_image = face_image + np.random.randn(*face_image.shape) * 50 # add random noise
correlation = signal.correlate2d(face_image, template_image, boundary='symm', mode='same')
y, x = np.unravel_index(np.argmax(correlation), correlation.shape) # find the match
fig, (ax_original, ax_template, ax_correlation) = pylab.subplots(3, 1, figsize=(6, 15))
ax_original.imshow(face_image, cmap='gray')
ax_original.set_title('Original', size=20)
ax_original.set_axis_off()
ax_template.imshow(template_image, cmap='gray')
ax_template.set_title('Template', size=20)
ax_template.set_axis_off()
ax_correlation.imshow(correlation, cmap='afmhot')
ax_correlation.set_title('Cross-correlation', size=20)
ax_correlation.set_axis_off()
ax_original.plot(x, y, 'ro')
fig.show()
The red dot marks the location with the largest cross-correlation value (the best match with the template):
Here is the template:
Applying cross-correlation results in the following output:
As can be seen from the previous image, one of the raccoon's eyes in the input image has the highest correlation with the eye template image.
Summary
We discussed some important concepts mainly related to the 2D DFT and its applications in image processing, such as frequency domain filtering, with extensive examples using the scikit-image, numpy.fft, scipy.fftpack, signal, and ndimage modules.
Hopefully you are now clear about sampling and quantization, two important image formation techniques. We have seen Python implementations of the 2D DFT with FFT algorithms, as well as applications such as image denoising and restoration, correlation and convolution in image processing, the use of convolution in filter design, and the use of correlation in template matching.
You should now be able to write Python code to execute. . .
Questions
- Use a Gaussian LPF to implement downsampling with anti-aliasing (hint: reduce the house grayscale image four times by first applying a Gaussian filter and then removing every other row and column. Compare the output images with and without LPF pre-processing before downsampling).
- Upsample an image using the FFT: first double the size of the lena grayscale image by padding zero rows/columns at every alternate position, then use the FFT, followed by an LPF, and then the IFFT to obtain the output image. Why does it work?
- Try applying the Fourier transform and image reconstruction to a color (RGB) image (hint: apply the FFT to each channel separately).
- Show, mathematically and with a 2D kernel example, that the Fourier transform of a Gaussian kernel is another Gaussian kernel.
- Generate images by correlation and convolution using the lena image and an asymmetric ripple kernel. Show that the output images are different. Now, flip the kernel twice (upside-down and left-right) and apply correlation with the flipped kernel. Is the output image the same as the one obtained by convolution with the original kernel?
Further reading
Here are references from various sources:
- Class lecture notes from http://fy.chalmers.se/~romeo/RRY025/notes/E1.pdf and http://web.pdx.edu/~jduh/courses/Archive/geog481w07/Students/Ludwig_ImageConvolution.pdf
- These slides (https://web.cs.wpi.edu/~emmanuel/courses/cs545/S14/slides/lecturer10.pdf) by Prof. Emmanuel Agu
- Oxford University lectures: http://www.robots.ox.ac.uk/~az/touchts/ia/lect2.pdf
3. Convolution and frequency domain filtering
In this chapter, we continue our discussion of 2D convolution and see how convolution can be done faster in the frequency domain (using the basic concepts of the convolution theorem). We will look at the basic difference between correlation and convolution with an example on an image. We will also describe an example from SciPy that shows how to use cross-correlation to find the location of a specific pattern in an image, given a template image. Finally, we will describe several filtering techniques in the frequency domain (which can also be implemented with kernel convolutions, such as box kernels or Gaussian kernels), such as high-pass, low-pass, band-pass, and band-stop filters, and show with examples how to implement them with Python libraries. We will also give examples of how some filters can be used for image denoising and restoration (e.g., band-reject or notch filters to remove periodic noise from images, or inverse or Wiener filters to deblur images blurred with a Gaussian or motion blur kernel).
The topics covered in this chapter are as follows:
- Convolution theorem and frequency domain Gaussian blur
- Frequency domain filtering (using the SciPy ndimage module and scikit-image)
Convolution theorem and frequency domain Gaussian blur
In this section, we will see more applications of convolution on images using Python modules such as scipy signal and ndimage.
Let's start with the convolution theorem and see how the convolution operation becomes easier in the frequency domain.
Application of the convolution theorem
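The convolution theorem states that convolution in the image domain is equivalent to simple multiplication in the frequency domain:
$$f(x,y) * h(x,y) \Longleftrightarrow F(u,v)\,H(u,v)$$
Here, F and H are the Fourier transforms of the image f and the kernel h, respectively.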
The following figure shows the application of Fourier transform:
The figure below shows the basic steps of frequency domain filtering. We take as input the original image, F, and a kernel (a mask or a degradation/enhancement function). First, both inputs need to be converted to the frequency domain using the DFT, and then the convolution is applied, which by the convolution theorem is just an (element-wise) multiplication. This outputs the convolved image in the frequency domain, on which we need to apply the IDFT to obtain the reconstructed image (with some degradation or enhancement of the original image):
Now let's see a demonstration of the theorem on some images and some Python library functions. We need to import all required libraries like in the previous chapter.
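As a quick sanity check of the theorem, the following sketch compares a direct convolution with the FFT-based product on small random arrays. One assumption is made explicit here: multiplying DFTs corresponds to circular (wrap-around) convolution, so the reference implementation wraps around the borders as well. The helper circular_convolve2d() is a hypothetical name introduced only for this illustration.

```python
import numpy as np

def circular_convolve2d(f, h):
    """Direct (slow) 2D circular convolution, used only as a reference."""
    rows, cols = f.shape
    out = np.zeros((rows, cols), dtype=float)
    for dy in range(rows):
        for dx in range(cols):
            # np.roll shifts f cyclically; weight the shifted copy by h[dy, dx]
            out += h[dy, dx] * np.roll(f, shift=(dy, dx), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
f = rng.random((8, 8))   # stand-in for a tiny image
h = rng.random((8, 8))   # stand-in for a kernel of the same size

spatial = circular_convolve2d(f, h)
# Convolution theorem: multiply the two spectra element-wise, then invert
frequency = np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h)).real
print(np.allclose(spatial, frequency))
```

The two results agree up to floating-point error, which is why the examples in this chapter can replace a sliding-window convolution with a product of spectra.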
Frequency domain Gaussian blur filter based on numpy-fft
The following code block shows how to use the convolution theorem and numpy fft to apply a Gaussian filter in the frequency domain (since in the frequency domain it is just a multiplication):
pylab.figure(figsize=(20,15))
pylab.gray() # show the filtered result in grayscale
im = np.mean(imread('../images/lena.jpg'), axis=2)
gauss_kernel = np.outer(signal.gaussian(im.shape[0], 5), signal.gaussian(im.shape[1], 5))
freq = fp.fft2(im)
assert(freq.shape == gauss_kernel.shape)
freq_kernel = fp.fft2(fp.ifftshift(gauss_kernel))
convolved = freq*freq_kernel # by the convolution theorem, simply multiply in the frequency domain
im1 = fp.ifft2(convolved).real
pylab.subplot(2,3,1), pylab.imshow(im), pylab.title('Original ...
Gaussian kernel in frequency domain
In this section we will see what a Gaussian kernel looks like in the frequency domain in 2D and 3D plots.
Gaussian LPF kernel spectrum in two dimensions
The next code block shows how to plot the spectrum of a 2D Gaussian kernel using a log transform:
im = rgb2gray(imread('../images/lena.jpg'))
gauss_kernel = np.outer(signal.gaussian(im.shape[0], 1), signal.gaussian(im.shape[1], 1))
freq = fp.fft2(im)
freq_kernel = fp.fft2(fp.ifftshift(gauss_kernel))
pylab.imshow( (20*np.log10( 0.01 + fp.fftshift(freq_kernel))).real.astype(int), cmap='coolwarm') # 0.01 is added to keep the argument to log function always positive
pylab.colorbar()
pylab.show()
The screenshot below shows the output of the previous code, with a color bar. Since the Gaussian kernel is a low-pass filter, its spectrum has higher values for the center frequency (it allows more low-frequency values), and gradually decreases as you move away from the center to higher frequency values:
The next screenshot shows the spectrum of a three-dimensional Gaussian kernel along the response axis, with and without logarithmic scale. It can be seen that the DFT of the Gaussian kernel is another Gaussian kernel. The Python code for three-dimensional plotting is left as an exercise to the reader (question 3, with hints).
Gaussian LPF kernel spectrum in 3D
The horizontal plane represents the frequency plane and the vertical axis represents the response of the Gaussian kernel in the frequency domain, shown without and with a logarithmic scale:
Frequency domain Gaussian blur filter with scipy signal
The following code block shows how to run a convolution in the frequency domain using the fftconvolve() function from the SciPy signal module (internally, it is just a multiplication, by the convolution theorem):
im = np.mean(misc.imread('../images/mandrill.jpg'), axis=2)
print(im.shape)
# (224, 225)
gauss_kernel = np.outer(signal.gaussian(11, 3), signal.gaussian(11, 3)) # 2D Gaussian kernel of size 11x11 with σ = 3
im_blurred = signal.fftconvolve(im, gauss_kernel, mode='same')
fig, (ax_original, ax_kernel, ax_blurred) = pylab.subplots(1, 3, figsize=(20,8))
ax_original.imshow(im, cmap='gray')
ax_original.set_title('Original', size=20)
ax_original.set_axis_off()
ax_kernel.imshow(gauss_kernel) ...
Comparing runtimes of SciPy convolve() and fftconvolve() with Gaussian blur kernel
We can use the Python timeit module to compare the runtimes of the image domain and frequency domain convolution functions. Since frequency domain convolution replaces a series of sliding-window computations with a single element-wise multiplication of the transforms, it is expected to be much faster. The following code compares the runtimes:
im = np.mean(misc.imread('../images/mandrill.jpg'), axis=2)
print(im.shape)
# (224, 225)
gauss_kernel = np.outer(signal.gaussian(11, 3), signal.gaussian(11, 3)) # 2D Gaussian kernel of size 11x11 with σ = 3
im_blurred1 = signal.convolve(im, gauss_kernel, mode="same")
im_blurred2 = signal.fftconvolve(im, gauss_kernel, mode='same')
def wrapper_convolve(func):
    def wrapped_convolve():
        return func(im, gauss_kernel, mode="same")
    return wrapped_convolve
wrapped_convolve = wrapper_convolve(signal.convolve)
wrapped_fftconvolve = wrapper_convolve(signal.fftconvolve)
times1 = timeit.repeat(wrapped_convolve, number=1, repeat=100)
times2 = timeit.repeat(wrapped_fftconvolve, number=1, repeat=100)
The following code block displays the original Mandrill image and the blurred image using these two functions:
pylab.figure(figsize=(15,5))
pylab.gray()
pylab.subplot(131), pylab.imshow(im), pylab.title('Original Image',size=15), pylab.axis('off')
pylab.subplot(132), pylab.imshow(im_blurred1), pylab.title('convolve Output', size=15), pylab.axis('off')
pylab.subplot(133), pylab.imshow(im_blurred2), pylab.title('fftconvolve Output', size=15),pylab.axis('off')
The screenshot below shows the output of the previous code. As expected, the convolve() and fftconvolve() functions both produce the same blurred output image:
The code below visualizes the difference between runtimes. Each function has been run 100 times on the same input image with the same Gaussian kernel, and then a boxplot of the time taken by each function is plotted:
data = [times1, times2]
pylab.figure(figsize=(8,6))
box = pylab.boxplot(data, patch_artist=True) #notch=True,
colors = ['cyan', 'pink']
for patch, color in zip(box['boxes'], colors):
    patch.set_facecolor(color)
pylab.xticks(np.arange(3), ('', 'convolve', 'fftconvolve'), size=15)
pylab.yticks(fontsize=15)
pylab.xlabel('scipy.signal convolution methods', size=15)
pylab.ylabel('time taken to run', size = 15)
pylab.show()
The screenshot below shows the output of the previous code. As can be seen, fftconvolve() runs faster on average:
Frequency domain filtering (HPF, LPF, BPF and notch filters)
Recall that in the image processing pipeline described in Chapter 1, Getting Started with Image Processing, the next step after image acquisition is image preprocessing. Images are often corrupted by random variations in brightness and illumination, or have poor contrast, so they cannot be used directly and need to be enhanced. This is where filters are used.
What is a filter?
Filtering refers to transforming pixel intensity values to reveal certain image characteristics, such as:
- Enhancement: increases contrast
- Smoothing: removes noise
- Template matching: detects known patterns
The filtered image is described by a discrete convolution, and the filter is described by an n x n discrete convolution mask.
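To make the idea of an n x n convolution mask concrete, here is a minimal sketch of a smoothing filter written as an explicit discrete convolution. The function convolve2d_same() is a hypothetical helper written for this illustration (it assumes an odd-sized square mask and zero padding at the borders); in practice a library routine such as scipy.signal.convolve2d would normally be used instead.

```python
import numpy as np

def convolve2d_same(image, mask):
    """Naive 'same'-size 2D convolution with zero padding (odd-sized square mask)."""
    n = mask.shape[0]
    pad = n // 2
    padded = np.pad(image, pad, mode="constant")
    flipped = mask[::-1, ::-1]  # convolution flips the mask; correlation would not
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # weighted sum of the n x n neighborhood centered at (i, j)
            out[i, j] = np.sum(padded[i:i + n, j:j + n] * flipped)
    return out

image = np.zeros((5, 5))
image[2, 2] = 9.0                 # a single bright pixel
box = np.ones((3, 3)) / 9.0       # 3x3 averaging (smoothing) mask
smoothed = convolve2d_same(image, box)
print(smoothed)
```

The single bright pixel is spread evenly over its 3x3 neighborhood, which is the smoothing behavior described in the list above.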
High Pass Filter (HPF)
This filter only allows high frequencies from the frequency domain representation of the image (obtained via the DFT) and blocks all low frequencies below the cutoff value. The image is reconstructed using the inverse DFT, and since high-frequency components correspond to edges, details, noise, and so on, HPFs tend to extract or enhance them. The next few sections demonstrate how to implement an HPF using different functions from the numpy, scipy, and scikit-image libraries, and the impact of an HPF on an image.
We can implement HPF on images by following these steps:
- Perform a 2D FFT using scipy.fftpack fft2
- Only high frequency components are retained (removed...
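The steps above can be sketched as follows. This is only an illustration under stated assumptions: it uses numpy.fft rather than scipy.fftpack, a synthetic test image instead of a file, and a square block of low frequencies around the DC term as the region to remove (mirroring the LPF example later in this chapter). The function name high_pass_filter() and the cutoff parameter are hypothetical.

```python
import numpy as np

def high_pass_filter(image, cutoff):
    """Zero a (2*cutoff+1)-wide square of low frequencies around the DC term."""
    freq = np.fft.fftshift(np.fft.fft2(image))   # DC component moved to the center
    half_r, half_c = image.shape[0] // 2, image.shape[1] // 2
    freq[half_r - cutoff:half_r + cutoff + 1,
         half_c - cutoff:half_c + cutoff + 1] = 0  # block the low frequencies
    # undo the shift and reconstruct the image with the inverse FFT
    return np.fft.ifft2(np.fft.ifftshift(freq)).real

# Synthetic test image: flat background plus a bright square (so it has edges)
image = np.full((64, 64), 0.5)
image[24:40, 24:40] = 1.0
filtered = high_pass_filter(image, cutoff=2)
print(abs(filtered.mean()) < 1e-9)
```

Because the DC term is among the removed components, the filtered image has (numerically) zero mean; what remains is concentrated around the edges of the square.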
Signal-to-noise ratio as a function of frequency cutoff
The code block below shows how to plot the change in the signal-to-noise ratio (SNR) with the cutoff frequency (F) of the HPF:
pylab.plot(lbs, snrs_hp, 'b.-')
pylab.xlabel('Cutoff Frequency for HPF', size=20)
pylab.ylabel('SNR', size=20)
pylab.show()
The following screenshot shows how the SNR of the output image decreases as the HPF cutoff frequency increases:
Low pass filter (LPF)
This filter only allows low frequencies from the frequency domain representation of the image (obtained using the DFT) and blocks all high frequencies above the cutoff value. The image is reconstructed using the inverse DFT, and since high-frequency components correspond to edges, details, noise, and so on, an LPF tends to remove these. The next few sections demonstrate how to implement an LPF using different functions from the numpy, scipy, and scikit-image libraries, and the impact of an LPF on an image.
LPF with scipy ndimage and numpy fft
The fft2() function of the numpy fft module can also be used to run an FFT on an image. The scipy ndimage module provides a range of functions for applying an LPF to an image in the frequency domain. The next section demonstrates one of these filters (i.e., fourier_gaussian()).
Fourier-Gaussian filter
This function from the scipy ndimage module implements a multidimensional Gaussian Fourier filter. The frequency array is multiplied by the Fourier transform of a Gaussian kernel of a given size.
The next code block demonstrates how to blur the lena grayscale image using this LPF (a weighted average filter):
import numpy.fft as fp
fig, (axes1, axes2) = pylab.subplots(1, 2, figsize=(20,10))
pylab.gray() # show the result in grayscale
im = np.mean(imread('../images/lena.jpg'), axis=2)
freq = fp.fft2(im)
freq_gaussian = ndimage.fourier_gaussian(freq, sigma=4)
im1 = fp.ifft2(freq_gaussian)
axes1.imshow(im), axes1.axis('off'), axes2.imshow(im1.real) # the imaginary part is an artifact
axes2.axis('off')
pylab.show()
The following is. . .
LPF with scipy fftpack
We can implement LPF on images by following these steps:
- Perform a 2D FFT using scipy.fftpack fft2
- Keep only low frequency components (remove high frequency components)
- Perform an inverse FFT to reconstruct the image
The code below implements the LPF. As you can see from the next screenshot, the low frequency components correspond more to the average (flat) image information, and as we remove more and more of the high frequency components, the details of the image (e.g., edges) are lost.
For example, if we keep only the first frequency component and discard all the others, we can barely see the rhino in the image obtained after the inverse FFT, but as we keep higher and higher frequencies, it becomes more prominent in the final image:
from scipy import fftpack
im = np.array(Image.open('../images/rhino.jpg').convert('L'))
# low pass filter
freq = fp.fft2(im)
(w, h) = freq.shape
half_w, half_h = int(w/2), int(h/2)
freq1 = np.copy(freq)
freq2 = fftpack.fftshift(freq1)
freq2_low = np.copy(freq2)
freq2_low[half_w-10:half_w+11,half_h-10:half_h+11] = 0 # block the low frequencies
freq2 -= freq2_low # keep only the central 21x21 (low) frequencies, block the high frequencies
im1 = fp.ifft2(fftpack.ifftshift(freq2)).real
print(signaltonoise(im1, axis=None))
# 2.389151856495427
pylab.imshow(im1, cmap='gray'), pylab.axis('off')
pylab.show()
The following screenshot shows the output of the above code, i.e., the coarse output image obtained by applying the LPF on the input rhino image, without the finer details:
The code block below shows how to plot the spectrum of an image in the logarithmic domain after blocking high frequencies; in other words, only allowing low frequencies:
pylab.figure(figsize=(10,10))
pylab.imshow( (20*np.log10( 0.1 + freq2)).astype(int))
pylab.show()
The following screenshot shows the output of the previous code, i.e. the spectrum obtained after applying LPF on the image:
The following code block shows the application of the LPF on the cameraman grayscale image, with different frequency cutoffs F:
im = np.array(Image.open('../images/cameraman.jpg').convert('L'))
freq = fp.fft2(im)
(w, h) = freq.shape
half_w, half_h = int(w/2), int(h/2)
snrs_lp = []
ubs = list(range(1,25))
pylab.figure(figsize=(12,20))
for u in ubs:
    freq1 = np.copy(freq)
    freq2 = fftpack.fftshift(freq1)
    freq2_low = np.copy(freq2)
    freq2_low[half_w-u:half_w+u+1,half_h-u:half_h+u+1] = 0
    freq2 -= freq2_low # keep only the central (2u+1)x(2u+1) (low) frequencies
    im1 = fp.ifft2(fftpack.ifftshift(freq2)).real
    snrs_lp.append(signaltonoise(im1, axis=None))
    pylab.subplot(6,4,u), pylab.imshow(im1, cmap='gray'), pylab.axis('off')
    pylab.title('F = ' + str(u), size=20)
pylab.subplots_adjust(wspace=0.1, hspace=0)
pylab.show()
The following screenshot shows how the LPF retains more and more detail in the image as the cutoff frequency F increases:
Signal-to-noise ratio as a function of cutoff frequency
The following code block shows how to plot the signal-to-noise ratio (SNR) as a function of the cutoff frequency (F) of the LPF:
snr = signaltonoise(im, axis=None)
pylab.plot(ubs, snrs_lp, 'b.-')
pylab.plot(range(25), [snr]*25, 'r-')
pylab.xlabel('Cutoff Frequency for LPF', size=20)
pylab.ylabel('SNR', size=20)
pylab.show()
The screenshot below shows how the SNR of the output image decreases as the LPF cutoff frequency increases. The red horizontal line represents the SNR of the original image, drawn for comparison:
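The signaltonoise() function used in the SNR computations above is not defined in this chapter. Assuming it refers to the function that older SciPy releases provided as scipy.stats.signaltonoise (it has since been removed from SciPy), a minimal stand-in that returns the mean divided by the standard deviation could look like this:

```python
import numpy as np

def signaltonoise(a, axis=0, ddof=0):
    """Mean divided by standard deviation along `axis` (0 where the std is 0)."""
    a = np.asanyarray(a, dtype=float)
    mean = a.mean(axis)
    std = a.std(axis=axis, ddof=ddof)
    # guard against division by zero for constant inputs
    return np.where(std == 0, 0.0, mean / np.where(std == 0, 1.0, std))

sample = np.array([[1.0, 3.0], [1.0, 3.0]])
print(signaltonoise(sample, axis=None))
```

With axis=None the ratio is computed over the whole image, which is how the function is called in the code blocks of this chapter. Under this definition, a smoother image with the same mean has a higher SNR.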
Band-pass filter (BPF) with DoG
The Difference of Gaussian (DoG) kernel can be used as a BPF, allowing frequencies within a certain band and discarding all other frequencies. The following code block shows how to implement a BPF using the DoG kernel with fftconvolve():
from skimage import img_as_float
im = img_as_float(pylab.imread('../images/tigers.jpeg'))
pylab.figure(), pylab.imshow(im), pylab.axis('off'), pylab.show()
x = np.linspace(-10, 10, 15)
kernel_1d = np.exp(-0.005*x**2)
kernel_1d /= np.trapz(kernel_1d) # normalize the sum to 1
gauss_kernel1 = kernel_1d[:, np.newaxis] * kernel_1d[np.newaxis, :]
kernel_1d = np.exp(-5*x**2)
kernel_1d /= np.trapz(kernel_1d) # normalize the sum to 1
gauss_kernel2 = kernel_1d[:, np.newaxis] * kernel_1d[np.newaxis, :]
DoGKernel = gauss_kernel1[:, :, np.newaxis] - gauss_kernel2[:, :, np.newaxis]
im = signal.fftconvolve(im, DoGKernel, mode='same')
pylab.figure(), pylab.imshow(np.clip(im, 0, 1)), print(np.max(im)),
pylab.show()
The following screenshot shows the output of the previous code block, i.e. the output image obtained using BPF:
Band stop (notch) filter
This filter blocks/rejects some selected frequencies from the frequency domain representation of the image (obtained using DFT), hence the name. As discussed in the next section, it helps in removing periodic noise from images .
Use a notch filter to remove periodic noise from images
In this example, we will first add some periodic (sinusoidal) noise to the parrot image to create a noisy parrot image (this kind of noise can be caused by interference from electrical signals), and then observe the effect of the noise in the frequency domain of the image using the following code block:
from scipy import fftpack
pylab.figure(figsize=(15,10))
im = np.mean(imread("../images/parrot.png"), axis=2) / 255
print(im.shape)
pylab.subplot(2,2,1), pylab.imshow(im, cmap='gray'), pylab.axis('off')
pylab.title('Original Image')
F1 = fftpack.fft2((im).astype(float))
F2 = fftpack.fftshift( F1 )
pylab.subplot(2,2,2), pylab.imshow( (20*np.log10( 0.1 + F2)).astype(int), cmap=pylab.cm.gray)
pylab.xticks(np.arange(0, im.shape[1], 25))
pylab.yticks(np.arange(0, im.shape[0], 25))
pylab.title('Original Image Spectrum')
# add periodic noise to the image
for n in range(im.shape[1]):
    im[:, n] += np.cos(0.1*np.pi*n)
pylab.subplot(2,2,3), pylab.imshow(im, cmap='gray'), pylab.axis('off')
pylab.title('Image after adding Sinusoidal Noise')
F1 = fftpack.fft2((im).astype(float)) # noisy spectrum
F2 = fftpack.fftshift( F1 )
pylab.subplot(2,2,4), pylab.imshow( (20*np.log10( 0.1 + F2)).astype(int), cmap=pylab.cm.gray)
pylab.xticks(np.arange(0, im.shape[1], 25))
pylab.yticks(np.arange(0, im.shape[0], 25))
pylab.title('Noisy Image Spectrum')
pylab.tight_layout()
pylab.show()
The screenshot below shows the output of the previous code block. It can be seen that in the spectrum near u=175, the periodic noise on the horizontal line becomes more prominent:
Now, let's design a band-reject/band-stop (notch) filter that eliminates the frequencies responsible for the noise by setting the corresponding frequency components to zero, as in the next code block:
F2[170:176,:220] = F2[170:176,230:] = 0 # eliminate the frequencies most likely responsible for noise (keep some low frequency components)
im1 = fftpack.ifft2(fftpack.ifftshift( F2 )).real
pylab.axis('off'), pylab.imshow(im1, cmap='gray'), pylab.show()
The screenshot below shows the output of the previous code block, i.e. the image recovered by applying a notch filter. As can be seen, the original image looks sharper than the restored image because some of the real frequencies from the original image are also rejected by the band stop filter along with the noise:
Image restoration
In image restoration, the degradation is modeled. This allows the effects of the degradation to be (largely) removed. The challenge is the loss of information and the noise. The following figure shows the basic image degradation model:
In the next few sections, we describe two restoration filters based on this model (i.e., the inverse and Wiener filters).
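For reference, the degradation model is commonly written as follows. This is the standard textbook formulation and is assumed (not confirmed by the text) to match the figure: f is the original image, h the degradation (blur) kernel, n additive noise, g the observed image, and capital letters denote their Fourier transforms. The constant K in the Wiener filter is a noise-to-signal ratio that is assumed to be known or tuned by hand.

```latex
% Degradation in the image domain and, by the convolution theorem, in the frequency domain
g(x,y) = h(x,y) * f(x,y) + n(x,y)
\quad\Longleftrightarrow\quad
G(u,v) = H(u,v)\,F(u,v) + N(u,v)

% Inverse filter estimate (ignores the noise term)
\hat{F}(u,v) = \frac{G(u,v)}{H(u,v)}

% Wiener filter estimate with a constant noise-to-signal ratio K
\hat{F}(u,v) = \frac{\overline{H(u,v)}}{|H(u,v)|^{2} + K}\, G(u,v)
```

When K = 0, the Wiener estimate reduces to the inverse filter, which explains why the inverse filter amplifies noise at frequencies where H is close to zero.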
FFT deconvolution and inverse filtering
Given a blurred image with a known (assumed) blur kernel, a typical image processing task is to recover (at least approximately) the original image. This particular task is called deconvolution. One of the simple filters that can be applied in the frequency domain to achieve this is the inverse filter, which we discuss in this section.
Let's first blur the grayscale lena image with a Gaussian blur, using the following code:
im = 255*rgb2gray(imread('../images/lena.jpg'))
gauss_kernel = np.outer(signal.gaussian(im.shape[0], 3),
signal.gaussian(im.shape[1], 3))
freq = fp.fft2(im)
freq_kernel = fp.fft2(fp.ifftshift(gauss_kernel)) # this is our H
convolved = freq*freq_kernel # by convolution theorem
im_blur = fp.ifft2(convolved).real
im_blur = 255 * im_blur / np.max(im_blur) # normalize
Now we can use the inverse filter (using the same H) on the blurred image to restore the original image. The following code block demonstrates how to do this:
epsilon = 10**-6
freq = fp.fft2(im_blur)
freq_kernel = 1 / (epsilon + freq_kernel) # avoid division by zero
convolved = freq*freq_kernel
im_restored = fp.ifft2(convolved).real
im_restored = 255 * im_restored / np.max(im_restored)
print(np.max(im), np.max(im_restored))
pylab.figure(figsize=(10,10))
pylab.gray()
pylab.subplot(221), pylab.imshow(im), pylab.title('Original image'), pylab.axis('off')
pylab.subplot(222), pylab.imshow(im_blur), pylab.title('Blurred image'), pylab.axis('off')
pylab.subplot(223), pylab.imshow(im_restored), pylab.title('Restored image with inverse filter'), pylab.axis('off')
pylab.subplot(224), pylab.imshow(im_restored - im), pylab.title('Diff restored & original image'), pylab.axis('off')
pylab.show()
The screenshot below shows the output. It can be seen that although the inverse filter deblurs the blurred image, there is still some information loss:
The following screenshots show the spectra of the inverse kernel (HPF), the original lena image, the Gaussian LPF-blurred lena image, and the recovered image, respectively, on a logarithmic scale. The Python code is left to you as an exercise (3):
If the input image is noisy, the inverse filter (HPF) performs poorly because the noise is also enhanced in the output image (see question 4 in the questions section).
Similarly, we can use an inverse filter to deblur an image blurred with a known motion blur kernel. The code remains the same; only the kernel changes, as shown in the code below. Note that we need to create a zero-padded kernel of the same size as the original image before we can apply the convolution in the frequency domain (using np.pad(); the details are left as an exercise to you):
kernel_size = 21 # a 21 x 21 motion blurred kernel
mblur_kernel = np.zeros((kernel_size, kernel_size))
mblur_kernel[int((kernel_size-1)/2), :] = np.ones(kernel_size)
mblur_kernel = mblur_kernel / kernel_size
# expand the kernel by padding zeros
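One possible way to do the zero padding is sketched below. It is only an illustration, not the author's solution: the helper pad_kernel_to_image() and the 220 x 220 image size are hypothetical, and the padding is chosen so that the kernel center lands on index (rows // 2, cols // 2), which is the position that ifftshift() moves to the origin before the FFT.

```python
import numpy as np

def pad_kernel_to_image(kernel, image_shape):
    """Zero-pad `kernel` to `image_shape`, placing its center at (rows//2, cols//2)."""
    top = image_shape[0] // 2 - kernel.shape[0] // 2
    left = image_shape[1] // 2 - kernel.shape[1] // 2
    bottom = image_shape[0] - kernel.shape[0] - top
    right = image_shape[1] - kernel.shape[1] - left
    return np.pad(kernel, ((top, bottom), (left, right)), mode="constant")

kernel_size = 21  # a 21 x 21 motion blur kernel, as in the code above
mblur_kernel = np.zeros((kernel_size, kernel_size))
mblur_kernel[(kernel_size - 1) // 2, :] = 1.0 / kernel_size

image_shape = (220, 220)   # hypothetical image size, for illustration only
padded = pad_kernel_to_image(mblur_kernel, image_shape)
# ifftshift moves the kernel center to index (0, 0) before taking the FFT
freq_kernel = np.fft.fft2(np.fft.ifftshift(padded))
print(padded.shape, np.isclose(padded.sum(), 1.0))
```

The resulting freq_kernel plays the role of H in the inverse filter code shown earlier; the padding changes neither the kernel weights nor their sum.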
The following screenshot shows the spectrum of the previously defined motion blur kernel:
The following screenshot shows the output of the inverse filter with a motion blurred image:
Image deconvolution with Wiener filter
In the previous section, we saw how to use an inverse filter to obtain an (approximate) original image from a blurred image (with a known blur kernel). Another important task in image processing is to remove noise from a corrupted signal. This is also known as image restoration. The following code block shows how to use the unsupervised Wiener filter from the scikit-image restoration module for image denoising and deconvolution:
from skimage import color, data, restoration
im = color.rgb2gray(imread('../images/elephant_g.jpg'))
from scipy.signal import convolve2d as conv2
n = 7
psf = np.ones((n, n)) / n**2
im1 = conv2(im, psf, 'same')
im1 += 0.1 * im.std() * np.random.standard_normal(im.shape) ...
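To show the idea behind Wiener deconvolution without relying on the elided part of the code above, here is a simplified sketch of the classical Wiener filter in the frequency domain. It is not the unsupervised variant from scikit-image, which estimates its regularization automatically; here the noise-to-signal ratio K is a hand-picked constant, the test image is synthetic, and the blur is applied circularly through the FFT. The function name wiener_deconvolve() is hypothetical.

```python
import numpy as np

def wiener_deconvolve(blurred, psf_padded, K=0.01):
    """Frequency-domain Wiener filter with a constant noise-to-signal ratio K.

    `psf_padded` must have the same shape as `blurred`, with the PSF centered.
    """
    H = np.fft.fft2(np.fft.ifftshift(psf_padded))
    G = np.fft.fft2(blurred)
    F_hat = np.conj(H) / (np.abs(H) ** 2 + K) * G
    return np.fft.ifft2(F_hat).real

rng = np.random.default_rng(0)
im = np.zeros((64, 64))
im[16:48, 16:48] = 1.0                       # synthetic test image

psf = np.zeros((64, 64))
psf[29:36, 29:36] = 1.0 / 49                 # 7x7 box blur centered at (32, 32)

H = np.fft.fft2(np.fft.ifftshift(psf))
blurred = np.fft.ifft2(np.fft.fft2(im) * H).real
blurred += 0.01 * rng.standard_normal(im.shape)   # additive Gaussian noise

restored = wiener_deconvolve(blurred, psf, K=0.01)
err_blurred = np.mean((blurred - im) ** 2)
err_restored = np.mean((restored - im) ** 2)
print(err_restored < err_blurred)
```

Unlike the inverse filter, the term K in the denominator keeps the gain bounded at frequencies where the blur kernel response is close to zero, so the noise is not amplified without limit.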
Image denoising based on FFT
The next example is taken from http://www.scipy-lectures.org/intro/scipy/auto_examples/solutions/plot_fft_image_denoise.html . This example demonstrates how to denoise an image by blocking the high-frequency Fourier coefficients, using the FFT and an LPF. Let's first display a noisy grayscale image using the following code block:
im = pylab.imread('../images/moonlanding.png').astype(float)
pylab.figure(figsize=(10,10))
pylab.imshow(im, pylab.cm.gray), pylab.axis('off'), pylab.title('Original image'), pylab.show()
The following screenshot shows the output of the previous code block, which is the original noisy image:
The following code block displays the spectrum of a noisy image:
from scipy import fftpack
from matplotlib.colors import LogNorm
im_fft = fftpack.fft2(im)
def plot_spectrum(im_fft):
    pylab.figure(figsize=(10,10))
    pylab.imshow(np.abs(im_fft), norm=LogNorm(vmin=5), cmap=pylab.cm.afmhot), pylab.colorbar()
pylab.figure(), plot_spectrum(fftpack.fftshift(im_fft))
pylab.title('Spectrum with Fourier transform', size=20)
The following screenshot shows the output of the previous code, which is the Fourier spectrum of the original noisy image:
Filters in FFT
The following code block shows how to reject a set of high frequencies and implement LPF to attenuate the noise in the image (corresponding to the high frequency components):
# Copy the original spectrum and truncate coefficients.
# Define the fraction of coefficients (in each direction) to keep as
keep_fraction = 0.1
im_fft2 = im_fft.copy()
# Set r and c to the number of rows and columns of the array.
r, c = im_fft2.shape
# Set all rows to zero with indices between r*keep_fraction and r*(1-keep_fraction)
im_fft2[int(r*keep_fraction):int(r*(1-keep_fraction))] = 0
# Similarly with the columns
im_fft2[:, int(c*keep_fraction):int(c*(1-keep_fraction))] = 0
pylab.figure(), plot_spectrum(fftpack.fftshift(im_fft2)),pylab.title('Filtered Spectrum') ...
Reconstruct the final image
The following code block shows how to use IFFT to reconstruct an image from filtered Fourier coefficients:
# Reconstruct the denoised image from the filtered spectrum, keep only the real part for display.
im_new = fp.ifft2(im_fft2).real
pylab.figure(figsize=(10,10)), pylab.imshow(im_new, pylab.cm.gray),
pylab.axis('off')
pylab.title('Reconstructed Image', size=20)
The screenshot below shows the output of the previous code, which is a cleaner output image obtained from the original noisy image through frequency domain filtering:
Summary
In this chapter, we discussed some important concepts mainly related to 2D convolution and its applications in image processing, such as spatial filtering. We also discussed several frequency domain filtering techniques, illustrated with multiple examples using scikit-image, numpy fft, and the scipy fftpack, signal, and ndimage modules. We first introduced the convolution theorem and its application in frequency domain filtering, then various frequency domain filters such as LPF, HPF, and notch filters, and finally deconvolution and its application in designing image restoration filters (such as the inverse filter and the Wiener filter).
After completing this chapter, the reader should be able to write Python code. . .
Questions
- Use the mpl_toolkits.mplot3d module to plot, in 3D, the spectra of the image, the Gaussian kernel, and the image obtained after convolution in the frequency domain (the output should resemble the surfaces shown in the sections). (Hint: the np.meshgrid() function will come in handy for the surface plot.) Repeat this exercise for the inverse filter as well.
- Add some random noise to the lena image, blur the image with a Gaussian kernel, and then try to restore the image using an inverse filter, as shown in the corresponding example. What happens, and why?
- Apply a Gaussian blur to a color image in the frequency domain using the fftconvolve() function from SciPy signal.
- Using the fourier_uniform() and fourier_ellipsoid() functions of SciPy's ndimage module, apply LPFs with box and ellipsoid kernels, respectively, on an image in the frequency domain.
further reading
- https://www.cs.cornell.edu/courses/cs1114/2013sp/sections/S06_convolution.pdf
- http://www.aip.de/groups/soe/local/numres/bookcpdf/c13-3.pdf
- http://www.cse.usf.edu/~r1k/MachineVisionBook/MachineVision.files/MachineVision_Chapter4.pdf
- https://web.stanford.edu/class/ee367/slides/lecture6.pdf
- https://pdfs.semanticscholar.org/presentation/50e8/fb095faf6ed51e03c85a2fcb7eb1ae1b1009.pdf
- http://www.robots.ox.ac.uk/~az/touchts/ia/lect2.pdf
4. Image enhancement
In this chapter, we will discuss some of the most basic tools in image processing, such as mean/median filtering and histogram equalization, which are still among the most powerful. The purpose of image enhancement is to improve image quality or make specific features appear more prominent. These techniques are more general and do not assume a strong model of the degradation process (unlike image restoration). Some examples of image enhancement techniques are contrast stretching, smoothing, and sharpening. We will describe the basic concepts and the implementation of these techniques using functions from the Python libraries PIL, scikit-image, and scipy ndimage. We will get acquainted with simple and still popular methods.
We'll start with point-by-point intensity transformations, then discuss contrast stretching, thresholding, halftoning, and dithering algorithms, and the corresponding Python library functions. We will then discuss different histogram processing techniques such as histogram equalization (its global and adaptive versions) and histogram matching. Then, several image denoising techniques will be described. First, some linear smoothing techniques, such as averaging filters and Gaussian filters, will be described, followed by relatively new nonlinear noise smoothing techniques, such as median filtering, bilateral filtering, and non-local mean filtering, and how to implement them in Python . Finally, different image operations with mathematical morphology and their applications, as well as their implementation, will be described.
The topics covered in this chapter are as follows:
- Pointwise intensity transformation – pixel transformation
- Histogram processing, histogram equalization, histogram matching
- Linear noise smoothing (averaging filter)
- Nonlinear noise smoothing (median filter)
Pointwise intensity transformation – pixel transformation
As described in Chapter 1, Getting Started with Image Processing, the point-wise intensity transformation operation applies a transfer function T to each pixel f(x,y) of the input image to generate the corresponding pixel in the output image. The transformation can be expressed as g(x,y) = T(f(x,y)) or, equivalently, as s = T(r), where r is the gray level of a pixel in the input image and s is the transformed gray level of the same pixel in the output image. This is a memoryless operation: the output intensity at position (x, y) depends only on the input intensity at the same point. Pixels of the same intensity get the same transformation. This brings no new information...
Logarithmic transformation
The logarithmic transformation is very useful when we need to compress or stretch a range of gray levels in an image; for example, to display the Fourier spectrum (where the DC component has a much higher value than the other components, so that without the log transform the other frequency components can hardly be seen). The point transformation function of the log transform has the general form $s = T(r) = c \log(1 + r)$, where c is a constant.
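As a small numerical illustration of the log transform, the following sketch applies it to a handful of gray levels with NumPy. The function name log_transform() and the choice of c are assumptions made for this example: the input is scaled to [0, 1] and c is chosen so that the brightest input maps to 255, which differs slightly from the constant used in the PIL-based code in this section.

```python
import numpy as np

def log_transform(image_uint8):
    """Apply s = c * log(1 + r) with r scaled to [0, 1], output in [0, 255]."""
    r = image_uint8.astype(float) / 255.0
    c = 255.0 / np.log(2.0)      # assumed choice: maps r = 1 to s = 255
    s = c * np.log1p(r)
    return np.round(s).astype(np.uint8)

levels = np.array([0, 32, 64, 128, 255], dtype=np.uint8)
transformed = log_transform(levels)
print(transformed)
```

Dark gray levels are pushed up more than bright ones, so the dark range is stretched and the bright range is compressed, while the ordering of the levels is preserved.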
Let's implement a histogram of the color channels of the input image:
def plot_image(image, title=''):
    pylab.title(title, size=20), pylab.imshow(image)
    pylab.axis('off') # comment this line if you want axis ticks
def plot_hist(r, g, b, title=''):
    r, g, b = img_as_ubyte(r), img_as_ubyte(g), img_as_ubyte(b)
    pylab.hist(np.array(r).ravel(), bins=256, range=(0, 256), color='r', alpha=0.5)
    pylab.hist(np.array(g).ravel(), bins=256, range=(0, 256), color='g', alpha=0.5)
    pylab.hist(np.array(b).ravel(), bins=256, range=(0, 256), color='b', alpha=0.5)
    pylab.xlabel('pixel value', size=20), pylab.ylabel('frequency', size=20)
    pylab.title(title, size=20)
im = Image.open("../images/parrot.png")
im_r, im_g, im_b = im.split()
pylab.style.use('ggplot')
pylab.figure(figsize=(15,5))
pylab.subplot(121), plot_image(im, 'original image')
pylab.subplot(122), plot_hist(im_r, im_g, im_b,'histogram for RGB channels')
pylab.show()
The following screenshot shows the output of the original image color channel histogram before applying the logarithmic transformation:
Now, let us use the point() function of the PIL Image module to apply a logarithmic transformation and observe its effect on the histograms of the different color channels of the RGB image:
im = im.point(lambda i: 255*np.log(1+i/255))
im_r, im_g, im_b = im.split()
pylab.style.use('ggplot')
pylab.figure(figsize=(15,5))
pylab.subplot(121), plot_image(im, 'image after log transform')
pylab.subplot(122), plot_hist(im_r, im_g, im_b, 'histogram of RGB channels log transform')
pylab.show()
The output shows how the histogram is compressed for different color channels:
Power-law transformation
As we saw in Chapter 1, Getting Started with Image Processing, a power-law point transformation (the transfer function has the general form s = T(r) = c·r^γ, where c is a constant) can be applied to a grayscale image using the PIL point() function. This time, let's apply a power-law transformation to an RGB color image with scikit-image and then visualize the effect of the transformation on the color channel histograms:
im = img_as_float(imread('../images/earthfromsky.jpg'))
gamma = 5
im1 = im**gamma
pylab.style.use('ggplot')
pylab.figure(figsize=(15,5))
pylab.subplot(121), plot_hist(im[...,0], im[...,1], im[...,2], 'histogram for RGB channels (input)')
pylab.subplot(122), plot_hist(im1[...,0], im1[...,1], im1[...,2], 'histogram for RGB channels ...
Contrast stretching
The contrast stretching operation takes a low-contrast image as input and stretches a narrow range of intensity values so that it spans a desired wider range, producing a high-contrast output image. It is simply a linear scaling function applied to the pixel values of the image, so the enhancement is less drastic than that of its more sophisticated counterpart, histogram equalization (described later). The following screenshot shows the point transformation function of contrast stretching:
As you can see from the previous screenshot, before stretching can be performed, the upper and lower pixel value limits (over which the image will be normalized) need to be specified (for example, for grayscale images, the limits are usually set to 0 and 255, so that the output image spans the entire range of available pixel values). We need to find a suitable value m from the CDF of the original image. The contrast stretching transform produces an image with higher contrast than the original by darkening the levels below the value m (in other words, stretching the values toward the lower limit) and brightening the levels above the value m (stretching the values toward the upper limit). The following sections describe how to implement contrast stretching using the PIL library.
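Before turning to PIL, the linear scaling idea can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name stretch_contrast is hypothetical, and the lower and upper input limits are read off the image's cumulative distribution as percentiles (2% and 98% by default), which is one common choice rather than the book's specific one.

```python
import numpy as np

def stretch_contrast(image, low_pct=2, high_pct=98):
    """Linearly map the [low_pct, high_pct] percentile range onto [0, 255]."""
    r = np.asarray(image, dtype=np.float64)
    lo, hi = np.percentile(r, [low_pct, high_pct])   # limits taken from the CDF
    if hi <= lo:                                     # flat image: nothing to stretch
        return np.asarray(image, dtype=np.uint8).copy()
    s = (r - lo) / (hi - lo) * 255.0
    return np.clip(np.round(s), 0, 255).astype(np.uint8)

# a synthetic low-contrast image whose values only span 100..150
low_contrast = np.tile(np.arange(100, 151, dtype=np.uint8), (8, 1))
stretched = stretch_contrast(low_contrast, 0, 100)
print(low_contrast.min(), low_contrast.max(), '->', stretched.min(), stretched.max())
```

Values outside the chosen limits are clipped, so tighter percentiles give a stronger stretch at the cost of saturating the darkest and brightest pixels.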
Using PIL as a point operation
Let's first load a color RGB image and split it across color channels to visualize the histogram of pixel values for different color channels:
im = Image.open('../images/cheetah.png')
im_r, im_g, im_b, _ = im.split()
pylab.style.use('ggplot')
pylab.figure(figsize=(15,5))
pylab.subplot(121)
plot_image(im)
pylab.subplot(122)
plot_hist(im_r, im_g, im_b)
pylab.show()
The screenshot below shows the output of the previous code block. As can be seen, the input cheetah image is a low-contrast image because the color channel histogram is concentrated within a specific value range (right-skewed) rather than being distributed over all possible pixel values:
The contrast stretching operation stretches overly concentrated grayscale. . .
Using PIL image enhancement module
The ImageEnhance module can also be used for contrast stretching. The following code block shows how to use the enhance() method of the Contrast object to enhance the contrast of the same input image:
contrast = ImageEnhance.Contrast(im)
im1 = np.reshape(np.array(contrast.enhance(2).getdata()).astype(np.uint8), (im.height, im.width, 4))
pylab.style.use('ggplot')
pylab.figure(figsize=(15,5))
pylab.subplot(121), plot_image(im1)
pylab.subplot(122), plot_hist(im1[...,0], im1[...,1], im1[...,2]), pylab.yscale('log') # base 10 is the default; the basey keyword was removed in newer Matplotlib
pylab.show()
The output of the code is shown below. It can be seen that the contrast of the input image is enhanced and the color channel histogram is stretched towards the endpoints:
Thresholding
This is a point operation that creates a binary image from a grayscale image by turning all pixels below a certain threshold into zeros and all pixels above that threshold into ones, as shown in the following screenshot:
If g(x,y) is the thresholded version of f(x,y) under some global threshold T, then g(x,y) = 1 if f(x,y) > T, and g(x,y) = 0 otherwise.
Why do we need a binary image? There are several reasons, for example we might be interested in splitting the image into foreground and background; the image will be printed using a black and white printer (and all...
With a fixed threshold
The code block below shows how to use the PIL point() function to threshold an image with a fixed threshold:
im = Image.open('../images/swans.jpg').convert('L')
pylab.hist(np.array(im).ravel(), bins=256, range=(0, 256), color='g')
pylab.xlabel('Pixel values'), pylab.ylabel('Frequency'),
pylab.title('Histogram of pixel values')
pylab.show()
pylab.figure(figsize=(12,18))
pylab.gray()
pylab.subplot(221), plot_image(im, 'original image'), pylab.axis('off')
th = [0, 50, 100, 150, 200]
for i in range(2, 5):
    im1 = im.point(lambda x: x > th[i])
    pylab.subplot(2,2,i), plot_image(im1, 'binary image with threshold=' + str(th[i]))
pylab.show()
The screenshot below shows the output of the previous code. First, we can see the distribution of pixel values in the input image from the histogram:
Furthermore, as can be seen below, the binary images obtained with different gray-level thresholds are not shaded properly, resulting in an artifact known as false contours:
When discussing image segmentation, we will discuss several different thresholding algorithms in detail in Chapter 6 , Morphological Image Processing .
Halftoning
One way to reduce false contour artifacts in thresholding (binary quantization) is to add uniformly distributed white noise to the input image before quantization. Specifically, to each input pixel f(x,y) of the grayscale image, we add an independent uniform [-128, 128] random number and then threshold the result. This technique is called halftoning. The following code block shows an implementation:
im = Image.open('../images/swans.jpg').convert('L')
im = Image.fromarray(np.clip(im + np.random.randint(-128, 128, (im.height, im.width)), 0, 255).astype(np.uint8))
pylab.figure(figsize=(12,18))
pylab.subplot(221), plot_image(im, 'original image (with noise)')
th = [0, 50, 100, 150, 200]
for i in range(2, 5):
    im1 = im.point(lambda ...
Floyd-Steinberg dithering with error diffusion
Likewise, to prevent large-scale patterns (e.g., false contours), a deliberately applied form of noise is used to randomize the quantization error. This process is called dithering. The Floyd-Steinberg algorithm implements dithering using an error diffusion technique; in other words, it pushes (adds) the residual quantization error of a pixel onto its neighboring pixels, to be dealt with later. It spreads the quantization error over the neighboring pixels according to the distribution shown in the following screenshot:
In the previous screenshot, the current pixel is represented by a star (*), and blank pixels represent previously scanned pixels. The algorithm scans the image from left to right and top to bottom. Each time the quantization error is distributed between adjacent pixels (that have not yet been scanned), it quantizes the pixel values in turn without affecting pixels that have already been quantized. Therefore, if multiple pixels have been rounded down, it is more likely that subsequent pixels will be algorithmically rounded up so that the average quantization error is close to zero.
The following screenshot shows the algorithm pseudocode:
The following screenshot shows the output binary image obtained using the Python implementation of the preceding pseudocode; it provides a significant improvement in the quality of the obtained binary image compared to the previous halftoning method:
The code is left as an exercise.
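For readers who want a starting point for that exercise, here is a minimal, unoptimized sketch of error diffusion. It assumes the standard Floyd-Steinberg weights (7/16 to the right, 3/16 below-left, 5/16 below, 1/16 below-right, as given in the Wikipedia article listed under further reading) and a fixed threshold of 128; the function name floyd_steinberg is a hypothetical helper, and the plain Python loops are slow on large images.

```python
import numpy as np

def floyd_steinberg(gray):
    """Dither a grayscale image (values 0..255) into a binary image (values 0 or 255)."""
    img = np.asarray(gray, dtype=np.float64).copy()
    h, w = img.shape
    for y in range(h):                # scan top to bottom
        for x in range(w):            # and left to right
            old = img[y, x]
            new = 255.0 if old >= 128 else 0.0
            img[y, x] = new
            err = old - new
            # push the error onto the neighbors that have not been scanned yet
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return img.astype(np.uint8)

flat = np.full((32, 32), 64, dtype=np.uint8)   # uniform 25% gray
binary = floyd_steinberg(flat)
print(np.unique(binary), binary.mean())        # only 0 and 255; mean stays close to 64
```

Because the error is carried forward rather than discarded, the local average of the binary output tracks the input gray level, which is why the result looks smoother than thresholding with added noise.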
Histogram processing – histogram equalization and matching
Histogram processing techniques provide better methods for changing the dynamic range of pixel values in an image so that its intensity histogram has the desired shape. As we can see, image enhancement by contrast stretching operation is limited since it can only apply a linear scaling function.
Histogram processing techniques can be more powerful by using a non-linear transfer function to map input pixel intensities to output pixel intensities. In this section, we will demonstrate the implementation of two techniques, namely histogram equalization and histogram matching, using the exposure module of the scikit-image library. . .
Contrast stretching and histogram equalization with scikit-image
Histogram equalization uses a monotonic and nonlinear mapping that redistributes pixel intensity values in the input image so that the output image has a uniform intensity distribution (flat histogram), thereby enhancing the contrast of the image. The screenshot below depicts the transformation function for histogram equalization:
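The transfer function of global histogram equalization is the (scaled) cumulative distribution of the input intensities. The following NumPy sketch illustrates this for an 8-bit grayscale image; the function name equalize is a hypothetical helper written for this illustration, and it is not the scikit-image implementation.

```python
import numpy as np

def equalize(gray):
    """Global histogram equalization of an 8-bit grayscale image via its CDF."""
    gray = np.asarray(gray, dtype=np.uint8)
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = np.cumsum(hist) / gray.size            # monotonic, ends at 1.0
    lut = np.round(255 * cdf).astype(np.uint8)   # transfer function T(r)
    return lut[gray]

rng = np.random.default_rng(0)
dark = rng.integers(0, 64, size=(64, 64), dtype=np.uint8)   # values only in 0..63
equalized = equalize(dark)
print(dark.min(), dark.max(), '->', equalized.min(), equalized.max())
```

Since the CDF is non-decreasing, the mapping never reverses the order of gray levels; it only redistributes them over the available range.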
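The following code block shows how to use the equalize_hist() function of the exposure module to perform histogram equalization with scikit-image. The histogram equalization implementation comes in two flavors: one operates globally on the whole image, and the other is local (adaptive), done by dividing the image into blocks and running histogram equalization on each of them: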
img = rgb2gray(imread('../images/earthfromsky.jpg'))
# histogram equalization
img_eq = exposure.equalize_hist(img)
# adaptive histogram equalization
img_adapteq = exposure.equalize_adapthist(img, clip_limit=0.03)
pylab.gray()
images = [img, img_eq, img_adapteq]
titles = ['original input (earth from sky)', 'after histogram equalization', 'after adaptive histogram equalization']
for i in range(3):
    pylab.figure(figsize=(20,10)), plot_image(images[i], titles[i])
pylab.figure(figsize=(15,5))
for i in range(3):
    pylab.subplot(1,3,i+1), pylab.hist(images[i].ravel(), color='g'), pylab.title(titles[i], size=15)
pylab.show()
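The screenshot below shows the output of the previous code block. As can be seen, after histogram equalization the histogram of the output image becomes almost uniform (the x axis represents pixel values and the y axis the corresponding frequencies), although adaptive histogram equalization reveals the details of the image more clearly than global histogram equalization: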
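The following screenshot shows how the pixel distributions change for global (approximately uniform) versus adaptive (stretched and piecewise uniform) histogram equalization:
The following code block compares the image enhancement obtained with two different histogram processing techniques, namely contrast stretching and histogram equalization, using scikit-image: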
import matplotlib
matplotlib.rcParams['font.size'] = 8
def plot_image_and_hist(image, axes, bins=256):
    image = img_as_float(image)
    axes_image, axes_hist = axes
    axes_cdf = axes_hist.twinx()
    axes_image.imshow(image, cmap=pylab.cm.gray)
    axes_image.set_axis_off()
    axes_hist.hist(image.ravel(), bins=bins, histtype='step', color='black')
    axes_hist.set_xlim(0, 1)
    axes_hist.set_xlabel('Pixel intensity', size=15)
    axes_hist.ticklabel_format(axis='y', style='scientific', scilimits=(0, 0))
    axes_hist.set_yticks([])
    image_cdf, bins = exposure.cumulative_distribution(image, bins)
    axes_cdf.plot(bins, image_cdf, 'r')
    axes_cdf.set_yticks([])
    return axes_image, axes_hist, axes_cdf
im = io.imread('../images/beans_g.png')
# contrast stretching
im_rescale = exposure.rescale_intensity(im, in_range=(0, 100), out_range=(0, 255))
im_eq = exposure.equalize_hist(im) # histogram equalization
im_adapteq = exposure.equalize_adapthist(im, clip_limit=0.03) # adaptive histogram equalization
fig = pylab.figure(figsize=(15, 7))
axes = np.zeros((2, 4), dtype=object) # np.object was removed in NumPy 1.24; use the built-in object
axes[0, 0] = fig.add_subplot(2, 4, 1)
for i in range(1, 4):
    axes[0, i] = fig.add_subplot(2, 4, 1+i, sharex=axes[0,0], sharey=axes[0,0])
for i in range(0, 4):
    axes[1, i] = fig.add_subplot(2, 4, 5+i)
axes_image, axes_hist, axes_cdf = plot_image_and_hist(im, axes[:, 0])
axes_image.set_title('Low contrast image', size=20)
y_min, y_max = axes_hist.get_ylim()
axes_hist.set_ylabel('Number of pixels', size=20)
axes_hist.set_yticks(np.linspace(0, y_max, 5))
axes_image, axes_hist, axes_cdf = plot_image_and_hist(im_rescale, axes[:,1])
axes_image.set_title('Contrast stretching', size=20)
axes_image, axes_hist, axes_cdf = plot_image_and_hist(im_eq, axes[:, 2])
axes_image.set_title('Histogram equalization', size=20)
axes_image, axes_hist, axes_cdf = plot_image_and_hist(im_adapteq, axes[:,3])
axes_image.set_title('Adaptive equalization', size=20)
axes_cdf.set_ylabel('Fraction of total intensity', size=20)
axes_cdf.set_yticks(np.linspace(0, 1, 5))
fig.tight_layout()
pylab.show()
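The screenshot below shows the output of the preceding code. As can be seen, adaptive histogram equalization gives better results than global histogram equalization in making the details of the output image clearer:
With the low-contrast color cheetah image as input, the preceding code produces the following output:
Histogram matching
Histogram matching is a process in which the histogram of an image is matched with the histogram of another reference (template) image. The algorithm is as follows:
- The cumulative histogram is computed for each image, as shown in the following screenshot.
- For any given pixel value x_i (to be adjusted) in the input image, we need to find the corresponding pixel value x_j in the output image by matching the histogram of the input image with the histogram of the template image.
- The pixel value x_i has a cumulative histogram value given by G(x_i). Find a pixel value x_j such that the cumulative distribution value in the reference image, H(x_j), is equal to G(x_i).
- Replace the input data value x_i with x_j.
Histogram matching of RGB images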
For each color channel, matching can be done independently to obtain an output like this:
output image
The Python code to implement this is left as an exercise to the reader (Question 1 of the Questions section).
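As a possible starting point for that exercise, the steps above can be sketched with NumPy as follows. The function names match_channel and match_rgb are hypothetical, and np.interp is used to invert the template CDF H, which is an implementation choice made here (it interpolates between template values rather than picking an exact x_j).

```python
import numpy as np

def match_channel(source, template):
    """Map source values so that their CDF G follows the template CDF H."""
    src = np.asarray(source)
    tmpl = np.asarray(template)
    s_vals, s_idx, s_counts = np.unique(src.ravel(), return_inverse=True,
                                        return_counts=True)
    t_vals, t_counts = np.unique(tmpl.ravel(), return_counts=True)
    G = np.cumsum(s_counts) / src.size     # cumulative histogram of the input
    H = np.cumsum(t_counts) / tmpl.size    # cumulative histogram of the template
    matched_vals = np.interp(G, H, t_vals) # x_j such that H(x_j) is about G(x_i)
    return matched_vals[s_idx].reshape(src.shape)

def match_rgb(source, template):
    """Match each color channel independently."""
    channels = [match_channel(source[..., c], template[..., c])
                for c in range(source.shape[-1])]
    return np.stack(channels, axis=-1)

rng = np.random.default_rng(0)
source = rng.integers(0, 100, size=(16, 16, 3))      # dark synthetic image
template = rng.integers(150, 256, size=(16, 16, 3))  # bright synthetic template
matched = match_rgb(source, template)
print(matched.min(), matched.max())
```

The output is a floating-point array; for display as an 8-bit image it would need to be rounded and cast to uint8.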
Linear noise smoothing
Linear (spatial) filtering computes each output pixel as a weighted sum of the pixel values in a neighborhood. It is a linear operation on the image and can be used for blurring/denoising. Blurring is used in pre-processing steps; for example, to remove small (irrelevant) details. Two commonly used linear filters are the box filter and the Gaussian filter. The filter is implemented with a small (e.g., 3 x 3) kernel (mask): the mask is slid over the input image and the filter function is applied at every possible pixel to recompute the pixel values (the value of the input pixel under the center of the mask is replaced by the weighted sum of the neighboring pixel values, with weights taken from the mask). . .
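To make the sliding-mask idea concrete, here is a small NumPy sketch of a box kernel, a Gaussian kernel, and a direct weighted-sum filter. The helper names (box_kernel, gaussian_kernel, filter2d) are hypothetical, odd kernel sizes are assumed, and borders are handled by replicating edge pixels, which is only one of several possible conventions.

```python
import numpy as np

def box_kernel(n):
    """n x n averaging kernel: all weights equal and summing to 1."""
    return np.ones((n, n)) / (n * n)

def gaussian_kernel(n, sigma):
    """n x n Gaussian kernel, normalized so that the weights sum to 1."""
    ax = np.arange(n) - (n - 1) / 2
    g = np.exp(-ax**2 / (2 * sigma**2))
    k = np.outer(g, g)
    return k / k.sum()

def filter2d(image, kernel):
    """Replace each pixel by the weighted sum of its neighborhood.

    This is a correlation; for symmetric kernels such as the two above it
    gives the same result as a convolution.
    """
    image = np.asarray(image, dtype=np.float64)
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode='edge')
    out = np.zeros_like(image)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + image.shape[0],
                                           dx:dx + image.shape[1]]
    return out

impulse = np.zeros((5, 5))
impulse[2, 2] = 9.0
smoothed = filter2d(impulse, box_kernel(3))
print(smoothed)   # the single bright pixel is spread evenly over a 3 x 3 block
```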
Smoothing with PIL
The following sections illustrate how to use the ImageFilter module of PIL for linear noise smoothing; in other words, how to smooth noise using linear filters.
Smoothing using ImageFilter.BLUR
The following shows how to use the BLUR filter of the PIL ImageFilter module to smooth a noisy image. The noise level on the input image is varied to see its effect on the blur filter. The input image for this example is the popular mandrill (baboon) image; the image is protected by a Creative Commons license (https://creativecommons.org/licenses/by-sa/2.0/) and can be found at https://www.flickr.com/photos/uhuru1701/2249220078 and in the SIPI image database: http://sipi.usc.edu/database/database.php?volume=misc&image=10#Top :
i = 1
pylab.figure(figsize=(10,25))
for prop_noise in np.linspace(0.05,0.3,3):
    im = Image.open('../images/mandrill.jpg')
    # choose 5000 random locations inside ...
Average smoothing using box blur kernel
The following code block shows how to smooth a noisy image using the PIL ImageFilter.Kernel() function and box blur kernels (averaging filters) of size 3 x 3 and 5 x 5:
im = Image.open('../images/mandrill_spnoise_0.1.jpg')
pylab.figure(figsize=(20,7))
pylab.subplot(1,3,1), pylab.imshow(im), pylab.title('Original Image', size=30), pylab.axis('off')
for n in [3,5]:
    box_blur_kernel = np.reshape(np.ones(n*n),(n,n)) / (n*n)
    im1 = im.filter(ImageFilter.Kernel((n,n), box_blur_kernel.flatten()))
    pylab.subplot(1,3,(2 if n==3 else 3))
    plot_image(im1, 'Blurred with kernel size = ' + str(n) + 'x' + str(n))
pylab.suptitle('PIL Mean Filter (Box Blur) with different Kernel size',
               size=30)
pylab.show()
The screenshot below shows the output of the previous code. As can be seen, convolving the noisy image with a larger box blur kernel produces a smoother output image:
Smoothing with Gaussian blur filter
The Gaussian blur filter is also a linear filter, but unlike the simple averaging filter, it uses a weighted average of the pixels within the kernel window to smooth a pixel (the weights of neighboring pixels decrease exponentially with their distance from the center pixel). The following code shows how to use PIL's ImageFilter.GaussianBlur() to smooth a noisy image with different values of the kernel radius parameter:
im = Image.open('../images/mandrill_spnoise_0.2.jpg')
pylab.figure(figsize=(20,6))
i = 1
for radius in range(1, 4):
    im1 = im.filter(ImageFilter.GaussianBlur(radius))
    pylab.subplot(1,3,i), plot_image(im1, 'radius = ' + str(round(radius,2)))
    i += 1
pylab.suptitle('PIL ...
Comparing smoothing with box and Gaussian kernels using SciPy ndimage
We can also apply linear filters to smooth images using functions from SciPy's ndimage module. The following code snippet demonstrates the results of applying linear filters to the mandrill image degraded by impulse (salt-and-pepper) noise:
from scipy import ndimage
from skimage.io import imread # scipy.misc.imread was removed from recent SciPy releases
import matplotlib.pylab as pylab
im = imread('../images/mandrill_spnoise_0.1.jpg')
k = 7 # 7x7 kernel
im_box = ndimage.uniform_filter(im, size=(k,k,1))
s = 2 # sigma value
t = (((k - 1)/2)-0.5)/s # truncate parameter value for a kxk gaussian kernel with sigma s
im_gaussian = ndimage.gaussian_filter(im, sigma=(s,s,0), truncate=t)
fig = pylab.figure(figsize=(30,10))
pylab.subplot(131), plot_image(im, 'original image')
pylab.subplot(132), plot_image(im_box, 'with the box filter')
pylab.subplot(133), plot_image(im_gaussian, 'with the gaussian filter')
pylab.show()
The screenshot below shows the output of the previous code. It can be seen that the box filter of the same kernel size blurs the output image more than the Gaussian filter of the same size with σ=2:
Nonlinear noise smoothing
Nonlinear (spatial) filters also act on neighborhoods, by sliding a kernel (mask) over the image just as linear filters do. However, the filtering operation depends conditionally on the values of the pixels in the neighborhood, and these filters usually do not explicitly use coefficients in a sum-of-products manner. For example, noise can be reduced effectively with a nonlinear filter whose basic function is to compute the median gray value of the neighborhood in which the filter is located. This filter is nonlinear because computing the median is a nonlinear operation. Median filters are very popular because they work well with certain types of random noise (e.g., impulse noise). . .
Smoothing with PIL
The PIL ImageFilter module provides a set of functions for nonlinear denoising of images. In this section, we will demonstrate some of them with examples.
Use median filter
A median filter replaces each pixel with the median of the neighboring pixel values. This filter is great for removing salt-and-pepper noise, although it can remove small details from the image. We need to sort the neighborhood intensities and then choose the middle value. Median filtering is robust to statistical outliers, causes little blurring, and is easy to implement. The following code block shows how to use the MedianFilter() function of the PIL ImageFilter module to remove salt-and-pepper noise from the noisy mandrill image, with different levels of added noise and different kernel window sizes for the median filter:
i = 1
pylab.figure(figsize=(25,35))
for prop_noise in np.linspace(0.05,0.3,3):
    ...
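The median operation itself can be illustrated without PIL. The following NumPy sketch implements a 3 x 3 median filter by stacking the nine shifted views of the image; the function name median_filter3x3 is hypothetical, and replicated borders are an assumption made to keep the output the same size as the input.

```python
import numpy as np

def median_filter3x3(gray):
    """Replace each pixel by the median of its 3 x 3 neighborhood."""
    gray = np.asarray(gray)
    padded = np.pad(gray, 1, mode='edge')
    h, w = gray.shape
    # stack the 9 shifted views of the neighborhood, then take the median across them
    neighbors = np.stack([padded[dy:dy + h, dx:dx + w]
                          for dy in range(3) for dx in range(3)])
    return np.median(neighbors, axis=0).astype(gray.dtype)

clean = np.full((9, 9), 100, dtype=np.uint8)
noisy = clean.copy()
noisy[2, 2], noisy[6, 5] = 255, 0          # one isolated salt and one pepper pixel
restored = median_filter3x3(noisy)
print(np.array_equal(restored, clean))
```

An isolated outlier never reaches the middle of the sorted neighborhood, so it is discarded entirely, whereas an averaging filter would smear it over the surrounding pixels.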
Use max and min filters
The code below shows how to use MaxFilter() followed by MinFilter() to remove salt-and-pepper noise from an image:
im = Image.open('../images/mandrill_spnoise_0.1.jpg')
sz = 3 # filter window size; not defined in the original listing, 3 is an assumed value
pylab.subplot(1,3,1)
plot_image(im, 'Original Image with 10% added noise')
im1 = im.filter(ImageFilter.MaxFilter(size=sz))
pylab.subplot(1,3,2), plot_image(im1, 'Output (Max Filter size=' + str(sz) + ')')
im1 = im1.filter(ImageFilter.MinFilter(size=sz))
pylab.subplot(1,3,3), plot_image(im1, 'Output (Min Filter size=' + str(sz) + ')')
pylab.show()
The screenshot below shows the output of the previous code block. It can be seen that the maximum and minimum filters respectively have certain effects in removing salt and pepper noise in noisy images:
Image smoothing (denoising) using scikit-image
The scikit-image library also provides a set of nonlinear filters in its restoration module. In the following sections, we will discuss two very useful filters, namely the bilateral and non-local means filters.
Use bilateral filter
The bilateral filter is an edge-preserving smoothing filter. For this filter, the center pixel is set to a weighted average of the values of only those neighboring pixels whose brightness is roughly similar to that of the center pixel. In this section, we will see how to denoise images using the bilateral filter implementation of the scikit-image package. First, let's create a noisy image from the following grayscale mountain image:
The following code block demonstrates how to add noise with the random_noise() function (from skimage.util):
im = color.rgb2gray(img_as_float(io.imread('../images/mountain.png')))
sigma = 0.155
noisy = random_noise(im, var=sigma**2)
pylab.imshow(noisy)
The screenshot below shows a noisy image created by adding random noise to the original image using the previous code:
The following code block demonstrates how to denoise a previously noisy image using a bilateral filter with different values for the parameters σ color and σ space :
pylab.figure(figsize=(20,15))
i = 1
for sigma_sp in [5, 10, 20]:
    for sigma_col in [0.1, 0.25, 5]:
        pylab.subplot(3,3,i)
        # in scikit-image >= 0.19, multichannel=False is spelled channel_axis=None
        pylab.imshow(denoise_bilateral(noisy, sigma_color=sigma_col,
                     sigma_spatial=sigma_sp, multichannel=False))
        pylab.title(r'$\sigma_r=$' + str(sigma_col) + r', $\sigma_s=$' + str(sigma_sp), size=20)
        i += 1
pylab.show()
The screenshot below shows the output of the previous code. It can be seen that if the standard deviation is higher, the image becomes blurrier but less noisy. Execution of the previous code block takes a few minutes because the implementation on RGB images is slower:
Using non-local means
Non-local means is a texture-preserving nonlinear denoising algorithm. In this algorithm, the value of any given pixel is set to a weighted average of only those pixels whose local neighborhoods are similar to that of the pixel of interest. In other words, small patches centered on other pixels are compared to the patch centered on the pixel of interest. In this section, we demonstrate this algorithm by denoising a noisy parrot image using the non-local means filter. The function's h argument controls the decay of the patch weights as a function of the distance between patches. If h is larger, it allows more smoothing between dissimilar patches. The code block below shows. . .
Smoothing using scipy ndimage
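The SciPy ndimage module provides a function named percentile_filter(), which is a generic version of the median filter. The following code block demonstrates how to use this filter:
lena = imread('../images/lena.jpg') # skimage.io.imread; scipy.misc.imread was removed from recent SciPy releases
# add salt-and-pepper noise to the input image
noise = np.random.random(lena.shape)
lena[noise > 0.9] = 255
lena[noise < 0.1] = 0
plot_image(lena, 'noisy image')
pylab.show()
fig = pylab.figure(figsize=(20,15))
i = 1
for p in range(25, 100, 25):
    for k in range(5, 25, 5):
        pylab.subplot(3,4,i)
        filtered = ndimage.percentile_filter(lena, percentile=p, size=(k,k,1))
        plot_image(filtered, str(p) + ' percentile, ' + str(k) + 'x' + str(k) + ' kernel')
        i += 1
pylab.show()
The screenshot below shows the output of the preceding code. As can be seen, of all the percentile filters, the median filter (corresponding to the 50th percentile) with a small kernel size removes the salt-and-pepper noise best, while losing as few image details as possible: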
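Summary
In this chapter, we discussed different image enhancement methods, starting with point transformations (for example, contrast stretching and thresholding), followed by techniques based on histogram processing (for example, histogram equalization and histogram matching), and then image denoising techniques based on linear (for example, mean and Gaussian) and nonlinear (for example, median, bilateral, and non-local means) filters.
By the end of this chapter, the reader should be able to write Python code for point transformations (for example, negative, power-law transform, and contrast stretching), histogram-based image enhancement (for example, histogram equalization/matching), and image denoising (for example, mean/median filters). . .
Questions
- Implement histogram matching for color RGB images.
- Use the equalize() function from skimage.filters.rank to implement local histogram equalization and compare it with the global histogram equalization from skimage.exposure on a grayscale image.
- Implement Floyd-Steinberg error-diffusion dithering using the algorithm described at https://en.wikipedia.org/wiki/Floyd%E2%80%93Steinberg_dithering and convert a grayscale image into a binary image.
- Use ModeFilter() from PIL to smooth an image. When is it useful?
- Show that an image can be recovered from several noisy images, obtained by adding random Gaussian noise to the original image, by simply taking the average of the noisy images. Is the median also useful?
Further reading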
- http://paulbourke.net/miscellaneous/equalisation/
- https://pdfs.semanticscholar.org/presentation/3fb7/fa0fca1bab83d523d882e98efa0f5769ec64.pdf
- https://www.comp.nus.edu.sg/~cs4243/doc/SciPy%20reference.pdf
- https://en.wikipedia.org/wiki/Floyd%E2%80%93Steinberg_dithering
5. Image enhancement based on derivatives
In this chapter, we continue our discussion of image enhancement, the problem of improving the appearance or usefulness of an image. We will mainly focus on spatial filtering techniques for computing image gradients/derivatives, and on how these techniques can be used for edge detection in images. First, we'll start with the basic concepts of image gradients using first (partial) derivatives and how to calculate discrete derivatives, and then discuss second-order derivatives and the Laplacian operator. We will see how to use them to find edges in images. Next, we will discuss several methods of sharpening/unsharp masking images using the Python image processing library PIL, the scikit-image filters module, and the SciPy ndimage module. Then, we will see how to use different filters (sobel, canny, LoG, and so on) and convolve them with the image to detect edges. Finally, we'll discuss how to compute Gaussian/Laplacian image pyramids (using scikit-image) and use the image pyramids to smoothly blend two images. The topics covered in this chapter are as follows:
- Image derivatives: gradient and Laplacian
- Sharpening and unsharp masking (with PIL, scikit-image, SciPy ndimage)
- Edge detection using derivatives and filters (Sobel, Canny, LoG, DoG, and so on, with PIL, scikit-image)
- Image pyramids (Gaussian and Laplacian): blending images (with scikit-image)
Image Derivatives - Gradient and Laplacian
We can calculate the (partial) derivatives of digital images using the finite difference method. In this section, let's discuss how to calculate image derivatives, gradients, and the Laplacian, and why they are useful. As usual, let's start by importing the required libraries, as shown in the following code block:
import numpy as np
from scipy import signal, misc, ndimage
from skimage import filters, feature, img_as_float
from skimage.io import imread
from skimage.color import rgb2gray
from PIL import Image, ImageFilter
import matplotlib.pylab as pylab
Derivatives and gradients
The figure below shows how to calculate the partial derivatives of an image I (which is a function f(x,y)) using finite differences (with forward and central differences, the latter being more accurate), which can be implemented as convolutions with the kernels shown. The figure also defines the gradient vector, its magnitude (corresponding to the strength of the edge), and its direction (perpendicular to the edge). Locations in the input image where the intensity (gray-level value) changes sharply correspond to locations where there are peaks/spikes (or troughs) in the first derivative of the image intensity. In other words, peaks in the gradient magnitude mark edge locations, and we need to threshold the gradient magnitude to find edges in the image:
The code block below shows how to calculate the gradient (as well as its magnitude and direction) using the convolution kernels shown previously, taking a grayscale chess image as input. It also plots how the image pixel values and the x component of the gradient vector vary with the y coordinate for the first row of the image (x=0):
def plot_image(image, title):
    pylab.imshow(image), pylab.title(title, size=20), pylab.axis('off')
ker_x = [[-1, 1]]
ker_y = [[-1], [1]]
im = rgb2gray(imread('../images/chess.png'))
im_x = signal.convolve2d(im, ker_x, mode='same')
im_y = signal.convolve2d(im, ker_y, mode='same')
im_mag = np.sqrt(im_x**2 + im_y**2)
im_dir = np.arctan(im_y/im_x)
pylab.gray()
pylab.figure(figsize=(30,20))
pylab.subplot(231), plot_image(im, 'original'), pylab.subplot(232), plot_image(im_x, 'grad_x')
pylab.subplot(233), plot_image(im_y, 'grad_y'), pylab.subplot(234), plot_image(im_mag, '||grad||')
pylab.subplot(235), plot_image(im_dir, r'$\theta$'), pylab.subplot(236)
pylab.plot(range(im.shape[1]), im[0,:], 'b-', label=r'$f(x,y)|_{x=0}$', linewidth=5)
pylab.plot(range(im.shape[1]), im_x[0,:], 'r-', label=r'$grad_x (f(x,y))|_{x=0}$')
pylab.title(r'$grad_x (f(x,y))|_{x=0}$', size=30)
pylab.legend(prop={'size': 20})
pylab.show()
The image below shows the output of the previous code block. As can be seen from the figure below, the partial derivatives in the x and y directions detect the vertical and horizontal edges in the image, respectively. The gradient magnitude shows the edge strength at different locations in the image. Furthermore, if we pick all the pixels of the original image that belong to one row (e.g., row 0), we can see a square wave (corresponding to the alternating white and black intensity pattern), while the gradient magnitude for the same set of pixels has spikes (sudden increases/decreases) in intensity, which correspond to the (vertical) edges:
Display magnitude and gradient on the same image
In the previous example, the magnitude and direction of the edges were shown in different images. We can create an RGB image and set the R, G, and B values as follows to show both magnitude and direction in the same image (R = ||grad||·sin θ, G = ||grad||·cos θ, B = 0, as in the code below):
Using the same code as in the previous example, we just replace the code for the lower-right subplot with the following code:
im = np.zeros((im.shape[0],im.shape[1],3))
im[...,0] = im_mag*np.sin(im_dir) # im_dir is the gradient direction computed in the previous listing
im[...,1] = im_mag*np.cos(im_dir)
pylab.title(r'||grad||+$\theta$', size=30), pylab.imshow(im), pylab.axis('off')
Then, using the tiger image, we get the output shown. . .
Laplacian
Rosenfeld and Kak showed that the simplest isotropic derivative operator is the Laplacian, which is defined as shown in the figure below. The Laplacian approximates the second derivative of the image and detects edges. It is an isotropic (rotation-invariant) operator, and its zero crossings mark edge locations; we will talk more about this later in this chapter. In other words, where there are peaks/spikes (or troughs) in the first derivative of the input image, there are zero crossings at the corresponding locations of the second derivative of the input image:
Some notes on the Laplacian operator
Let's take a look at the following notes:
- It is a scalar (unlike the gradient, which is a vector)
- The Laplacian is computed using a single kernel (mask) (unlike the gradient, which usually has two kernels, one for the partial derivative in the x direction and one for the y direction)
- As a scalar, it doesn't have any direction, so we lose the direction information
- It is the sum of the second-order partial derivatives (the gradient is a vector consisting of the first-order partial derivatives), but higher. . .
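To illustrate these notes, the following NumPy sketch evaluates a discrete Laplacian on a synthetic vertical step edge. It assumes the common 4-neighbor approximation (the sum of the four neighbors minus four times the center, that is, the kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]]); the missing figure may use a different but equivalent form, and the function name laplacian is a hypothetical helper.

```python
import numpy as np

def laplacian(image):
    """4-neighbor discrete Laplacian with replicated borders."""
    f = np.pad(np.asarray(image, dtype=np.float64), 1, mode='edge')
    # f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4 f(x,y), from a single kernel
    return (f[:-2, 1:-1] + f[2:, 1:-1] +
            f[1:-1, :-2] + f[1:-1, 2:] - 4 * f[1:-1, 1:-1])

step = np.zeros((5, 8))
step[:, 4:] = 1.0            # vertical edge between columns 3 and 4
lap = laplacian(step)
print(lap[2])                # [ 0.  0.  0.  1. -1.  0.  0.  0.]
```

The response is a single scalar per pixel, it is zero in the flat regions, and it changes sign exactly across the edge, which is the zero crossing that marks the edge location.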
The impact of noise on gradient calculations
Derivative filters computed using the finite difference method are very sensitive to noise. As we saw in the previous chapter, pixels in an image whose intensity values are very different from those of their neighbors are usually noisy pixels. Generally speaking, the stronger the noise, the greater the change in intensity, and the stronger the response of the filter. The next code block adds some Gaussian noise to the image to see the effect on the gradient. Let's consider one row of the image again (row 0, to be precise) and plot the intensity as a function of the x position:
from skimage.util import random_noise
sigma = 1 # sd of noise to be added
im = im + random_noise(im, var=sigma**2)
The image below shows the output of the previous code block after adding some random noise to the chess image. As we can see, adding random noise to the input image has a strong effect on (partial) derivatives and gradient magnitudes; the peaks corresponding to edges are almost indistinguishable from the noise, and the pattern is destroyed:
Smoothing the image before applying the derivative filter should help, since it removes the high-frequency components that are likely to be noise and forces the (noisy) pixels that differ from their neighbors to look more like them. So the solution is to first smooth the input image with an LPF (such as a Gaussian filter) and then find the peaks (using a threshold) in the smoothed image. This gives rise to the LoG (Laplacian of Gaussian) filter (if we use a second derivative filter), which we will explore later in this chapter.
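The benefit of smoothing first can be seen on a one-dimensional toy signal. The following is a minimal sketch under assumptions that do not come from the text: a synthetic step edge, a noise standard deviation of 0.1, a Gaussian sigma of 3, and a hypothetical helper, edge_to_noise_ratio(), that compares the derivative response at the edge with the largest response far from it.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)   # fixed seed so that the run is repeatable

# Assumed test signal: one row of pixels with a single step edge at x = 100
x = np.arange(200)
row = (x >= 100).astype(float)
noisy = row + rng.normal(0, 0.1, row.shape)

# Forward finite difference of the raw noisy row
d_noisy = np.diff(noisy)

# Smooth with a Gaussian low-pass filter first, then differentiate
smoothed = ndimage.gaussian_filter1d(noisy, sigma=3)
d_smooth = np.diff(smoothed)

def edge_to_noise_ratio(d, edge=99, guard=15):
    """Response at the edge divided by the largest response far away from it."""
    far = np.abs(np.concatenate([d[:edge - guard], d[edge + guard:]]))
    near = np.abs(d[edge - 2:edge + 3]).max()
    return near / far.max()

ratio_noisy = edge_to_noise_ratio(d_noisy)
ratio_smooth = edge_to_noise_ratio(d_smooth)
print('edge/noise ratio without smoothing:', round(ratio_noisy, 2))
print('edge/noise ratio with smoothing   :', round(ratio_smooth, 2))
```

Smoothing lowers the peak at the edge, but it lowers the noise response by more, so the edge stands out more clearly against the background.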
Sharpening and unsharp masking
The purpose of sharpening is to highlight details in an image or to enhance details that have been blurred. In this section, we'll discuss some techniques and demonstrate several different image sharpening methods with some examples.
Laplacian sharpening
An image can be sharpened using the Laplacian filter in the following two steps:
- Apply the Laplacian filter to the original input image.
- Add the output image obtained in step 1 to the original input image (to obtain the sharpened image).
The following code block demonstrates how to implement the preceding algorithm using the laplace() function of the scikit-image filters module:
from skimage.filters import laplace
im = rgb2gray(imread('../images/me8.jpg'))
im1 = np.clip(laplace(im) + im, 0, 1)
pylab.figure(figsize=(20,30))
pylab.subplot(211), plot_image(im, 'original image')
pylab.subplot(212), plot_image(im1, 'sharpened image')
pylab.tight_layout()
pylab.show()
Here is the output from the previous code block, the original image, and the sharpened image using the previous algorithm:
Unsharp Mask
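Unsharp masking is a technique for sharpening an image in which a blurred version of the image is subtracted from the image itself. The typical blending formula used for unsharp masking is as follows: sharpened = original + (original − blurred) × amount.
Here, amount is a parameter. The next few sections demonstrate how to implement this with the SciPy ndimage module in Python.
With the SciPy ndimage module
As discussed, we can first blur an image and then compute the detail image as the difference between the original and the blurred image to implement unsharp masking. The sharpened image can then be computed as a linear combination of the original image and the detail image. The following figure illustrates the concept again:
The following code block shows how the unsharp mask operation can be implemented with the SciPy ndimage module for a grayscale image (the same can be done with a color image, which is left as an exercise for the reader), using the preceding concept: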
def rgb2gray(im):
    '''
    the input image is an RGB image
    with pixel values for each channel in [0,1]
    '''
    return np.clip(0.2989 * im[...,0] + 0.5870 * im[...,1] + 0.1140 * im[...,2], 0, 1)
# note: misc.imread() was removed in SciPy 1.2; with newer versions, skimage.io.imread() can be used instead
im = rgb2gray(img_as_float(misc.imread('../images/me4.jpg')))
im_blurred = ndimage.gaussian_filter(im, 5)
im_detail = np.clip(im - im_blurred, 0, 1)
pylab.gray()
fig, axes = pylab.subplots(nrows=2, ncols=3, sharex=True, sharey=True, figsize=(15, 15))
axes = axes.ravel()
axes[0].set_title('Original image', size=15), axes[0].imshow(im)
axes[1].set_title('Blurred image, sigma=5', size=15), axes[1].imshow(im_blurred)
axes[2].set_title('Detail image', size=15), axes[2].imshow(im_detail)
alpha = [1, 5, 10]
for i in range(3):
    im_sharp = np.clip(im + alpha[i]*im_detail, 0, 1)
    axes[3+i].imshow(im_sharp), axes[3+i].set_title('Sharpened image, alpha=' + str(alpha[i]), size=15)
for ax in axes:
    ax.axis('off')
fig.tight_layout()
pylab.show()
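The following screenshot shows the output of the preceding code block. As can be seen, the output becomes sharper as the value of α increases:
Edge detection using derivatives and filters (Sobel, Canny, and so on)
As discussed earlier, the pixels that make up the edges in an image are those where the image intensity function changes suddenly and rapidly (a discontinuity), and the goal of edge detection is to identify these changes. Hence, edge detection is a preprocessing technique in which the input is a 2D (grayscale) image and the output is a set of curves (called the edges). The salient features of an image are extracted in the edge detection process; a representation of the image using edges is more compact than one using pixels. The edge detectors output the magnitude of the gradient (as a grayscale image), and now, to get the edge pixels (as a binary image). . .
Computing the gradient magnitude using partial derivatives
As discussed earlier, the gradient magnitude (which can be thought of as the edge strength) computed using (forward) finite difference approximations of the partial derivatives can be used for edge detection. The following screenshot shows the output obtained by using the same code as before to compute the gradient magnitude and then clipping the pixel values to the [0, 1] interval, with the grayscale zebra image as input: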
The screenshot below shows a gradient magnitude image. As shown in the image, the edges appear thick, several pixels wide:
To obtain a binary image in which each edge is one pixel wide, we need to apply a non-maximum suppression algorithm, which removes a pixel if it is not a local maximum along the gradient direction within its neighborhood. The implementation of the algorithm is left as an exercise for the reader. The following screenshot shows the output with non-maximum suppression:
The non-maximum suppression algorithm
- The algorithm first checks the angle (direction) of the edge (output by the edge detector).
- If a pixel value is not the maximum along a line tangential to its edge angle, the pixel is a candidate for removal from the edge map.
- This is implemented by splitting the edge direction (360 degrees) into eight equal intervals, each covering 22.5 degrees on either side of one of the eight neighbor directions. The table below shows the different cases and the actions to be taken:
- We can do this by looking at the π/8 ranges and setting up a series of tangential comparisons with if conditions accordingly.
- The effect of edge refinement can be clearly observed (from the previous image). . .
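The steps above can be sketched as follows. This is one possible illustration, not the book's solution to the exercise: the function name non_max_suppression(), the folding of the eight directions into four neighbor axes (opposite directions share the same pair of neighbors), and the toy ridge image are all assumptions made here. The angle is assumed to be the gradient direction in radians, as returned by np.arctan2(gy, gx), with rows growing downwards.

```python
import numpy as np

def non_max_suppression(mag, ang):
    """Keep a pixel only if it is a local maximum along its gradient direction.

    mag: 2D array of gradient magnitudes
    ang: 2D array of gradient angles in radians (e.g. np.arctan2(gy, gx))
    """
    out = np.zeros_like(mag)
    # Fold angles into [0, 180) degrees: opposite directions share neighbours
    deg = np.rad2deg(ang) % 180
    # Quantise into 4 neighbour axes, each covering +/- 22.5 degrees
    sector = np.round(deg / 45).astype(int) % 4
    # (row, col) offset of one neighbour per sector; the other is its negation
    offsets = {0: (0, 1),    # gradient along x: compare left/right
               1: (1, 1),    # 45 degrees: compare along one diagonal
               2: (1, 0),    # gradient along y: compare up/down
               3: (1, -1)}   # 135 degrees: compare along the other diagonal
    rows, cols = mag.shape
    for r in range(1, rows - 1):          # border pixels are left at zero
        for c in range(1, cols - 1):
            dr, dc = offsets[sector[r, c]]
            m = mag[r, c]
            if m >= mag[r + dr, c + dc] and m >= mag[r - dr, c - dc]:
                out[r, c] = m
    return out

# Toy example: a thick vertical ridge in the gradient magnitude
mag = np.tile(np.array([0, 1, 3, 5, 3, 1, 0], dtype=float), (5, 1))
ang = np.zeros_like(mag)       # gradient points along x everywhere
thin = non_max_suppression(mag, ang)
print(thin[2])
```

In the toy example, the five-pixel-wide ridge is reduced to its single central column.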
Sobel edge detector with scikit-image
The (first-order) derivatives can be approximated better than with plain finite differences. The Sobel operators shown in the figure below are used quite often:
The 1/8 term is not included in the standard definition of the Sobel operator, since it makes no difference for edge detection purposes, although the normalization term is needed to get the gradient values right. The next Python code snippet shows how to use the sobel_h(), sobel_v(), and sobel() functions of the scikit-image filters module to find the horizontal and vertical edges and to compute the gradient magnitude using the Sobel operator, respectively:
im = rgb2gray(imread('../images/tajmahal.jpg')) # RGB image to gray scale
pylab.gray()
pylab.figure(figsize=(20,18))
pylab.subplot(2,2,1)
plot_image(im, 'original')
pylab.subplot(2,2,2)
edges_x = filters.sobel_h(im)
plot_image(edges_x, 'sobel_x')
pylab.subplot(2,2,3)
edges_y = filters.sobel_v(im)
plot_image(edges_y, 'sobel_y')
pylab.subplot(2,2,4)
edges = filters.sobel(im)
plot_image(edges, 'sobel')
pylab.subplots_adjust(wspace=0.1, hspace=0.1)
pylab.show()
The screenshot below shows the output of the previous code block. It can be seen that the horizontal and vertical edges of the image are detected by horizontal and vertical Sobel filters, while the gradient magnitude image calculated using the Sobel filter detects edges in both directions:
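The role of the 1/8 normalization term in the Sobel operator can be checked on an image whose gradient is known. The sketch below is an illustration under assumptions made here: a synthetic ramp image with a slope of 0.5 per pixel, and scipy.ndimage.correlate() applied to a hand-written Sobel kernel rather than the scikit-image functions used above.

```python
import numpy as np
from scipy import ndimage

# Sobel kernel for the derivative along x (columns), without normalisation
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# Ramp image whose intensity grows by exactly 0.5 per pixel along x
ramp = 0.5 * np.tile(np.arange(10, dtype=float), (10, 1))

# correlate() applies the kernel as written (convolve() would flip it)
raw = ndimage.correlate(ramp, sobel_x, mode='nearest')
scaled = raw / 8.0

# Look at an interior pixel to stay clear of border effects
print('unnormalised response:', raw[5, 5])
print('with the 1/8 factor  :', scaled[5, 5])
```

The unnormalized response is eight times the true slope; only after dividing by 8 does it equal the gradient value of 0.5. For thresholding edges, this constant factor does not matter.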
Different edge detectors with scikit-image: Prewitt, Roberts, Sobel, Scharr, and Laplace
There are quite a few different edge detection operators used in image processing algorithms; all of them are discrete (first- or second-order) differentiation operators that try to approximate the gradient of the image intensity function (for example, the Sobel operator, which we discussed previously). The kernels shown in the figure below are a few popular ones used for edge detection. For example, popular derivative filters approximating the first-order image derivatives are the Sobel, Prewitt, Scharr, and Roberts filters, whereas a derivative filter approximating the second-order derivatives is the Laplacian:
As stated in the scikit-image documentation. . .
Canny edge detector with scikit-image
The Canny edge detector is a popular edge detection algorithm developed by John F. Canny. The algorithm has the following steps:
- Smoothing/noise reduction: Edge detection operations are sensitive to noise, so a 5 x 5 Gaussian filter is first used to remove noise from the image.
- Computing the magnitude and orientation of the gradient: The Sobel horizontal and vertical filters are then applied to the image to compute the edge gradient magnitude and direction for each pixel, as described previously. The computed gradient angle (direction) is then rounded to one of four angles representing the horizontal, vertical, and two diagonal directions.
- Non-maximum suppression: In this step, the edges are thinned; any unwanted pixels that may not constitute an edge are removed. To do this, every pixel is checked to see whether it is a local maximum in its neighborhood in the direction of the gradient. As a result, a binary image with thin edges is obtained.
- Linking and hysteresis thresholding: This step decides whether each detected edge is really an edge. For this, a pair of (hysteresis) thresholds, min_val and max_val, is used. Edges with an intensity gradient above max_val are sure to be edges. Edges with an intensity gradient below min_val are sure to be non-edges and are discarded. Edges lying between the two thresholds are classified as edges or non-edges based on their connectivity: if they are connected to sure-edge pixels, they are considered part of the edge; otherwise, they are also discarded. This step also removes small pixel noise (on the assumption that edges are long lines).
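The linking and hysteresis thresholding step can be sketched with connected-component labeling. This is a minimal illustration, not how scikit-image implements it internally: the function name hysteresis_threshold(), the use of scipy.ndimage.label() with 8-connectivity, and the small gradient magnitude array are assumptions made here.

```python
import numpy as np
from scipy import ndimage

def hysteresis_threshold(mag, min_val, max_val):
    """Keep weak pixels (> min_val) only if connected to a strong pixel (> max_val)."""
    weak = mag > min_val                      # everything that might be an edge
    strong = mag > max_val                    # sure-edge pixels
    # Label 8-connected groups of candidate pixels
    labels, n = ndimage.label(weak, structure=np.ones((3, 3)))
    # A group survives if it contains at least one strong pixel
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(labels[strong])] = True
    keep[0] = False                           # label 0 is the background
    return keep[labels]

# Assumed toy gradient magnitudes: one chain with a strong pixel, one without
mag = np.array([[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                [0.3, 0.4, 0.9, 0.4, 0.0, 0.3],
                [0.0, 0.0, 0.0, 0.0, 0.0, 0.3],
                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
edges = hysteresis_threshold(mag, min_val=0.2, max_val=0.8)
print(edges.astype(int))
```

The four-pixel chain on the left survives because it touches the strong pixel of value 0.9, while the isolated weak group on the right is discarded.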
Finally, the algorithm outputs the strong edges of the image. The following code block shows how the Canny edge detector can be implemented with scikit-image:
im = rgb2gray(imread('../images/tiger3.jpg'))
im = ndimage.gaussian_filter(im, 4)
im += 0.05 * np.random.random(im.shape)
edges1 = feature.canny(im)
edges2 = feature.canny(im, sigma=3)
fig, (axes1, axes2, axes3) = pylab.subplots(nrows=1, ncols=3, figsize=(30, 12), sharex=True, sharey=True)
axes1.imshow(im, cmap=pylab.cm.gray), axes1.axis('off'), axes1.set_title('noisy image', fontsize=50)
axes2.imshow(edges1, cmap=pylab.cm.gray), axes2.axis('off')
axes2.set_title(r'Canny filter, $\sigma=1$', fontsize=50)
axes3.imshow(edges2, cmap=pylab.cm.gray), axes3.axis('off')
axes3.set_title(r'Canny filter, $\sigma=3$', fontsize=50)
fig.tight_layout()
pylab.show()
The screenshot below shows the output of the previous code; for an initial Gaussian LPF, Canny filters with different sigma values are used to detect edges. As shown, the lower the sigma value, the less blurry the original image is, so more edges (finer details) can be found:
LoG and DoG filters
The Laplacian of Gaussian (LoG) filter is just another linear filter, a combination of a Gaussian followed by a Laplacian applied to an image. Since the second derivative is very sensitive to noise, it is always a good idea to remove noise by smoothing the image before applying the Laplacian, so that the noise is not amplified. Because convolution is associative, this can be thought of as taking the second derivative (Laplacian) of the Gaussian filter and then applying the resulting (combined) filter to the image, hence the name LoG. It can be efficiently approximated using the difference of two Gaussians (DoG) with different scales (variances), as shown in the figure below:
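The DoG approximation can be checked numerically with scipy.ndimage. The sketch below rests on assumptions made here: a random test image, a base sigma of 3, a scale ratio k of 1.1, and the standard scaling factor (k − 1)σ² that relates the difference of two nearby Gaussians to the LoG.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)
# Assumed test image: smoothed random noise, just to have some structure
img = ndimage.gaussian_filter(rng.random((128, 128)), 2)

sigma, k = 3.0, 1.1   # a ratio k close to 1 gives the tightest approximation

# Laplacian of Gaussian at scale sigma
log = ndimage.gaussian_laplace(img, sigma=sigma)

# Difference of two Gaussians at scales k*sigma and sigma
dog = ndimage.gaussian_filter(img, k * sigma) - ndimage.gaussian_filter(img, sigma)

# G(k*sigma) - G(sigma) is approximately (k - 1) * sigma**2 * LoG(sigma)
log_scaled = (k - 1) * sigma**2 * log

corr = np.corrcoef(log_scaled.ravel(), dog.ravel())[0, 1]
print('correlation between scaled LoG and DoG:', round(corr, 4))
```

A correlation close to 1 indicates that the two filters respond to the same structures; the DoG only needs two Gaussian blurs, which is why it is a popular substitute.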
The code block below. . .
LoG filter with the SciPy ndimage module
The gaussian_laplace() function of the SciPy ndimage module can also be used to implement the LoG filter, as shown in the following code block:
img = rgb2gray(imread('../images/zebras.jpg'))
fig = pylab.figure(figsize=(25,15))
pylab.gray() # show the filtered result in grayscale
for sigma in range(1,10):
    pylab.subplot(3,3,sigma)
    img_log = ndimage.gaussian_laplace(img, sigma=sigma)
    pylab.imshow(np.clip(img_log,0,1)), pylab.axis('off')
    pylab.title('LoG with sigma=' + str(sigma), size=20)
pylab.show()
The following images show the input image and the output images obtained with the LoG filter for different values of the smoothing parameter σ (the standard deviation of the Gaussian filter):
Edge detection with the LoG filter
The following are the steps in edge detection with a LoG filter:
- First, the input image needs to be smoothed (by convolution with a Gaussian filter).
- Then, the smoothed image needs to be convolved with the Laplacian filter to obtain the output image $\nabla^2 (I(x, y) * G(x, y))$, where $I$ is the input image, $G$ is the Gaussian kernel, and $*$ denotes convolution.
- Finally, the zero crossings of the image obtained in the previous step need to be computed, as shown in the figure below:
Marr and Hildreth edge detection algorithm based on zero-crossing computation
Marr and Hildreth proposed computing the zero crossings in the LoG-convolved image (to detect edges as a binary image). Edge pixels can be identified by looking at the sign of the LoG-smoothed image, treated as a binary image. The algorithm to compute the zero crossings is as follows:
- First, convert the LoG-convolved image to a binary image by replacing the pixel values with 1 for positive values and 0 for negative values
- To compute the zero-crossing pixels, simply look at the boundaries of the non-zero regions in this binary image
- The boundaries can be found by finding any non-zero pixel that has an immediate neighbor that is zero
- Hence, for each pixel, if it is non-zero, consider its eight neighbors; if any of the neighboring pixels is zero, the pixel can be identified as an edge
The implementation of this function is left as an exercise. The following code block shows the edges of the same zebra image detected with zero crossings:
fig = pylab.figure(figsize=(25,15))
pylab.gray() # show the filtered result in grayscale
for sigma in range(2,10, 2):
    pylab.subplot(2,2,sigma//2) # integer division: subplot() needs an int index in Python 3
    result = ndimage.gaussian_laplace(img, sigma=sigma)
    pylab.imshow(zero_crossing(result)) # implement the function zero_crossing() using the above algorithm
    pylab.axis('off')
    pylab.title('LoG with zero-crossing, sigma=' + str(sigma), size=20)
pylab.show()
The screenshot below shows the output of the previous code block, with edges identified only by zero crossings at different σ scales:
The previous image shows zero crossings with LoG/DoG as an edge detector. It should be noted that the zero-crossing points form closed contours.
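For readers who want something to compare their own solution with, the zero-crossing steps listed earlier could be sketched as follows. This is one possible reading of the algorithm, not the book's implementation: it uses scipy.ndimage.binary_erosion() with a 3 x 3 structuring element to test whether all eight neighbors of a pixel are non-zero, and the small test array is an assumption made here.

```python
import numpy as np
from scipy import ndimage

def zero_crossing(log_img):
    """Binary edge map: positive pixels with at least one non-positive 8-neighbour."""
    # Step 1: binarise by sign (True for positive values, False otherwise)
    positive = log_img > 0
    # A pixel survives erosion only if it and all eight neighbours are True;
    # border_value=1 avoids marking edges along the image border
    interior = ndimage.binary_erosion(positive, structure=np.ones((3, 3)),
                                      border_value=1)
    # Steps 2-4: boundary pixels are set pixels that are not interior pixels
    return positive & ~interior

# Assumed toy LoG response: a positive 3 x 3 region surrounded by negative values
log_img = np.array([[-1, -1, -1, -1, -1],
                    [-1,  2,  2,  2, -1],
                    [-1,  2,  2,  2, -1],
                    [-1,  2,  2,  2, -1],
                    [-1, -1, -1, -1, -1]], dtype=float)
edges = zero_crossing(log_img)
print(edges.astype(int))
```

The result is the one-pixel-wide ring around the positive region, which is also why zero crossings form closed contours.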
Finding and enhancing edges with PIL
The filters in PIL's ImageFilter module, applied through an image's filter() method, can also be used to find and enhance edges in an image. The following code block shows an example with the UMBC library image as input:
from PIL import Image
from PIL.ImageFilter import (FIND_EDGES, EDGE_ENHANCE, EDGE_ENHANCE_MORE)
im = Image.open('../images/umbc_lib.jpg')
pylab.figure(figsize=(18,25))
pylab.subplot(2,2,1)
plot_image(im, 'original (UMBC library)')
i = 2
for f in (FIND_EDGES, EDGE_ENHANCE, EDGE_ENHANCE_MORE):
    pylab.subplot(2,2,i)
    im1 = im.filter(f)
    plot_image(im1, str(f))
    i += 1
pylab.show()
The following screenshot shows the output of the above code using different edge finding/enhancement filters:
Image pyramids (Gaussian and Laplacian): blending images
We can construct the Gaussian pyramid of an image by starting with the original image and creating smaller images iteratively, first by smoothing (with a Gaussian filter, to avoid aliasing) and then by subsampling (collectively called reducing) the image from the previous level at each iteration, until a minimum resolution is reached. The image pyramid created in this way is called a Gaussian pyramid. Such pyramids are good for searching over scale (for instance, template matching), for precomputation, and for image processing tasks that edit frequency bands separately (for instance, image blending). Similarly, a Laplacian pyramid of the image can be constructed by starting from the smallest image in the Gaussian pyramid, then expanding (upsampling plus smoothing) the image from that level and subtracting it from the image at the next level of the Gaussian pyramid, and repeating this process until the original image size is reached. In this section, we will see how to write Python code to compute image pyramids and then look at an application of image pyramids: blending two images.
Gaussian pyramid with the scikit-image transform pyramid module
The Gaussian pyramid of an input image can be computed using the pyramid_gaussian() function of the scikit-image transform module. Starting from the original image, this function calls the pyramid_reduce() function to obtain the smoothed and downsampled images recursively. The following code block demonstrates how to compute and display such a Gaussian pyramid with the lena RGB input image:
from skimage.transform import pyramid_gaussian
image = imread('../images/lena.jpg')
nrows, ncols = image.shape[:2]
pyramid = tuple(pyramid_gaussian(image, downscale=2))
pylab.figure(figsize=(20,5))
i, n = 1, len(pyramid)
for p in pyramid:
    pylab.subplot(1,n,i), pylab.imshow(p)
    pylab.title(str(p.shape[0]) ...
Laplacian pyramid with the scikit-image transform pyramid module
The Laplacian pyramid of an input image can be computed using the pyramid_laplacian() function of the scikit-image transform module. Starting from the difference image of the original image and its smoothed version, this function computes the downsampled image and its smoothed version, and takes the difference of these two images to compute the image corresponding to each layer recursively. The motivation for creating a Laplacian pyramid is compression, since values that are predictably close to 0 compress better.
The code to compute the Laplacian pyramid is similar to the earlier code for the Gaussian pyramid; it is left as an exercise for the reader. The following screenshot shows the Laplacian pyramid of the lena grayscale image:
Note that if we use scikit-image's pyramid_gaussian() and pyramid_laplacian() functions, the lowest-resolution image in the Laplacian pyramid and the lowest-resolution image in the Gaussian pyramid will be different images, which is not what we want. We want to build a Laplacian pyramid whose smallest-resolution image is exactly the same as that of the Gaussian pyramid, since this will allow us to reconstruct an image from its Laplacian pyramid alone. In the next few sections, we will discuss algorithms for building our own pyramids using scikit-image's pyramid_expand() and pyramid_reduce() functions.
Constructing the Gaussian pyramid
The Gaussian pyramid can be computed with the following steps:
- Start with the original image.
- Iteratively compute the image at each level of the pyramid, first by smoothing the image (with a Gaussian filter) and then by downsampling it.
- Stop at a level where the image size becomes sufficiently small (for example, 1 x 1).
The function implementing the preceding algorithm is left as an exercise for the reader; we just need to add a few lines to the following function to complete the implementation:
from skimage.transform import pyramid_reduce
def get_gaussian_pyramid(image):
    '''
    input: an RGB image
    output: the Gaussian Pyramid of the image as a list
    '''
    gaussian_pyramid = []
    # add code here
    # iteratively ...
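The three steps can also be sketched without scikit-image. The following is a simplified illustration and not a completion of the exercise above: it works on a grayscale array, replaces pyramid_reduce() with a Gaussian blur from scipy.ndimage followed by dropping every second row and column, and the names reduce_level() and gaussian_pyramid(), the sigma of 1, and the random test image are all assumptions made here.

```python
import numpy as np
from scipy import ndimage

def reduce_level(im, sigma=1.0):
    """Smooth with a Gaussian, then keep every second row and column."""
    return ndimage.gaussian_filter(im, sigma)[::2, ::2]

def gaussian_pyramid(im, min_size=1):
    """List of images, from full resolution down to min_size pixels per side."""
    pyramid = [im]                                   # step 1: the original image
    while min(pyramid[-1].shape) > min_size:         # step 3: stop when small enough
        pyramid.append(reduce_level(pyramid[-1]))    # step 2: smooth and downsample
    return pyramid

im = np.random.default_rng(0).random((64, 48))       # stand-in grayscale image
pyr = gaussian_pyramid(im)
print([p.shape for p in pyr])
```

Each level halves the number of rows and columns (rounding up), so a 64 x 48 image needs seven levels to reach 1 x 1.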
Reconstructing an image from its Laplacian pyramid only
The image below shows how an image can be reconstructed from its Laplacian pyramid alone, provided that the pyramid was constructed by following the algorithms described in the previous sections:
Please take a look at the following code block:
from skimage.transform import pyramid_expand, resize
def reconstruct_image_from_laplacian_pyramid(pyramid):
    i = len(pyramid) - 2
    prev = pyramid[i+1]
    pylab.figure(figsize=(20,20))
    j = 1
    while i >= 0:
        # expand the previous level to the size of the current level and add the details
        prev = resize(pyramid_expand(prev, upscale=2), pyramid[i].shape)
        im = np.clip(pyramid[i] + prev,0,1)
        pylab.subplot(3,3,j), pylab.imshow(im)
        pylab.title('Level=' + str(j) + ' ' + str(im.shape[0]) + 'x' + str(im.shape[1]), size=20)
        prev = im
        i -= 1
        j += 1
    # note: image is the global original image, shown for comparison
    pylab.subplot(3,3,j), pylab.imshow(image)
    pylab.title('Original image' + ' ' + str(image.shape[0]) + 'x' + str(image.shape[1]), size=20)
    pylab.show()
    return im
image = img_as_float(imread('../images/apple.png')[...,:3]) # only use the color channels and discard the alpha
pyramid = get_laplacian_pyramid(get_gaussian_pyramid(image))
im = reconstruct_image_from_laplacian_pyramid(pyramid)
The screenshot below shows the output of the previous code, that is, how the original image is finally reconstructed from its Laplacian pyramid, simply by applying the expand() operation to the image at each level and iteratively adding it to the image at the next level:
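Why the reconstruction works can be seen in a compact, self-contained sketch. It is an illustration under assumptions made here and does not reproduce the get_gaussian_pyramid() and get_laplacian_pyramid() helpers used above: the reduce and expand steps are approximated with scipy.ndimage (Gaussian blur, subsampling, and zoom()), the image is a random grayscale array, and all function names are hypothetical. The key point is that the last entry of the Laplacian pyramid is the smallest Gaussian level itself.

```python
import numpy as np
from scipy import ndimage

def reduce_level(im, sigma=1.0):
    """Gaussian smoothing followed by 2x subsampling."""
    return ndimage.gaussian_filter(im, sigma)[::2, ::2]

def expand_level(im, shape, sigma=1.0):
    """Upsample to the given shape by linear interpolation, then smooth."""
    factors = (shape[0] / im.shape[0], shape[1] / im.shape[1])
    return ndimage.gaussian_filter(ndimage.zoom(im, factors, order=1), sigma)

def laplacian_pyramid(gauss_pyr):
    """Detail images, plus the smallest Gaussian level as the last entry."""
    lap = [g - expand_level(g_next, g.shape)
           for g, g_next in zip(gauss_pyr[:-1], gauss_pyr[1:])]
    lap.append(gauss_pyr[-1])          # shared lowest-resolution image
    return lap

def reconstruct(lap_pyr):
    """Start from the smallest image; repeatedly expand and add the details."""
    im = lap_pyr[-1]
    for detail in reversed(lap_pyr[:-1]):
        im = expand_level(im, detail.shape) + detail
    return im

im = np.random.default_rng(0).random((64, 64))   # stand-in grayscale image
gauss = [im]
while min(gauss[-1].shape) > 4:                  # stop at 4 x 4 for this demo
    gauss.append(reduce_level(gauss[-1]))
lap = laplacian_pyramid(gauss)
restored = reconstruct(lap)
print('max reconstruction error:', np.abs(restored - im).max())
```

Because each detail level stores exactly what the expand step loses, adding it back recovers the Gaussian level above it, and the error stays at the level of floating-point rounding.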
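Blending images with pyramids
Let's say we have two RGB color input images, A (apple) and B (orange), and a third binary mask image, M; all three images are of the same size. The objective is to blend image A with B, guided by the mask M (if a pixel in the mask image M has the value 1, the pixel is taken from image A; otherwise it is taken from image B). The following algorithm can be used to blend the two images using the Laplacian pyramids of images A and B (by computing the blended pyramid as a linear combination of the images at the same level of the Laplacian pyramids of A and B, with the weights taken from the same level of the Gaussian pyramid of the mask image M), and then reconstructing the output image from it. . .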
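The blending idea described above can be sketched in a few lines. This is a rough illustration under several assumptions made here, not the algorithm as the book implements it: the images are grayscale rather than RGB, the reduce and expand steps are approximated with scipy.ndimage, four pyramid levels are used, and all function names are hypothetical.

```python
import numpy as np
from scipy import ndimage

def reduce_level(im):
    """Gaussian smoothing followed by 2x subsampling."""
    return ndimage.gaussian_filter(im, 1.0)[::2, ::2]

def expand_level(im, shape):
    """Upsample to the given shape by linear interpolation, then smooth."""
    zoom = (shape[0] / im.shape[0], shape[1] / im.shape[1])
    return ndimage.gaussian_filter(ndimage.zoom(im, zoom, order=1), 1.0)

def gaussian_pyramid(im, levels):
    pyr = [im]
    for _ in range(levels - 1):
        pyr.append(reduce_level(pyr[-1]))
    return pyr

def laplacian_pyramid(im, levels):
    g = gaussian_pyramid(im, levels)
    return [a - expand_level(b, a.shape) for a, b in zip(g[:-1], g[1:])] + [g[-1]]

def blend(a, b, mask, levels=4):
    """Blend grayscale images a and b; mask is 1 where a should show through."""
    la, lb = laplacian_pyramid(a, levels), laplacian_pyramid(b, levels)
    gm = gaussian_pyramid(mask.astype(float), levels)
    # Per-level linear combination, weighted by the smoothed mask
    blended = [m * x + (1 - m) * y for x, y, m in zip(la, lb, gm)]
    # Collapse the blended pyramid from coarse to fine
    out = blended[-1]
    for detail in reversed(blended[:-1]):
        out = expand_level(out, detail.shape) + detail
    return np.clip(out, 0, 1)

# Assumed toy inputs: a bright image, a dark image, and a left-half mask
a = np.full((64, 64), 0.9)
b = np.full((64, 64), 0.1)
mask = np.zeros((64, 64))
mask[:, :32] = 1
out = blend(a, b, mask)
print(out[32, 0], out[32, 32], out[32, 63])
```

Far from the mask boundary the output matches A or B, while near the boundary the two are mixed gradually instead of meeting at a hard seam. The crude interpolation used here can shift the transition by a few pixels, which a careful implementation would avoid.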
Summary
In this chapter, we first discussed edge detection in images using several filters (Sobel, Prewitt, Canny, and so on) and by computing the gradient and Laplacian of an image. We then discussed the LoG/DoG operators, how to implement them, and how to detect edges with zero crossings. Next, we discussed how to compute image pyramids and use Laplacian pyramids to blend two images smoothly. Finally, we discussed how to detect blobs with scikit-image. After completing this chapter, the reader should be able to implement edge detectors (Sobel, Canny, and so on) in Python using different filters. The reader should also be able to implement filters to sharpen an image and to find edges at different scales using LoG/DoG. Finally, the reader should be able to blend images with Laplacian/Gaussian pyramids and to implement blob detection in an image at different scale-spaces. In the next chapter, we will discuss feature detection and extraction techniques for images.
Questions
- Use the unsharp_mask() function of the skimage.filters module with different values of its radius and amount parameters to sharpen an image.
- Use the UnsharpMask() function of the PIL ImageFilter module with different values of the radius and percent parameters to sharpen an image.
- Sharpen a color (RGB) image using the sharpening kernel [[0,-1,0],[-1,5,-1],[0,-1,0]]. (Hint: use the convolve2d() function of the SciPy signal module on each color channel, one by one.)
- Use the SciPy ndimage module to sharpen a color image directly (without sharpening the individual color channels one by one).
- Use the pyramid_laplacian() function of the skimage.transform module to compute and display the Laplacian pyramid of the lena grayscale input image.
- architecture
Further reading
- https://web.stanford.edu/class/cs448f/lectures/5.2/Gradient%20Domain.pdf
- https://www.cs.cornell.edu/courses/cs6670/2011sp/lectures/lec02_filter.pdf
- http://www.cs.toronto.edu/~mangas/teaching/320/slides/CSC320L05.pdf
- http://graphics.cs.cmu.edu/courses/15-463/2005_fall/www/Lectures/Pyramids.pdf
- http://www.me.umn.edu/courses/me5286/vision/VisionNotes/2017/ME5286-Lecture7-2017-EdgeDetection2.pdf
- https://www.cs.rutgers.edu/~elgammal/classes/cs334/EdgesandContours.pdf
- http://www.cs.cornell.edu/courses/cs664/2008sp/handouts/edges.pdf
- http://www.cs.cmu.edu/~16385/s17/Slides/4.0_Image_Gradients_and_Gradients_Filtering.pdf
- http://www.hms.harvard.edu/bss/neuro/bornlab/qmbc/beta/day4/marr-hildreth-edge-prsl1980.pdf
- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.420.3300&rep=rep1&type=pdf
- https://www.cs.toronto.edu/~mangas/teaching/320/assignments/a3/tcomm83.pdf
- https://www.cs.toronto.edu/~mangas/teaching/320/assignments/a3/spline83.pdf
- https://docs.opencv.org/3.1.0/dc/dff/tutorial_py_pyramids.html