Basics of FPGA Digital Image Processing

01 Basic concepts of digital images

A digital image is the foundation of computer vision and image processing, and it differs from an analog image. The images we observe directly can be understood as continuous analog quantities, which involve relatively complex calculations and strong internal correlations, making it difficult to define a unified quantitative standard for them. With the development of computers, in order to facilitate computation and quantitative processing, analog images, like most analog quantities, need to be converted into discrete digital quantities through sampling and quantization; the result is a digital image.

1.1 Digital image extraction
Digital images are obtained by sampling and quantizing analog images. This process is usually carried out by an image sensor (such as a CMOS image sensor), which is typically an array of photosensitive elements. The performance of the image sensor determines the quality of the captured digital image.
1.1.1 Sampling processing

The sampling process digitizes the spatial coordinates of the image, converting a spatially continuous image into discrete sampling points. The evaluation index of the sampling process is the spatial resolution of the image.
1.1.2 Quantization processing

Quantization assigns values to the sampled signal according to certain rules, discretizing the amplitude of the sampling points. The evaluation index of the quantization process is the amplitude resolution of the image, also called the gray-scale resolution. Quantization is usually based on the light-and-dark information of the image; taking the discrimination ability of the human eye into account, 8-bit values from 0 to 255 are commonly used to describe the range from black to white.

Quantization can usually be divided into uniform quantization and non-uniform quantization:
Uniform quantization divides the continuous gray values into levels at equal intervals; the more levels there are, the smaller the quantization error;

Non-uniform quantization sets different quantization intervals according to different characteristics. For example, non-uniform quantization based on visual features usually reduces the interval in areas rich in detail, while non-uniform quantization based on statistical features reduces the interval where gray values occur most frequently.
The digital image obtained after sampling and quantization can be represented by a matrix: the size of the matrix is the spatial resolution of the image, and the value of each element is the quantized amplitude of the corresponding pixel.
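As a rough illustration (a NumPy sketch using a synthetic "scene", which is an assumption made purely for this example), sampling fixes the size of this matrix while uniform quantization maps each sampled value onto one of 256 gray levels:

```python
# Minimal sketch of sampling + uniform quantization on a synthetic scene.
import numpy as np

H, W = 4, 6                                   # spatial resolution chosen by sampling
ys, xs = np.mgrid[0:H, 0:W]
scene = 0.5 + 0.5 * np.sin(xs / W * np.pi) * np.cos(ys / H * np.pi)  # "continuous" intensities in [0, 1]

levels = 256                                  # 8-bit amplitude (gray-scale) resolution
image = np.round(scene * (levels - 1)).astype(np.uint8)              # uniform quantization

print(image.shape)   # (4, 6): the matrix size is the spatial resolution
print(image)         # each element is the quantized gray value of one pixel
```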
1.2 Concept of digital image

Common concepts in digital image processing:

1.2.1 The basic unit of a digital image - the pixel

A pixel (picture element) is the basic element of a digital image. Pixels are generated during the discretization of the analog image; that is, each sampling point extracted by the sampling process is a pixel. Each pixel has position coordinates and a gray value (or color value). All other things being equal, an image that contains more pixels has a higher resolution.
1.2.2 Gray scale and depth of digital images

The gray scale and depth of a digital image measure the degree of quantization. Gray scale refers to the value of each pixel, and the gray-scale range is the range of values a pixel can take; in general, under otherwise equal conditions, an image with a larger gray-scale range is clearer. Depth refers to the storage capacity required for each pixel and determines the number of gray levels: the larger the gray-scale range, the greater the depth required. Usually 8 bits are used to represent 256 gray levels; the gray value is then an integer from 0 to 255, and the depth is 8.
1.2.3 Resolution of digital images

The resolution of a digital image includes spatial resolution and amplitude (gray-scale) resolution. Spatial resolution refers to the number of pixels contained in the image and measures its size (when resolution is used to describe physical size it cannot be separated from a spatial unit; for example, the printing industry uses dpi, the number of dots per inch). Amplitude (gray-scale) resolution refers to the number of bits used to quantize the gray value of each pixel, and has the same meaning as depth.
1.2.4 Common types of digital images

Binary image: each pixel's brightness value is either 0 or 1;

Grayscale image: each pixel's brightness value is represented by a value from 0 (black) to 255 (white);

Color image: composed of three images of different colors (the R, G, and B channels);

Stereo image: a pair of images of an object taken from different angles, from which depth information can be computed;

Three-dimensional image: composed of a stack of two-dimensional images, each representing a cross-section of the object.

1.2.5 Common formats of digital images

(1) JPG format: properly JPEG, it stores a single raster image in 24-bit color and supports a high degree of lossy compression. The compression ratio can reach 100:1, but at a large cost in image quality, so ratios between 10:1 and 20:1 are usually used to preserve quality. JPEG compression works well on areas of similar tone, but poorly on areas with large brightness differences or solid colors. JPEG supports interlaced progressive display, but does not support transparency or animation. Editing operations other than rotation and cropping usually cause a loss of image quality, so PNG is often used as an intermediate format during editing;

(2) PNG format: Portable Network Graphics (PNG) is an image file storage format developed in the mid-1990s. Its purpose is to replace the GIF and TIFF formats while adding some features that GIF lacks. PNG adopts a lossless data compression algorithm derived from LZ77. When storing grayscale images, the depth can be up to 16 bits; when storing color images, the depth can be up to 48 bits; and up to 16 bits of alpha-channel data can also be stored. The PNG format has many variants; in practice it can be roughly divided into 256-color and full-color PNG: 256-color PNG can replace GIF, and full-color PNG can replace JPEG. PNG supports alpha transparency, i.e. fully transparent, opaque, and semi-transparent pixels. Images with few colors, filled mainly with solid colors or smooth gradients, and with large brightness differences are suitable for storage in the PNG8 format;

(3) GIF format: GIF (Graphics Interchange Format) is an image file format developed by CompuServe in 1987. GIF data uses a variable-length compression scheme based on the LZW algorithm, a lossless compression of continuous-tone images, with a compression rate of generally about 50%. It is not tied to any particular application and is currently supported by almost all related software. The image depth of GIF ranges from 1 bit to 8 bits, supporting at most 256 colors. Multiple images can be stored in a single GIF file; if they are read out one by one and displayed in sequence, a simple animation is formed. GIF supports animation and Boolean transparency, i.e. it does not support semi-transparency (alpha transparency). GIF supports optional interlaced progressive display. Because LZW compresses along horizontal scan lines, a horizontally oriented GIF of the same picture takes less space than a vertically oriented one;

(4) BMP format: BMP is an image file format that is independent of hardware devices. It uses a bit-mapped storage format; apart from the selectable image depth (1 bit, 4 bit, 8 bit, or 24 bit), no compression is applied, so BMP files take up a lot of space. BMP supports indexed color and direct color. When storing data, the image is scanned from left to right and from bottom to top;

(5) SVG format: Scalable Vector Graphics (SVG) is an open standard developed by the World Wide Web Consortium, based on the Extensible Markup Language (XML, a subset of the Standard Generalized Markup Language), for describing two-dimensional vector graphics. Because it is a vector format, there is no distortion when zooming in and lines do not degrade into pixels. SVG defines graphics in XML and integrates with W3C standards such as the DOM and XSL.

02 Digital image processing algorithms

Digital image processing algorithms usually cover research fields such as image enhancement, image stitching, image segmentation, image compression, image recognition, image transformation, image restoration, and image reconstruction.
2.1 Image Enhancement

Image enhancement improves the quality and clarity of degraded images, for example those suffering from insufficient lighting or from rain and fog. Commonly used approaches include histogram equalization, Retinex theory, and deep learning.

2.1.1 Histogram equalization

There is a direct mapping relationship between the image histogram and the image pixels, and histogram equalization expresses this mapping. Its basic idea is to widen the gray levels that contain many pixels, compress the gray levels that contain few pixels, and thereby reshape the histogram of the image channel to achieve a visual contrast-enhancement effect.

Histogram equalization enhances the image by exploiting the gray-level distribution described by the histogram: it amplifies the gray-level differences between adjacent pixels and thereby improves image contrast.
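A minimal sketch of global histogram equalization (NumPy for the mapping, OpenCV only for file I/O; the file names are illustrative assumptions) looks like this:

```python
# Minimal histogram-equalization sketch on an 8-bit grayscale image.
import cv2
import numpy as np

gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)        # H x W, uint8

hist = np.bincount(gray.ravel(), minlength=256)              # per-level pixel counts
cdf = hist.cumsum()                                          # cumulative distribution
cdf_min = cdf[cdf > 0][0]
# Map each gray level through the normalized CDF onto 0..255.
lut = np.clip(np.round((cdf - cdf_min) / (gray.size - cdf_min) * 255), 0, 255).astype(np.uint8)
equalized = lut[gray]

cv2.imwrite("equalized.png", equalized)
```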
Histogram equalization only remaps pixel intensities and ignores spatial information, so its ability to adjust local brightness is limited; it can affect the overall brightness of some scenes and may introduce additional noise.

2.1.2 Retinex theory
The basic idea of Retinex is to decompose the image to be enhanced, S(x), into the product of a reflectance component R(x) and an illumination component L(x), i.e. S(x) = R(x) · L(x): the color of an object is independent of the illumination and depends only on the reflective properties of its surface, so removing the illumination component that interferes with image intensity from the input image yields an enhanced image.

Priors can be introduced into the decomposition model to enhance contrast, with constraints placed on the reflectance and illumination layers (the reflectance layer regularizes adjacent pixels through color similarity, and the illumination layer is constrained to be piecewise smooth) to avoid problems such as the enhanced image appearing too bright.
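As one common, simplified realization of the Retinex idea (single-scale Retinex, not the specific constrained decomposition model described above), the illumination can be approximated with a Gaussian blur and removed in the log domain; the file names, sigma, and final normalization below are illustrative choices:

```python
# Minimal single-scale Retinex sketch: estimate illumination L(x) with a Gaussian
# blur and recover log-reflectance as log(S) - log(L).
import cv2
import numpy as np

s = cv2.imread("dark.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) + 1.0  # +1 avoids log(0)
illumination = cv2.GaussianBlur(s, (0, 0), sigmaX=30)                      # smooth estimate of L(x)
log_r = np.log(s) - np.log(illumination)                                   # reflectance in the log domain

out = cv2.normalize(log_r, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imwrite("retinex.png", out)
```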
2.1.3 Deep Learning

With the development of deep learning, image enhancement algorithms based on convolutional neural networks (CNN) and generative adversarial networks (GAN) have been proposed one after another. One CNN-based method, inspired by the Retinex principle, divides the network into two parts, one for adjusting illumination and one for removing noise.
An unsupervised deep model, the Zero-Reference Deep Curve Estimation network (Zero-DCE), formulates enhancement as an image-specific curve-estimation task. The method does not require paired or unpaired training data; instead it trains a network to estimate pixel-wise, high-order curves that fit the dynamic range of a given image.
2.2 Image Stitching

Image stitching addresses the limited field of view of images captured by a single lens: it stitches multiple images into one to overcome the field-of-view limitation, expand the image coverage, and display details better. There are two main families of methods: region-based image stitching and feature-based image stitching.

2.2.1 Region-based image stitching method

The region-based image stitching method computes the intensity difference between the two images to be stitched over the same region to obtain the stitched result. Region-based methods include pixel-level matching, methods based on mutual information, and methods based on the Laplacian pyramid. Such methods make full use of the image alignment information, but they are computationally intensive, have limited registration accuracy, and are not invariant to common geometric transformations.

2.2.2 Feature-based image stitching method

The feature-based image stitching method matches and stitches images using feature information extracted from the pixels. The steps are: feature extraction and description, feature matching, reprojection and stitching, and image fusion. Feature extraction and description characterizes regions that are highly distinguishable from other regions; local features should be accurate, effective, and discriminative. Feature matching finds corresponding points between the images of the same scene to be stitched. Reprojection and fusion use geometric transformations to align the images into a unified coordinate system and eliminate seams caused by lighting differences: first the geometric relationship between the input images is described, generally as a homography matrix with 8 degrees of freedom, then the images are reprojected and stitched, and finally the pixels with abrupt lighting changes at the seam are blended.

Feature-based image stitching has good generality and is the most widely and frequently used approach across fields, making it the usual choice for image stitching. Common algorithms for image registration include Harris corner detection, SIFT, PCA-SIFT, FAST, ORB, and SURF; image reprojection, stitching, and fusion commonly rely on the RANSAC algorithm.
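A minimal sketch of this feature-based pipeline using OpenCV (ORB features, brute-force matching, a RANSAC-estimated homography, and a simple paste instead of proper seam blending; file names and parameter values are illustrative assumptions):

```python
# Minimal feature-based stitching sketch: ORB -> matching -> RANSAC homography -> warp.
import cv2
import numpy as np

img1 = cv2.imread("left.jpg")
img2 = cv2.imread("right.jpg")

orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY), None)
kp2, des2 = orb.detectAndCompute(cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY), None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]

src = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)   # 8-DoF homography, RANSAC outlier rejection

# Warp the second image into the first image's coordinate frame and paste the first on top.
pano = cv2.warpPerspective(img2, H, (img1.shape[1] + img2.shape[1], img1.shape[0]))
pano[:img1.shape[0], :img1.shape[1]] = img1
cv2.imwrite("stitched.jpg", pano)
```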

2.3 Image Segmentation

Image segmentation isolates specific objects of interest in an image so that their characteristics can be studied more clearly. It is used mainly in medicine and scientific research. The complex characteristics of different organs, tissues, and diseased regions in medical images, as well as the complexity and multimodality of the images themselves, place high demands on segmentation algorithms for the various medical image segmentation tasks. According to whether image features are extracted manually or learned automatically, segmentation methods can be divided into two categories: traditional image segmentation methods and segmentation methods based on deep learning.
2.3.1 Traditional Image Segmentation Methods

Traditional medical image segmentation methods mainly rely on the discontinuity and similarity of intensity values in the image. For example, methods based on edge detection exploit discontinuity, segmenting the image according to abrupt changes in intensity or gray level, and focus mainly on identifying isolated points. Threshold segmentation and region segmentation, by contrast, segment the image based on pixel similarity within a certain range according to preset criteria. Traditional medical image segmentation methods can be roughly divided into threshold-based methods, region-based methods, clustering methods, edge-detection-based methods, and model-based methods.
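As a small example of similarity-based segmentation, Otsu thresholding (one classical threshold-segmentation technique) can be sketched with OpenCV; the input file name is an illustrative assumption:

```python
# Minimal threshold-segmentation sketch using Otsu's method.
import cv2

gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
# Otsu picks the threshold that maximizes the between-class variance of the histogram.
thresh_value, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print("Otsu threshold:", thresh_value)
cv2.imwrite("mask.png", mask)
```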

2.3.2 Image segmentation method based on deep learning

Segmentation methods based on deep learning overcome the limitations of traditional hand-crafted feature segmentation and show clear advantages in automatic feature learning. Among them, the fully convolutional network (Fully Convolutional Network, FCN) has achieved excellent results in semantic segmentation, providing new ideas and directions, and has been widely applied to semantic segmentation of medical images, producing a large body of research results in this field. From a data-driven perspective, deep-learning methods for medical image segmentation can be roughly divided into supervised learning, weakly supervised learning, and other approaches.

2.4 Image Compression

Image compression targets the large amount of redundancy in image data: compressing this redundancy reduces the data volume and thereby speeds up image transmission and processing. Research on image compression falls mainly into two areas: traditional image compression techniques and image compression techniques based on deep learning.

2.4.1 Research on Traditional Image Compression Technology

From the perspective of information recovery, traditional image compression techniques fall into two categories, lossless compression and lossy compression:

Lossless compression coding: no information is lost during compression; the reconstructed image is exactly the same as the original, and redundancy is removed during the compression process. Commonly used schemes include Huffman coding, arithmetic coding, and run-length coding (a run-length sketch follows this list);

Lossy compression coding: a higher compression ratio is achieved by discarding some information, so the reconstructed image differs from the original. Commonly used schemes include JPEG, H.264, etc.
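As a toy illustration of lossless coding (referenced from the list above), run-length coding can be sketched in a few lines; the pixel row and the (value, count) representation are illustrative choices:

```python
# Minimal run-length coding sketch for a row of 8-bit pixels.
from itertools import groupby

def rle_encode(pixels):
    """Collapse runs of equal pixel values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original pixel row."""
    return [value for value, count in pairs for _ in range(count)]

row = [0, 0, 0, 255, 255, 17, 17, 17, 17]
encoded = rle_encode(row)          # [(0, 3), (255, 2), (17, 4)]
assert rle_decode(encoded) == row  # lossless: the reconstruction is exact
```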
2.4.2 Research on Image Compression Technology Based on Deep Learning

Traditional assessment of coding quality usually targets objective performance indicators; it has difficulty meeting requirements on subjective and semantic quality, and it cannot capture the deep semantic information of images, so traditional image compression coding no longer meets modern requirements in this respect. The development of deep learning and computer vision offers a new route to the image compression problem: end-to-end image compression allows all modules to be optimized jointly, with the evaluation driven by the data itself.

2.5 Image recognition

Image recognition, i.e. recognizing objects through computer vision, has been researched and developed in depth, and has achieved excellent results with the development of neural networks and related technologies. Image recognition algorithms can be divided into two categories: traditional image recognition algorithms and deep-learning-based image recognition algorithms.

2.5.1 Traditional Image Recognition Algorithms

Traditional image recognition algorithms include differential-operator edge detection, the Canny edge detection algorithm, corner detection algorithms, and so on.

Canny edge detection algorithm: it generally includes four steps: filtering, computation of gradient magnitude and direction, non-maximum suppression, and edge detection and linking. First, a Gaussian filter removes noise and smooths the image; then first-order finite differences are used to compute the partial derivatives of the filtered image in the horizontal and vertical directions; next, non-maximum suppression sets gradient values that are not local maxima to 0; finally, the candidate edge pixels are processed with two thresholds, and the pixels falling between them are kept, giving the detected edges. The traditional Canny algorithm has limited noise-reduction ability; one improvement uses four anisotropic 5th-order difference templates to detect pixels in multiple directions and strengthen detection along the diagonals. To improve the adaptivity of the Canny algorithm, adaptive median filtering and morphological closing can be used to prevent edge information from being weakened when computing the multi-directional gradient magnitude, while the upper and lower thresholds are computed from the idea that the best separation point between target and background maximizes the between-class variance and minimizes the within-class variance under the optimal gradient.
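A minimal sketch of the standard Canny pipeline with OpenCV (cv2.Canny performs the gradient, non-maximum-suppression, and hysteresis-threshold steps internally; the smoothing kernel and threshold pair are illustrative choices):

```python
# Minimal Canny sketch: Gaussian smoothing followed by cv2.Canny.
import cv2

gray = cv2.imread("part.png", cv2.IMREAD_GRAYSCALE)
smoothed = cv2.GaussianBlur(gray, (5, 5), sigmaX=1.4)   # step 1: noise filtering
edges = cv2.Canny(smoothed, 50, 150)                    # steps 2-4: gradients, NMS, double-threshold linking
cv2.imwrite("edges.png", edges)
```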

Corner detection algorithm: a fixed pixel window slides in all directions over the image, and the gray values inside the window before and after sliding are compared; if the change is large, the window is judged to contain a corner. Corner detection algorithms fall into three categories: detection based on binary images, detection based on grayscale images, and detection based on edge contours. The traditional Harris corner detector has low precision and poor noise immunity; combining the Sobel operator with the Harris algorithm can effectively improve its performance (the Sobel operator is first used for preliminary corner selection, the rectangular template in the non-maximum suppression step is replaced with a circular template to improve detection accuracy, and finally a neighboring-point elimination method improves noise immunity). Another approach compares the intensity-change characteristics of step edges and of L-shaped, Y- or T-shaped, X-shaped, and star-shaped corners, and extracts a new gray-level-change feature from the input image using multi-scale anisotropic Gaussian directional derivative filters; this method can consistently extract the edge and corner features in the image.
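A minimal Harris corner-detection sketch with OpenCV (the block size, Sobel aperture, k parameter, and 1% response threshold are illustrative choices):

```python
# Minimal Harris corner-detection sketch.
import cv2
import numpy as np

img = cv2.imread("board.png")
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)   # corner response map
img[response > 0.01 * response.max()] = (0, 0, 255)               # mark strong corners in red
cv2.imwrite("corners.png", img)
```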

2.5.2 Image recognition algorithm based on neural network learning

Image recognition algorithms based on neural networks can be roughly divided into convolutional neural networks, attention networks, autoencoder networks, generative networks, and spatiotemporal networks. The convolutional neural network is the basis of all the other, more complex networks; a series of typical architectures have been proposed on top of the CNN: LeNet, AlexNet, GoogLeNet, VGGNet, ResNet, Inception, and DenseNet.

03 FPGA and digital image processing

A digital image is composed of a large number of pixels, and digital image processing algorithms usually operate on uncompressed raw images. Taking a 1920 × 1080 RGB image with 256 gray levels per channel as an example, a single frame amounts to 1920 × 1080 × 3 × 8 bit ≈ 6.22 MB. DSPs, GPUs, and CPUs usually process images in units of frames: they must read the captured frame from memory before processing it, so each frame consumes considerable time, which makes it difficult to raise the frame rate of video.

The advantage of FPGA-based image processing lies in real-time, line-by-line pipelined operation, which achieves the best real-time performance. An FPGA can be connected directly to the image sensor to receive the pixel data stream, and can also interpolate the RAW format to obtain RGB data. The FPGA implements real-time pipelined processing by buffering several lines of image data in its internal Block RAM. Block RAM plays a role similar to the CPU's cache, but whereas the cache cannot be fully controlled, Block RAM is completely controllable and can therefore be used to implement all kinds of flexible computations.

Because Block RAM inside the FPGA is limited, an external DDR memory is often used to buffer a frame and then read it back out. However, this mode resembles a CPU reading from main memory and cannot achieve the best real-time performance. Making sensible use of the scarce Block RAM is therefore the key to exploiting the real-time capability of FPGA image processing.

FPGAs usually process the image data stream by reading it sequentially, which allows fast and efficient implementation of algorithms based on 3×3 to N×N operators such as filtering, erosion, dilation, and edge extraction, as well as the convolutional layers of convolutional neural networks. The parallel nature of the FPGA makes it the fastest at this class of algorithms and gives it the greatest advantage over other processors. Although the individual operations are relatively simple, different functions such as edge extraction and moving-target recognition can be realized by combining different operators, while still meeting real-time requirements.
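As a conceptual model of this line-buffered dataflow (written in Python purely to illustrate the idea; an actual design would be an HDL implementation, and the mean-filter kernel and frame below are illustrative assumptions), two line buffers plus a 3×3 register window produce one output per incoming pixel without storing the whole frame:

```python
# Conceptual model of an FPGA line-buffer pipeline for 3x3 window operators.
import numpy as np

def stream_3x3(frame, kernel):
    """Pixel-by-pixel 3x3 windowed operation, mimicking a line-buffered FPGA pipeline."""
    h, w = frame.shape
    line_buf = np.zeros((2, w), dtype=frame.dtype)   # models two Block-RAM line buffers
    window = np.zeros((3, 3), dtype=np.float32)      # models the 3x3 shift-register window
    out = np.zeros((h, w), dtype=np.float32)         # border pixels are simply left at 0
    for y in range(h):
        for x in range(w):                           # one pixel arrives per "clock cycle"
            column = np.array([line_buf[0, x], line_buf[1, x], frame[y, x]], dtype=np.float32)
            window[:, 0] = window[:, 1]              # shift the window one pixel to the right
            window[:, 1] = window[:, 2]
            window[:, 2] = column
            if y >= 2 and x >= 2:                    # window now covers rows y-2..y, cols x-2..x
                out[y - 1, x - 1] = np.sum(window * kernel)
            line_buf[0, x] = line_buf[1, x]          # push this column down the line buffers
            line_buf[1, x] = frame[y, x]
    return out

# Example: streaming 3x3 mean filter over a small synthetic frame.
frame = np.arange(36, dtype=np.uint8).reshape(6, 6)
result = stream_3x3(frame, np.full((3, 3), 1.0 / 9.0))
```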

The parallel nature of the FPGA therefore gives it clear advantages in image processing and deep learning. Because FPGA development has a higher entry barrier than embedded DSP development, the number of engineers in this field is relatively small, and the capabilities that FPGAs are good at remain to be explored further.


Origin blog.csdn.net/weixin_45104510/article/details/129969096