Summary of Python's OpenCV usage

As one of the easiest languages to pick up, Python has an enormous number of third-party libraries. Their existence lets many people focus on business logic and mathematical logic while skipping tedious low-level code. OpenCV's Python package is one of them.

1. Installation and simple use of third-party libraries

Install

A simple pip install is enough. Working with the opencv library often involves matrix operations, so numpy can be regarded as part of the same family.

pip install opencv-python
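A quick way to confirm the install worked:

import cv2
print(cv2.__version__)  # prints the installed opencv version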

After installation you can open pictures and videos without much effort. Let's run a few simple experiments:

Read pictures

import cv2

# Read the image: once normally, once as a grayscale image
img = cv2.imread(r"D:\img\among.png")
gray = cv2.imread(r"D:\img\among.png", 0)
# Display the images
cv2.imshow("colorful", img)
cv2.imshow("gray", gray)
# Wait indefinitely for a key press (0 means no timeout)
cv2.waitKey(0)
# Close all display windows
cv2.destroyAllWindows()

The display effect is as follows:
(screenshot omitted)
Read a video and play it

import cv2

# Open the video file
video = cv2.VideoCapture('badapple_color.mp4')
# Get the frame rate of the video
fps = video.get(cv2.CAP_PROP_FPS)
# Loop while the video stays open
while video.isOpened():
    ret, frame = video.read()
    # Stop once no frame can be read (end of the file)
    if not ret:
        break
    cv2.imshow("video", frame)
    # Per-frame delay in milliseconds; exit on 'q' or Esc
    if cv2.waitKey(int(1000 / fps)) in [ord('q'), 27]:
        break
cv2.destroyAllWindows()
video.release()

Note: playback here has no progress bar or audio like a normal player, because we simply read every frame of the video and display it in a loop; the audio is never read. The exit condition is pressing 'q' (or Esc).
(screenshot omitted)
Get camera recording and save video

import cv2

video = cv2.VideoCapture(0)
while True:
    # Grab one frame
    _, frame = video.read()
    # Convert this frame to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    cv2.imshow('frame', gray)
    if cv2.waitKey(1) == ord('q'):
        break

video.release()
cv2.destroyAllWindows()

This continuously displays every frame the camera captures. Since each frame is converted to grayscale, the output is grayscale too; to get the color picture, comment out the cvtColor conversion line and, while you're at it, pass the original frame (the one read from the camera) back to imshow.
(screenshots omitted)
A small change makes it save the video as well:

import cv2

video = cv2.VideoCapture(0)

# Define the codec and create a VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
# The frame size (640, 480) must match the camera's actual frame size
outfile = cv2.VideoWriter('res.mp4', fourcc, 25., (640, 480))

while video.isOpened():
    flag, frame = video.read()

    if flag:
        outfile.write(frame)  # write the frame to the file
        cv2.imshow('frame', frame)
        if cv2.waitKey(1) == ord('q'):
            break
    else:
        break

outfile.release()
video.release()

The result looks like this:
(screenshot omitted)

2. Image fundamentals

An image on a computer is composed of small colored squares. These squares are the basic processing unit: pixels. Their size depends on the resolution; the higher the resolution, the smaller the pixel. In a simple binary image the pixel values are only 0 and 1, identifying the two colors black and white. A grayscale image refines that black-and-white scale and makes the picture much more vivid: the value runs from 0 to 255, where 0 is pure black, 255 is pure white, and the values in between form a delicate black-to-white gradient.

A step further is the color image. Colors are essentially mixed from the three primary colors in different proportions, so a color pixel carries three values, one per primary; this mode is called the RGB color space. For example, [0, 0, 0] is pure black, [255, 255, 255] is pure white, and [255, 0, 0], [0, 255, 0] and [0, 0, 255] are the three primaries red, green and blue respectively, which is why the RGB color space is also described as three channels. In opencv the three channels are stored in the reverse order, BGR. A color image is therefore a matrix with three channels, which is why numpy is so often brought in when processing color images: image processing is also a matter of mathematics.


Simple example:

import cv2
import numpy as np

# Black and white images
b = np.zeros((100, 100), dtype=np.uint8)
w = np.zeros((100, 100), dtype=np.uint8)
w[:100, :100] = 255
print(b, w, sep="\n\n")

cv2.imshow("black", b)
cv2.imshow("white", w)
cv2.waitKey()

# Three pictures, one per primary color
r = np.zeros((300, 300, 3), dtype=np.uint8)
g = np.zeros((300, 300, 3), dtype=np.uint8)
b = np.zeros((300, 300, 3), dtype=np.uint8)
r[:, :, 2] = 255
g[:, :, 1] = 255
b[:, :, 0] = 255

cv2.imshow("red", r)
cv2.imshow("green", g)
cv2.imshow("blue", b)
cv2.waitKey()

# One picture containing all three primaries
img = np.zeros((300, 300, 3), dtype=np.uint8)
img[:, 0:100, 2] = 255
img[:, 100:200, 1] = 255
img[:, 200:300, 0] = 255
cv2.imshow("RGB", img)

# Red, orange, yellow, green, blue, indigo, violet
img = np.zeros((300, 700, 3), dtype=np.uint8)
# red
img[:, 0:100, 2] = 255
# orange
img[:, 100:200, 2] = 255
img[:, 100:200, 1] = 97
# yellow
img[:, 200:300, 1] = 255
img[:, 200:300, 2] = 255
# green
img[:, 300:400, 1] = 255
# blue
img[:, 400:500, 0] = 255
# indigo
img[:, 500:600, 0] = 84
img[:, 500:600, 1] = 46
img[:, 500:600, 2] = 8
# violet
img[:, 600:700, 0] = 240
img[:, 600:700, 1] = 32
img[:, 600:700, 2] = 160
# display
cv2.imshow("seven", img)
cv2.waitKey()
cv2.waitKey()

The output is not shown here; that's all there is to it. If you wanted to build a color lookup table, you could iterate channel by channel over the whole value range and display the result, although for precise values you would still want to consult a proper reference table.
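As a small sketch of that channel-iteration idea (my own example, not from the book): ramp one channel over its full range and display it as a strip.

import cv2
import numpy as np

# Green ramps from 0 to 255 left to right; the other channels stay 0
img = np.zeros((50, 256, 3), dtype=np.uint8)
img[:, :, 1] = np.arange(256, dtype=np.uint8)
cv2.imshow("green ramp", img)
cv2.waitKey()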

Random images

A fully random grayscale image is the old no-TV-signal static.

import cv2
import numpy as np

img = np.random.randint(0, 256, size=[300, 300], dtype=np.uint8)
cv2.imshow("static", img)
cv2.waitKey()

# three random channels give color static
img = np.random.randint(0, 256, size=[300, 300, 3], dtype=np.uint8)
cv2.imshow("color static", img)
cv2.waitKey()

(screenshots omitted)
As described above, an RGB color image has three channels, and opencv provides channel splitting:

import cv2
import numpy as np

img = cv2.imread(r"D:\img\among.png")
# Indexing the channels directly, equivalent to b, g, r = cv2.split(img)
cv2.imshow("0", img[:, :, 0])
cv2.imshow("1", img[:, :, 1])
cv2.imshow("2", img[:, :, 2])

# The same splitting via the dedicated function
b, g, r = cv2.split(img)
# Merge back into the original image
img_mer = cv2.merge([b, g, r])
cv2.waitKey()

Effect:
(screenshot omitted)

Three attribute values

  • shape: img.shape, the number of rows, columns and (for color images) channels
  • size: img.size, the total number of values, rows x columns x channels
  • dtype: the image's data type
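A quick look at all three on the Doraemon picture used earlier (its shape shows up again later in this article as (347, 272, 3)):

import cv2

img = cv2.imread("among.png")
print(img.shape)  # (rows, columns, channels), e.g. (347, 272, 3)
print(img.size)   # total number of values: rows x columns x channels
print(img.dtype)  # uint8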

3. Color space and conversion

A color space is a mode of expressing color. The common one is the RGB color space, though in opencv the channels come in the reverse BGR order. Besides that there are GRAY (8-bit grayscale), the XYZ color space, the YCrCb color space, the HSV color space, the HLS color space, the Bayer color space, and so on. Different needs call for different color spaces, and they can be converted when required. Here we learn their characteristics and the conversions between them.

GRAY color space

An 8-bit grayscale image corresponds to the 8-bit binary range 0-255, which means exactly 256 gray levels: 0 is pure black, 255 is pure white, and the values in between are the gradient from pure black to pure white, hence the gray scale. In opencv, transforming the RGB color space into the GRAY color space uses this formula:

$$Gray = 0.299 \times R + 0.587 \times G + 0.114 \times B$$

Converting Gray back to the RGB color space is comparatively simple: the three RGB channels directly take the value of Gray, that is,
$$R = Gray, \quad G = Gray, \quad B = Gray$$

As for the conversion between GRAY and BGR, opencv can actually do it when reading the file. I have no application for it for the time being; here is mainly an example to play with.

>>> import cv2
>>> import numpy as np
>>>
>>>
>>> mong = cv2.imread("among.png")
>>> gray = cv2.cvtColor(mong, cv2.COLOR_BGR2GRAY)
>>> cv2.imshow("source", mong)
>>> cv2.imshow("gray", gray)
>>> bgr_img = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
>>> cv2.imshow("change again", bgr_img)
>>> cv2.waitKey()
-1
>>>
>>>
>>> img = np.random.randint(0, 256, size=[2,3], dtype=np.uint8)
>>> res = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
>>> res_change = cv2.cvtColor(res, cv2.COLOR_BGR2GRAY)
>>> img
array([[ 48,  27, 228],
       [ 94, 144, 234]], dtype=uint8)
>>> res
array([[[ 48,  48,  48],
        [ 27,  27,  27],
        [228, 228, 228]],

       [[ 94,  94,  94],
        [144, 144, 144],
        [234, 234, 234]]], dtype=uint8)
>>> res_change
array([[ 48,  27, 228],
       [ 94, 144, 234]], dtype=uint8)
>>>

Played myself: it did not switch back to the original colors. GRAY2BGR just copies the gray value into all three channels, so the color information discarded by BGR2GRAY is gone for good.
(screenshot omitted)

YCrCb color space

The human visual system perceives color less keenly than it perceives brightness, yet the RGB color space focuses on color and lacks a brightness indicator; hence the YCrCb color space. In it, Y represents the luminance of the light source, Cr the red component, and Cb the blue component. The formula for converting RGB to YCrCb is:
$$Y = 0.299 \times R + 0.587 \times G + 0.114 \times B \\ Cr = (R - Y) \times 0.713 + delta \\ Cb = (B - Y) \times 0.564 + delta$$

The value of delta depends on the bit depth of the image:

| delta value | image depth |
| --- | --- |
| 128 | 8-bit images |
| 32768 | 16-bit images |
| 0.5 | single-precision images |
In turn, the formula for converting from YCrCb to RGB is:

$$R = Y + 1.403 \cdot (Cr - delta) \\ G = Y - 0.714 \cdot (Cr - delta) - 0.344 \cdot (Cb - delta) \\ B = Y + 1.773 \cdot (Cb - delta)$$

YCrCb is also called YUV: Y represents brightness, U and V represent chroma. It is said to do better than HSV at skin-color detection.
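A one-pixel sanity check of the formula (my own example): for pure white, Y comes out at 255 and Cr and Cb both equal delta, which is 128 for an 8-bit image.

import cv2
import numpy as np

white = np.full((1, 1, 3), 255, dtype=np.uint8)
print(cv2.cvtColor(white, cv2.COLOR_BGR2YCrCb))
# [[[255 128 128]]]: Y = 255, Cr = Cb = delta = 128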

HSV color space

HSV is said to be a color model oriented toward visual perception. This color space has three elements: hue, saturation, and value (brightness). Hue is the color of the light, saturation the depth of the color, and value the brightness of the light as the human eye perceives it.

  • Hue H: the colors red, yellow, green, cyan, blue and magenta are laid around a 360-degree circle (why the people who invent these concepts enjoy making them puzzling, who knows);
  • Saturation S: a ratio, i.e. a decimal in the range 0 to 1, expressing how close the color is to its maximum purity; at 0 it is gray, and at the maximum of 1 it is the color itself;
  • Value V: the brightness of the color, also with the value range [0, 1].

The formula for converting RGB to HSV is as follows:
$$V = \max(R, G, B)$$
$$S = \begin{cases} \dfrac{V - \min(R, G, B)}{V}, & V \ne 0 \\ 0, & \text{otherwise} \end{cases}$$
$$H = \begin{cases} \dfrac{60(G - B)}{V - \min(R, G, B)}, & V = R \\ 120 + \dfrac{60(B - G)}{V - \min(R, G, B)}, & V = G \\ 240 + \dfrac{60(R - G)}{V - \min(R, G, B)}, & V = B \end{cases}$$
$$H = \begin{cases} H + 360, & H < 0 \\ H, & \text{otherwise} \end{cases}$$
That is the RGB-to-HSV conversion formula, and it really is a chore. Knowing the principle helps with debugging, but in practice opencv's built-in functions handle it all; you only need to care about the meaning of the three elements above. One note for reading the results below: for 8-bit images opencv stores H/2, so the 0-360 hue range fits in a byte as 0-180, and S and V are scaled to 0-255. That is why pure blue comes out as [120, 254, 255] (S is 254 rather than 255 because the other channels were set to 1, not 0).

>>> rgb_b = np.ones((2, 3, 3), dtype=np.uint8)
>>> rgb_g = np.ones((2, 3, 3), dtype=np.uint8)
>>> rgb_r = np.ones((2, 3, 3), dtype=np.uint8)
>>> rgb_b[:, :, 0], rgb_g[:, :, 1], rgb_r[:, :, 2] = 255, 255, 255
>>> hsv_b = cv2.cvtColor(rgb_b, cv2.COLOR_BGR2HSV)
>>> hsv_g = cv2.cvtColor(rgb_g, cv2.COLOR_BGR2HSV)
>>> hsv_r = cv2.cvtColor(rgb_r, cv2.COLOR_BGR2HSV)
>>> rgb_b
array([[[255,   1,   1],
        [255,   1,   1],
        [255,   1,   1]],

       [[255,   1,   1],
        [255,   1,   1],
        [255,   1,   1]]], dtype=uint8)
>>> hsv_b
array([[[120, 254, 255],
        [120, 254, 255],
        [120, 254, 255]],

       [[120, 254, 255],
        [120, 254, 255],
        [120, 254, 255]]], dtype=uint8)
>>> rgb_g
array([[[  1, 255,   1],
        [  1, 255,   1],
        [  1, 255,   1]],

       [[  1, 255,   1],
        [  1, 255,   1],
        [  1, 255,   1]]], dtype=uint8)
>>> hsv_g
array([[[ 60, 254, 255],
        [ 60, 254, 255],
        [ 60, 254, 255]],

       [[ 60, 254, 255],
        [ 60, 254, 255],
        [ 60, 254, 255]]], dtype=uint8)
>>> rgb_r
array([[[  1,   1, 255],
        [  1,   1, 255],
        [  1,   1, 255]],

       [[  1,   1, 255],
        [  1,   1, 255],
        [  1,   1, 255]]], dtype=uint8)
>>> hsv_r
array([[[  0, 254, 255],
        [  0, 254, 255],
        [  0, 254, 255]],

       [[  0, 254, 255],
        [  0, 254, 255],
        [  0, 254, 255]]], dtype=uint8)
>>>

HLS color space

It is similar to the HSV color space, except its three elements are hue H, lightness L, and saturation S. That is exactly how the book describes it, so what distinguishes this "lightness" from HSV's "value"? No word on that. Speechless.
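A small comparison sketch (my own example; the expected values in the comments are my reading of the formulas, given that opencv halves H and scales the rest to 0-255 for 8-bit images):

import cv2
import numpy as np

blue = np.zeros((1, 1, 3), dtype=np.uint8)
blue[:, :, 0] = 255  # pure blue in BGR
print(cv2.cvtColor(blue, cv2.COLOR_BGR2HSV))  # (H, S, V), expect [[[120 255 255]]]
print(cv2.cvtColor(blue, cv2.COLOR_BGR2HLS))  # (H, L, S), expect [[[120 128 255]]]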

The type conversion function

Conversion between each of the color spaces above and the RGB color space is handled by the type conversion function opencv provides:

dst = cv2.cvtColor(src, code[, dstCn])

Different conversions pass in different code parameter values. dstCn is the channel count of the destination image; it defaults to 0, in which case the channel count is determined automatically from the source image and the code.

| code value | explanation |
| --- | --- |
| cv2.COLOR_BGR2RGB | opencv's BGR type to RGB |
| cv2.COLOR_RGB2BGR | RGB to opencv's BGR type |
| cv2.COLOR_BGR2GRAY | BGR to GRAY |
| cv2.COLOR_GRAY2BGR | GRAY to BGR |
| cv2.COLOR_BGR2XYZ | BGR to XYZ |
| cv2.COLOR_XYZ2BGR | XYZ to BGR |
| cv2.COLOR_BGR2YCrCb | BGR to YCrCb |
| cv2.COLOR_YCrCb2BGR | YCrCb to BGR |
| cv2.COLOR_BGR2HSV | BGR to HSV |
| cv2.COLOR_HSV2BGR | HSV to BGR |
| cv2.COLOR_BGR2HLS | BGR to HLS |
| cv2.COLOR_BayerBG2BGR | demosaicing, Bayer BG pattern to BGR |

Among the codes above there is a conversion between RGB and BGR. In opencv the channel order is normally BGR, the reverse, so what does converting to RGB do? Judging from practice, the values of the B and R channels are swapped with each other, but opencv keeps rendering pictures in the same channel order, so a converted picture displays with changed colors.

>>> import cv2
>>> import numpy as np
>>>
>>>
>>> img = np.random.randint(0, 256, size=(2, 3, 3), dtype=np.uint8)
>>> rgb_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
>>> img
array([[[ 69, 184,  11],
        [193,   4, 194],
        [239, 139, 146]],

       [[188,  30,  44],
        [ 60, 145, 133],
        [ 46, 181, 139]]], dtype=uint8)
>>> rgb_img
array([[[ 11, 184,  69],
        [194,   4, 193],
        [146, 139, 239]],

       [[ 44,  30, 188],
        [133, 145,  60],
        [139, 181,  46]]], dtype=uint8)
>>> mong = cv2.imread("among.png")
>>> rgb_mong = cv2.cvtColor(mong, cv2.COLOR_BGR2RGB)
>>> cv2.imshow("source", mong)
>>> cv2.imshow("rgb_res", rgb_mong)
>>> cv2.waitKey()
-1

The matrix changes above and the pictures below make both the internal and the visible change clear. Even though the two channels are swapped, the assumed channel order of an opencv picture does not change; imshow still reads it as BGR, and that is what makes the picture's colors different.

(screenshot omitted)

Extract a specific color

When we need a specific color block in a picture, we can filter the picture and produce one that contains only that block. For example, the color regions of the Doraemon picture above are separated very cleanly, which makes it well suited to this attempt. The implementation relies on opencv's inRange function.

dst = cv2.inRange(src, lowerb, upperb)

The function returns a binary mask marking where src falls inside the range [lowerb, upperb]: 255 inside, 0 outside. Note that for a grayscale picture lowerb is just an integer value, while for a picture in a color space lowerb needs an array to describe the color, and likewise upperb. Pinning a color down with three hardware-minded channel values is a real headache for me; in HSV there is essentially one value, the hue, that expresses the color, which is much friendlier to the user. Okay, let's try it.

>>> import cv2
>>> import numpy as np
>>>
>>> mong = cv2.imread("among.png")
>>> mong_hsv = cv2.cvtColor(mong, cv2.COLOR_BGR2HSV)
>>> bmin, bmax = np.array((100, 43, 46)), np.array((125, 255, 255))
>>> mask = cv2.inRange(mong_hsv, bmin, bmax)
>>> blue = cv2.bitwise_and(mong, mong, mask=mask)
>>> ymin, ymax = np.array((26, 43, 46)), np.array((34, 255, 255))
>>> ymask = cv2.inRange(mong_hsv, ymin, ymax)
>>> rmin, rmax = np.array((0, 43, 46)), np.array((10, 255, 255))
>>> rmask = cv2.inRange(mong_hsv, rmin, rmax)
>>> yellow = cv2.bitwise_and(mong, mong, mask=ymask)
>>> red = cv2.bitwise_and(mong, mong, mask=rmask)
>>> cv2.imshow("source", mong)
>>> cv2.imshow("blue", blue)
>>> cv2.imshow("yellow", yellow)
>>> cv2.imshow("red", red)
>>> cv2.waitKey()
-1

(screenshot omitted)
The above is an experiment based on an HSV lookup table found online. The colors in the picture apparently don't follow the standard table exactly, so what you get is broken fragments of the color blocks, which is genuinely tiring.
However, this part handed me an unexpected gain: watermark extraction.

>>> import cv2
>>> import numpy as np
>>>
>>>
>>> img = cv2.imread("iamfine.png")
>>> watermark = img[850:, 580:]
>>> wm_abstract = cv2.inRange(watermark, (230, 230, 230), (255, 255, 255))
>>> cv2.imshow("source", img)
>>> cv2.imshow("watermark_part", watermark)
>>> cv2.imshow("watermark", wm_abstract)
>>> cv2.waitKey()
-1
>>> 

Just like that, the watermark region is extracted. For this kind of solid-color watermark, inRange directly in RGB space is quite usable.
(screenshot omitted)

Skin tone marking and detection

Consistent with determining a color range above: mark the color range corresponding to skin tones in a photo of a person, obtain the matching region, and extract it. The book uses HSV for this, but there is material saying YCrCb suits the scenario better. Let's see if both can be tried.

The book sets the skin hue between [5, 170] and the saturation between [25, 166]. I am not sure whether that was tuned for the book's example pictures, but I will use the values first, even though I experiment with other pictures.

>>> import cv2
>>> import numpy as np
>>>
>>>
>>> img = cv2.imread("eason.png")
>>> hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
>>> h, s, v = cv2.split(hsv)
>>> hmask = cv2.inRange(h, 5, 170)
>>> smask = cv2.inRange(s, 25, 166)
>>> mask = hmask & smask
>>> roi = cv2.bitwise_and(img, img, mask=mask)
>>> cv2.imshow("source", img)
>>> cv2.imshow("skin", roi)
>>> cv2.waitKey()
-1

(screenshot omitted)
As far as the results go, skin detection works. Next, try YCrCb. Looking at the output above, the edges are not handled well, so below a Gaussian blur, one of opencv's four smoothing (low-pass) filters, is applied before thresholding.

# the imports are the same as above
>>> ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
>>> y, cr, cb = cv2.split(ycrcb)
>>> cr = cv2.GaussianBlur(cr, (5, 5), 0)
>>> _, skin = cv2.threshold(cr, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
>>> cv2.imshow("res", skin)
>>> cv2.imshow("source", img)
>>> cv2.waitKey()
-1

(screenshot omitted)
The effect is quite good, but so far I have only copied the example, and the result is a binary mask rather than a colored picture.
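To get a colored result, one option is to feed the Otsu output to bitwise_and as a mask, just like the inRange examples earlier; a sketch assuming the img and skin variables from the session above:

# skin is 255 on the detected region and 0 elsewhere, so it works as a mask
skin_color = cv2.bitwise_and(img, img, mask=skin)
cv2.imshow("skin in color", skin_color)
cv2.waitKey()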

Alpha channel

On the basis of the RGB color space, an A channel is added to express transparency or opacity: the RGBA color space. The A channel's value range can be [0, 1] or [0, 255]. Common image processing deals with the three RGB channels, so to use the RGBA color space a cvtColor conversion is required. Just a record here, without expanding.
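A minimal sketch of what that conversion looks like (the file name is just the running example from this article):

import cv2

img = cv2.imread("among.png")                  # 3-channel BGR
bgra = cv2.cvtColor(img, cv2.COLOR_BGR2BGRA)   # adds an alpha channel
bgra[:, :, 3] = 128                            # uniform half transparency
cv2.imwrite("among_alpha.png", bgra)           # PNG keeps the alpha channel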

4. Image operations

It is said, ah, it is said that image addition and bitwise operations are used for bit-plane decomposition, image XOR encryption, digital watermarking, face encoding/decoding and so on. How to reverse those is not yet clear to me. This part is demonstrated through Python's interactive shell for the time being.

Addition

There are two main kinds: addition with the simple + operator, and the add function opencv provides. The rules:

  • The + operator: when image a and image b are added with the simple + operator, any result that exceeds the grayscale maximum of 255 is taken modulo 256, and the stored value is that remainder; results that do not exceed it are computed normally.
    $$a + b = \begin{cases} a + b, & a + b \le 255 \\ \mathrm{mod}(a + b,\ 256), & a + b > 255 \end{cases}$$
  • cv2.add(a, b): adding image a and image b with this function also has two outcomes. When the sum is greater than 255 it is kept at the maximum: 255 is the saturation value, a cap. When it is not greater than 255, the addition is normal.
    $$cv2.add(a, b) = \begin{cases} a + b, & a + b \le 255 \\ 255, & a + b > 255 \end{cases}$$
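A quick numeric check of the two rules on single-pixel images:

import cv2
import numpy as np

a = np.array([[200]], dtype=np.uint8)
b = np.array([[100]], dtype=np.uint8)
print(a + b)         # [[44]], since (200 + 100) % 256 = 44
print(cv2.add(a, b)) # [[255]], saturated at the maximum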

Now a simple example on a real picture:

import cv2
import numpy as np

# Read the image (flag 1 reads it in color)
img = cv2.imread('among.png', 1)
# Additions
a = img + img
b = a + a
c = cv2.add(img, img)
d = cv2.add(c, c)

cv2.imshow("a", a)
cv2.imshow("b", b)
cv2.imshow("c", c)
cv2.imshow("d", d)
cv2.waitKey()

The results of the operation are as follows:
(screenshot omitted)
a and b are additions with the + operator, c and d with cv2's add function. You can see clearly that under repeated + operations the sense of lines gets more pronounced and the colors darker and darker (the values wrap around), while with add it is the other way round: the values saturate and the image looks whiter and whiter. This is simple self-addition; addition between two different images, or between a value and an image, I haven't compared, for lack of reference material.

Weighted sum

The weighted sum takes the images' weights into account when summing the pixel values of two images. The two images must have the same size and type, but there is no further requirement on channel count or specific dtype. The blog material I found is basically boilerplate with nothing fit for human reference, mdzz. For now let me copy the concept from the book, with the formula:
$$dst = saturate(src1 \times \alpha + src2 \times \beta + \gamma)$$
The weighted sum is implemented by opencv's addWeighted function. Matching the formula above, it takes five parameters: src1, alpha, src2, beta, gamma. The so-called weights are, as I see it, the proportions of the two images, deciding which one is more visible in the final result. So read the formula as: result image = image 1 x coefficient 1 + image 2 x coefficient 2 + brightness adjustment.

Simple image blending example:

import cv2
import numpy as np

a = cv2.imread('blena.png')
b = cv2.imread('bboat.png')
c = cv2.addWeighted(a, 0.2, b, 0.8, 0)
d = cv2.addWeighted(a, 0.5, b, 0.5, 0)
e = cv2.addWeighted(a, 0.8, b, 0.2, 0)
cv2.imshow("c", c)
cv2.imshow("d", d)
cv2.imshow("e", e)
cv2.waitKey()

Since I really had no image resources, I screenshotted two pictures from Baidu's book pages and saved them for the experiment, then displayed blends with an increasing proportion of the face image:

(screenshots omitted)

The results are obvious: as its weight grows, the face picture becomes more and more visible. That's image blending. I hear that images of different sizes can be adjusted with resize, so let's try again. After changing pictures, the code becomes:

import cv2
import numpy as np

a = cv2.imread("ayanamirei.jpg")
b = cv2.imread("asikaj.jpg")

# Resize by ratio; the sizes originally differed but unexpectedly matched after scaling, presumably down to the rounding
a = cv2.resize(a, None, fx=0.1, fy=0.1, interpolation=cv2.INTER_LINEAR)
b = cv2.resize(b, None, fx=0.1, fy=0.1, interpolation=cv2.INTER_LINEAR)

c = cv2.addWeighted(a, 0.2, b, 0.8, 0)
d = cv2.addWeighted(a, 0.5, b, 0.5, 0)
e = cv2.addWeighted(a, 0.8, b, 0.2, 0)
cv2.imshow("c", c)
cv2.imshow("d", d)
cv2.imshow("e", e)
cv2.waitKey()

The adjustment worked out fine, and blended image processing succeeded again. The color results:

(screenshots omitted)

Relatively speaking, weights around 0.4 and 0.3 should look better: with the background less prominent, the subject comes through more cleanly. The choice of the last parameter of resize (the interpolation) has a great influence on the result; a sketch of resize's calling conventions follows the list:

  • INTER_NEAREST: nearest neighbor interpolation
  • INTER_LINEAR: linear interpolation (default)
  • INTER_AREA: Area interpolation
  • INTER_CUBIC: cubic spline interpolation
  • INTER_LANCZOS4: Lanczos interpolation
    cv2.INTER_AREA is recommended for zooming out; cv2.INTER_CUBIC and cv2.INTER_LINEAR are recommended for zooming in
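For reference, a sketch of resize's two calling conventions (explicit target size versus scale factors), both of which appear in this article:

import cv2

img = cv2.imread("among.png")
# Explicit target size: note dsize is (width, height)
small = cv2.resize(img, (136, 173), interpolation=cv2.INTER_AREA)
# Scale factors instead of a target size
big = cv2.resize(img, None, fx=2.0, fy=2.0, interpolation=cv2.INTER_CUBIC)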

Bitwise logical operations

Logical operations include AND, OR, NOT, XOR and the rest. A bitwise logical operation converts each number into binary and applies the logic to every pair of corresponding bits. There is no need to expand on the specific logic, so it is not recorded. The bitwise functions opencv provides are listed below, with a quick numeric check after the list:

  • Bitwise and, dst = cv2.bitwise_and(src1, src2[, mask])
  • Bitwise or, dst = cv2.bitwise_or(src1, src2[, mask])
  • Bitwise not, dst = cv2.bitwise_not(src[, mask])
  • Bitwise XOR, dst = cv2.bitwise_xor(src1, src2[, mask])
    mask: an optional operation mask, an 8-bit single-channel array
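The promised numeric check, one pixel per image:

import cv2
import numpy as np

a = np.array([[0b10101010]], dtype=np.uint8)  # 170
b = np.array([[0b11110000]], dtype=np.uint8)  # 240
print(cv2.bitwise_and(a, b))  # [[160]] = 0b10100000
print(cv2.bitwise_or(a, b))   # [[250]] = 0b11111010
print(cv2.bitwise_not(a))     # [[85]]  = 0b01010101
print(cv2.bitwise_xor(a, b))  # [[90]]  = 0b01011010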

Use of bitwise AND operations

For bitwise AND on a grayscale image: ANDing a pixel with the value 0 only ever gives 0, while ANDing with 255 (all eight bits set) returns the original value. So when an image is ANDed against a mask image full of 0s and 255s, what you get is a partially "blacked-out" image, as if cut out. To put it bluntly, another image is used to cover the target image partially or even fully.

import cv2
import numpy as np

img = cv2.imread("blena.png")
# Make an array the same size as the original image
mask = np.zeros(img.shape, dtype=np.uint8)

# Set a fixed region to pure white
mask[100:280, 100:250] = 255
a = cv2.bitwise_and(img, mask)

# Display all three images
cv2.imshow("source", img)
cv2.imshow("mask", mask)
cv2.imshow("res", a)
cv2.waitKey()

(screenshot omitted)

Well, apart from bitwise AND, the other three seem to have no application here for the time being. Note, though, that the two matrices in a bitwise logical operation must be the same size, otherwise errors occur, which is why many uses resize the picture first.

Decomposition of bit planes

A color image can be split into three matrices along its RGB channels: that is channel splitting. In the same spirit, the image formed by collecting the bit of every pixel at one binary position is called a bit plane, and the process is bit-plane decomposition. In a grayscale image a pixel's value ranges over 0-255, the range of the eight bits in one byte; extracting the value of each bit yields one bit plane, so together with the original image there are nine images in total. Splitting a color image into bit planes would give far too many, so the example here uses only a grayscale image, which is easier to handle.

When a grayscale image is expanded into binary and cut into bit planes, the higher the weight of the bit a plane sits on, the higher that plane's correlation with the original image; correspondingly, the lower the bit's weight, the lower the correlation. To put it bluntly, the plane cut from the bit of 2 to the 0th power barely shows the original image, while the plane cut from the bit of 2 to the 7th power resembles it most.

For an RGB color image, split it into three channels; each channel's value is again an 8-bit binary number. Cut per-channel bit planes at the same bit position, then merge them, and that is a bit plane of the original color image. So bit-plane slicing of a color picture is not much harder after all.

Bit-plane slicing steps:

  • Take the width and height of the original image and construct a matrix of the same size;
  • Fill that matrix so every pixel's value is 2^n, for use in extraction;
  • Bitwise AND the extraction matrix with the original image to obtain the bit plane;
  • Threshold the result so that the planes of the smaller bits do not display as near pure black: the final values are only 0 and 255, in effect either false or true.
import cv2
import numpy as np

img = cv2.imread("alian.jpg", 0)
img = cv2.resize(img, None, fx=0.4, fy=0.4, interpolation=cv2.INTER_LINEAR)
cv2.imshow("source", img)
h, w = img.shape
# Create 8 layers of same-size matrices; layer i holds the extraction
# matrix for bit i, filled in by the loop below
x = np.zeros((h, w, 8), dtype=np.uint8)
for i in range(8):
    x[:, :, i] = 2**i

# Bitwise AND the original with each extraction matrix to get the bit
# plane, threshold it, then display the result
for i in range(8):
    temp = cv2.bitwise_and(img, x[:, :, i])
    # Turn values of temp greater than 0 into True, the rest into False
    mask = temp > 0
    # Replace the True positions in temp with 255
    temp[mask] = 255
    cv2.imshow("res" + str(i+1), temp)
cv2.waitKey()

(screenshots omitted)
The above clearly shows the bit-plane extraction and display of a grayscale image (Alian's avatar is used; unfortunately it is too white and too large, so the size had to be modified). To process a color image, add to the procedure above the splitting of the three channels at the start and the merging of the three channels at the end, as follows:

import cv2
import numpy as np

img = cv2.imread("alian.jpg")
img = cv2.resize(img, None, fx=0.4, fy=0.4, interpolation=cv2.INTER_LINEAR)
cv2.imshow("source", img)
h, w = img.shape[:2]
b, g, r = cv2.split(img)
b_arr = np.zeros((h, w, 8), dtype=np.uint8)
g_arr = np.zeros((h, w, 8), dtype=np.uint8)
r_arr = np.zeros((h, w, 8), dtype=np.uint8)
for i in range(8):
    b_arr[:, :, i], g_arr[:, :, i], r_arr[:, :, i] = 2**i, 2**i, 2**i

for i in range(8):
    t1 = cv2.bitwise_and(b, b_arr[:, :, i])
    t2 = cv2.bitwise_and(g, g_arr[:, :, i])
    t3 = cv2.bitwise_and(r, r_arr[:, :, i])
    mask1 = t1 > 0
    mask2 = t2 > 0
    mask3 = t3 > 0
    t1[mask1], t2[mask2], t3[mask3] = 255, 255, 255
    temp = cv2.merge([t1, t2, t3])
    cv2.imshow("res"+str(i+1), temp)
cv2.waitKey()

(screenshots omitted)

Watermarks

All the major websites care intensely about their own copyright and can't wait to stamp their marks everywhere; the watermark on an image is one manifestation. Sometimes you upload a picture and the site displays it with someone else's mark on it, which is quite disgusting. Then again, for private individuals this is also part of the intellectual property to protect, so opinions on it vary.

It was introduced above that a bit plane is the collection of every pixel's bit at the same binary position, that the larger the position the closer the corresponding plane fits the original image, and the smaller, the greater the difference. The lowest position in the binary number, the 2^0 bit, is also called the least significant bit (LSB, Least Significant Bit). When information is stored in this bit and merged back into the original image, the information becomes hidden; a watermark is exactly this kind of hidden information. How?

First read the picture that needs the watermark and a picture with a clear watermark; extract the least significant bit plane of the latter, make appropriate enlarging or shrinking adjustments according to the picture's size, and then apply it to the picture to be watermarked. This step can use cv2's weighted sum function or PIL's paste function.
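Before the weighted-sum approach used below, here is a minimal sketch of the LSB idea itself (my own, assuming a grayscale carrier and the white-on-black mark file generated in the next subsection):

import cv2
import numpy as np

carrier = cv2.imread("among.png", 0)          # grayscale carrier image
wm = cv2.imread("white_black_sign.png", 0)    # white text on black
wm = cv2.resize(wm, (carrier.shape[1], carrier.shape[0]))
wm_bit = (wm > 127).astype(np.uint8)          # 1 wherever the mark is bright

embedded = (carrier & 0b11111110) | wm_bit    # clear the LSB, write the bit
recovered = ((embedded & 1) * 255).astype(np.uint8)  # read the LSB plane back

cv2.imshow("embedded", embedded)    # visually identical to the carrier
cv2.imshow("recovered", recovered)  # the mark reappears
cv2.waitKey()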

A self-made watermark image

First of all, because I don't have a watermark original, I will generate two watermark originals with my own label: one with white text on a black background, one with black text on a white background. Generating the black and white backgrounds is very simple: a pixel value of 0 is pure black and 255 is pure white. As for adding the text, cv2's own putText function does it. Come on, let's build them.

import cv2
import numpy as np

# Make the black and the white background
black_w = np.zeros((300, 450), dtype=np.uint8)
white_b = np.ones((300, 450), dtype=np.uint8)*255

# Call putText to add a handwriting-style label 150 px down from the
# top-left corner, font scale 3, thickness 2
cv2.putText(black_w, 'JackSAMA', (0, 150), cv2.FONT_HERSHEY_SCRIPT_SIMPLEX, 3, 255, 2)
cv2.putText(white_b, 'JackSAMA', (0, 150), cv2.FONT_HERSHEY_SCRIPT_SIMPLEX, 3, 0, 2)

cv2.imshow("black", black_w)
cv2.imshow("white", white_b)
cv2.waitKey()

(screenshot omitted)
The watermark images are ready. In practice they can be generated to suit the situation: wrap the watermarking in a function and it can also adjust to the image's size, and of course a real watermark image can be adjusted in step too. The parameters of putText are mainly these:

img = cv2.putText(img, text, org, fontFace, fontScale, color[, thickness[, lineType[, bottomLeftOrigin]]])

  • img: the image to draw on
  • text: the text to add; generally English, since Chinese comes out garbled, which I haven't solved yet
  • org: where to anchor the text
  • fontFace: the font type, familiar to anyone who has used markup languages
  • fontScale: the font size
  • color: for a grayscale image a simple 0-255 value; for an RGB color image a (b, g, r) tuple
  • thickness: line thickness, default 1
  • lineType: line type, default the 8-connected type
  • bottomLeftOrigin: default False, which draws the text the normal way up; True puts the image data origin at the bottom-left corner, so the text comes out flipped
| fontFace | explanation |
| --- | --- |
| cv2.FONT_HERSHEY_SIMPLEX | normal-size sans-serif, the usual English font |
| cv2.FONT_HERSHEY_PLAIN | small sans-serif font |
| cv2.FONT_HERSHEY_DUPLEX | normal-size sans-serif, more complex |
| cv2.FONT_HERSHEY_COMPLEX | normal-size serif font |
| cv2.FONT_HERSHEY_TRIPLEX | normal-size serif font, more complex |
| cv2.FONT_HERSHEY_COMPLEX_SMALL | small version of the serif font |
| cv2.FONT_HERSHEY_SCRIPT_SIMPLEX | handwriting-style font |
| cv2.FONT_HERSHEY_SCRIPT_COMPLEX | complex handwriting-style font |
| cv2.FONT_ITALIC | italic flag |

| lineType | explanation |
| --- | --- |
| cv2.FILLED | filled |
| cv2.LINE_4 | 4-connected line |
| cv2.LINE_8 | 8-connected line |
| cv2.LINE_AA | anti-aliased, for smoother lines |

The above is an interpretation of the font parameters and line types in opencv. I hear the fonts can be replaced with a self-designed library, though I don't know how. In any case, adding the anti-aliased line type looks much more comfortable, as in the small sketch below.
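A tiny sketch of the anti-aliased variant (my own example):

import cv2
import numpy as np

canvas = np.zeros((100, 400), dtype=np.uint8)
cv2.putText(canvas, 'JackSAMA', (10, 60), cv2.FONT_HERSHEY_SCRIPT_SIMPLEX,
            1.5, 255, 2, lineType=cv2.LINE_AA)  # smoother than the default LINE_8
cv2.imshow("aa text", canvas)
cv2.waitKey()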

Embed watermark
Let's add a watermark to the Doraemon picture.

>>> import cv2
>>> import numpy as np
>>>
>>> mong = cv2.imread("among.png")
>>> mong.shape
(347, 272, 3)
>>> watermark = cv2.imread("white_black_sign.png")
>>> watermark.shape
(300, 450, 3)
# Resize the watermark original and pad a blank canvas to match the picture being processed
>>> watermark = cv2.resize(watermark, None, fx=0.3, fy=0.3, interpolation=cv2.INTER_AREA)
>>> cv2.imshow("", watermark)
>>> cv2.waitKey()
-1
>>> watermark.shape
(90, 135, 3)
>>> temp = np.ones(mong.shape, dtype=np.uint8)*255
>>> temp[210:300, 137:272] = watermark
>>> cv2.imshow("", temp)
>>> cv2.waitKey()
-1
# Weighted-sum compositing adds the watermark to the picture
>>> res1 = cv2.addWeighted(mong, 0.9, temp, 0.1, 0)
>>> cv2.imshow("", res1)
>>> cv2.waitKey()
-1

(screenshot omitted)

Remove the watermark

Many removal methods target exactly the kind of solid-color watermark generated above: identify its color range, repair that region, and then merge the two images, that is, paste the repaired patch back on. Here the PIL library helps out.

import cv2
import numpy as np
from PIL import Image

img = cv2.imread("./iamfine.png")
h, w = img.shape[:2]
# Crop the watermark region; the slice depends on where the watermark
# actually sits. [y0:y1, x0:x1] is the slicing used below; display the
# crop with the methods above to check it
cropped = img[int(h*0.9):h, int(w*0.75):w]
# Threshold the crop: colors outside the RGB range given by the two
# bounds are dropped; the range is chosen from the watermark's base color
thresh = cv2.inRange(cropped, np.array([230, 230, 230]), np.array([250, 250, 250]))
# Create a structuring element
kernel = np.ones((3, 3), np.uint8)
# Dilate to expand the region to repair
watermask = cv2.dilate(thresh, kernel, iterations=10)
specular = cv2.inpaint(cropped, watermask, 5, flags=cv2.INPAINT_TELEA)
# Save the repaired patch
cv2.imwrite("new.png", specular)

# Use PIL's paste to put the repaired patch back onto the original
i = Image.open("./iamfine.png")
i2 = Image.open("new.png")
i.paste(i2, (int(w*0.75), int(h*0.9)))
i.save("final.png")

Comparison of the two pictures:

(screenshots omitted)

Actually, you could also take the watermark-adding method above and run it in reverse.

Generate character pictures

A picture is made up of pixels, and computers store pictures in binary; the number of bits used to store each pixel is the picture's depth. Stored with 1 bit, the picture is either black or white, because only 0 and 1 are available; with one byte (8 bits) it stores the values 0-255. Color has three primaries, red, green and blue, which overlap in different proportions to produce the other colors. The color of a pixel in such a picture is determined by the three RGB values, generally called three channels, because three bytes respectively carry the individual values of the three primaries; stacked together they are the pixel's color, usually written as (0-255, 0-255, 0-255).


The character picture we want to generate really amounts to establishing a mapping from pixels to characters. The character set could be huge (even the most basic ASCII set has 128 characters), but we don't need that many: a few simple characters can make up our set, paired with the pixel's intensity (the mapping rules are ours to define).


That is the theory. I didn't understand other people's examples, but another method earned me a lot: first convert the image into a grayscale image, that is, a black-and-white picture, at which point converting pixels into characters becomes much easier.
The simple operation is as follows:

import cv2
import numpy as np

chars = "#+-."
img = cv2.imread("among.png", 0)
# Grayscale, so shape only has height and width, no depth
h, w = img.shape[0:2]
canvas = np.zeros((h, w), dtype=np.uint8)
font = cv2.FONT_HERSHEY_SIMPLEX
for i in range(0, h, 5):
    for j in range(0, w, 5):
        # Map the pixel's brightness onto the four characters
        t = chars[round(3 - img[i, j] / 255 * 3)]
        cv2.putText(canvas, t, (j, i), font, 0.1, color=(255, 255, 255))

cv2.imshow("", canvas)
cv2.waitKey(0)
cv2.imwrite("asciiPic.png", canvas)


Well, at least it works, but there is a problem: the base image gets replaced wholesale along with the subject, which is a bit glaring.

Generate a character video

Because a video is picture after picture, frame by frame, generating a character picture is halfway to generating a character video: iterate the logic above over every frame.

import cv2
import numpy as np


def pixel2char(pixel):
    char_list = "@#$%&erytuioplkszxcv=+---.     "
    index = int(pixel / 256 * len(char_list))
    return char_list[index]


def get_char_img(img, scale=4, font_size=5):
    # Scale the picture down
    h, w = img.shape
    re_im = cv2.resize(img, (w//scale, h//scale))
    # Create a blank picture to draw the characters on
    char_img = np.ones((h//scale*font_size, w//scale*font_size), dtype=np.uint8)*255
    font = cv2.FONT_HERSHEY_SIMPLEX
    # Walk over the picture's pixels
    for y in range(0, re_im.shape[0]):
        for x in range(0, re_im.shape[1]):
            char_pixel = pixel2char(re_im[y][x])
            cv2.putText(char_img, char_pixel, (x*font_size, y*font_size), font, 0.5, (0, 0, 0))
    return char_img


def generate(input_video, output_video):
    # 1. Open the video
    cap = cv2.VideoCapture(input_video)

    # 2. Get the video's frame rate
    fps = cap.get(cv2.CAP_PROP_FPS)

    # Read the first frame to get the size of the converted character image
    ret, frame = cap.read()
    char_img = get_char_img(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), 4)

    # Create a VideoWriter to save the video
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    writer = cv2.VideoWriter(output_video, fourcc, fps, (char_img.shape[1], char_img.shape[0]))
    while ret:
        # Read the current frame; break out if there is none
        ret, frame = cap.read()
        if not ret:
            break
        # Convert the current frame to a character image
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        char_img = get_char_img(gray, 4)

        # Convert back to BGR so it can be written to the video
        char_img = cv2.cvtColor(char_img, cv2.COLOR_GRAY2BGR)
        writer.write(char_img)
    writer.release()


if __name__ == '__main__':
    generate('in.mp4', 'out.mp4')

Well, this generation... I couldn't stand it: the computer started smoking before it finished! Master Lu was reporting smoke! After I cut it off, the video was not even half the length of the original, yet its size was several times bigger! Optimization can wait; what can't be squeezed out can't be squeezed out.

The pictures used above:

(source pictures omitted)
There is also a Rei Ayanami picture, but it is too large to upload. Let's leave it at this for now and record more later. There is also a video: well, that resource turned out to be this hard to get.

I recently launched a personal website where my articles will be gathered. If you are interested, take a look:

small broken station


Origin: blog.csdn.net/weixin_44948269/article/details/128150084