Foreword

This article introduces some of the mmcv APIs that are frequently used in mmdet.

1. Prerequisites

mmcv contains a large number of image-processing functions, and the two image libraries you will use most often alongside it are cv2 and Pillow (PIL). This section therefore gives a brief introduction to the commonly used APIs of these two libraries.

1.1. Read image

import cv2
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt

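# Image with h > w: (1133, 800, 3)
img_path = '/home/wujian/mmdet-lap/data/coco/val2017/000001.jpg'

# cv2 returns a NumPy array; shape is (height, width, channels)
img = cv2.imread(img_path)
h, w = img.shape[:2]
print('h:', h, 'w:', w)

# PIL returns an Image object; size is (width, height)
img = Image.open(img_path)
w, h = img.size
print('w:', w, 'h:', h)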

Note that cv2 reports the image dimensions as (h, w), whereas PIL reports them as (w, h)!
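To make the ordering concrete, here is a minimal sketch that uses a small synthetic array instead of reading a file. The 4x6 size is an arbitrary choice for illustration; the array stands in for what cv2.imread would return, since cv2 images are NumPy arrays.

```python
import numpy as np
from PIL import Image

# Stand-in for a cv2 image: height 4, width 6, 3 channels.
arr = np.zeros((4, 6, 3), dtype=np.uint8)
h, w = arr.shape[:2]           # NumPy / cv2 order: (height, width)

pil_img = Image.fromarray(arr)
pil_w, pil_h = pil_img.size    # PIL order: (width, height)

print('cv2-style (h, w):', (h, w))
print('PIL (w, h):', (pil_w, pil_h))
```

The same image yields the two numbers in opposite order depending on which library you ask.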

1.2. Converting between cv2 and PIL

import cv2
import numpy as np
from PIL import Image
# cv2 --> pil
img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
# pil --> cv2
image = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
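The color conversion is needed because cv2 stores channels in BGR order while PIL expects RGB. As a rough sketch, for a 3-channel image the conversion amounts to reversing the channel axis; the snippet below shows this with NumPy only, on a one-pixel image made up for illustration.

```python
import numpy as np
from PIL import Image

# One pure-blue pixel in cv2's BGR order: B=255, G=0, R=0.
bgr = np.zeros((1, 1, 3), dtype=np.uint8)
bgr[0, 0] = (255, 0, 0)

# Reversing the last axis swaps B and R, which matches what
# COLOR_BGR2RGB does for a 3-channel image.
rgb = np.ascontiguousarray(bgr[..., ::-1])
pil_img = Image.fromarray(rgb)
print(pil_img.getpixel((0, 0)))   # the same blue pixel, now in RGB order

# Going back: PIL -> array (RGB) -> reverse the channels again (BGR).
bgr_again = np.array(pil_img)[..., ::-1]
```

Skipping the conversion does not raise an error; the image simply shows with red and blue swapped.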

1.3. Converting to PIL for visualization

Since code is usually written and run in an IDE, it is convenient to convert the image to PIL and display it with matplotlib. The code for displaying a PIL image is:

from PIL import Image
import matplotlib.pyplot as plt

img = Image.open(img_path)  # use Image.open, not the built-in open()
plt.imshow(img)
plt.show()

1.4. Saving images with cv2 and PIL

Prefer an absolute path when saving, so that the output location does not depend on the current working directory.

cv2.imwrite('abs_path', img)  # img was read with cv2.imread
img.save('abs_path')          # img was read with Image.open()

2. mmcv

Here is a data-processing pipeline as it commonly appears in mmdet configs:

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
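The pipeline refers to `img_norm_cfg` without defining it. In mmdet configs this is commonly the set of ImageNet statistics shown below; treat the exact values as an assumption about your own config. The sketch illustrates the arithmetic of the Normalize step with NumPy only, and `normalize_sketch` is a hypothetical helper, not the mmcv function.

```python
import numpy as np

# Assumed values: the ImageNet statistics commonly found in mmdet configs.
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True)

def normalize_sketch(img_bgr, mean, std, to_rgb=True):
    """Hypothetical helper: optional BGR->RGB, then (x - mean) / std."""
    img = img_bgr.astype(np.float32)
    if to_rgb:
        img = img[..., ::-1]
    mean = np.array(mean, dtype=np.float32)
    std = np.array(std, dtype=np.float32)
    return (img - mean) / std

# A 2x2 BGR image whose every pixel equals the mean (written in BGR order),
# so the normalized result should be all zeros.
img = np.tile(np.array([103.53, 116.28, 123.675], dtype=np.float32), (2, 2, 1))
out = normalize_sketch(img, **img_norm_cfg)
print(out.shape, np.abs(out).max())
```

Note that the mean and std are given in RGB order, which is why the channel swap happens before the subtraction.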

2.1. Resizing the image (Resize)

Two images, one with h > w and one with w > h, are used to demonstrate Resize. With keep_ratio=True, mmdet computes two ratios: the long-edge limit (1333) divided by the image's longer side, and the short-edge limit (800) divided by its shorter side. Both sides are then scaled by the smaller of the two ratios. The aspect ratio is preserved, so an image that was taller than it was wide stays that way after the resize.

import cv2
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt
from mmcv.image import imrescale

# Image with h > w: (1133, 800, 3); visualize the first image
img_path = '/home/wujian/mmdet-lap/data/coco/val2017/000001.jpg'
img = cv2.imread(img_path)
h,w = img.shape[:2]

img, new_scale = imrescale(img, scale=(1333,800), return_scale= True)
print(img.shape)
# cv2 --> pil
img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.imshow(img)
plt.show()

# Image with w > h: (800, 1067, 3); visualize the second image
img_path = '/home/wujian/mmdet-lap/data/coco/val2017/000003.jpg'
img = cv2.imread(img_path)
h,w = img.shape[:2]

img, new_scale = imrescale(img, scale=(1333,800), return_scale= True)
print(img.shape)
# cv2 --> pil
img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

plt.imshow(img)
plt.show()
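The size computation behind the keep-ratio resize can be sketched in a few lines of plain Python. `rescaled_hw` is a hypothetical helper written for this article, and the input sizes are illustrative rather than taken from the images above.

```python
def rescaled_hw(h, w, scale):
    """Hypothetical helper: output (h, w) of a keep-ratio resize to `scale`."""
    max_long_edge, max_short_edge = max(scale), min(scale)
    # Use the smaller of the two ratios so that neither limit is exceeded.
    factor = min(max_long_edge / max(h, w), max_short_edge / min(h, w))
    # Round to the nearest integer pixel count.
    return int(h * factor + 0.5), int(w * factor + 0.5)

# Illustrative input sizes.
print(rescaled_hw(640, 480, (1333, 800)))    # portrait image
print(rescaled_hw(480, 640, (1333, 800)))    # landscape image
print(rescaled_hw(1000, 400, (1333, 800)))   # the long edge is the limiting side
```

For the first two inputs the short side lands on 800, giving (1067, 800) and (800, 1067). For the third, scaling the short side to 800 would push the long side past 1333, so the long-edge ratio is the one that applies.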


2.2. Padding the image (Pad)

Building on Resize, the Pad step pads the height and width so that both become multiples of 32. The complete code is:

import cv2
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt
from mmcv.image import imrescale

# Image with h > w: (1133, 800, 3)
img_path = '/home/wujian/mmdet-lap/data/coco/val2017/000001.jpg'
img = cv2.imread(img_path)
h,w = img.shape[:2]

img, new_scale = imrescale(img, scale=(1333,800), return_scale= True)
print(img.shape)
# cv2 --> pil
img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.imshow(img)
plt.show()

#pad
from mmcv.image import impad_to_multiple
import numpy as np
# pil --> cv2
image = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
pad_img = impad_to_multiple(image, divisor=32, pad_val= 0)
print(pad_img.shape)
pad_img = Image.fromarray(cv2.cvtColor(pad_img, cv2.COLOR_BGR2RGB))
plt.imshow(pad_img)
plt.show()


# Image with w > h: (800, 1067, 3)
img_path = '/home/wujian/mmdet-lap/data/coco/val2017/000003.jpg'
img = cv2.imread(img_path)
h,w = img.shape[:2]

img, new_scale = imrescale(img, scale=(1333,800), return_scale= True)
print(img.shape)
# cv2 --> pil
img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

plt.imshow(img)
plt.show()

#pad
from mmcv.image import impad_to_multiple
import numpy as np
# pil --> cv2
image = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
pad_img = impad_to_multiple(image, divisor=32, pad_val= 0)
print(pad_img.shape)
pad_img = Image.fromarray(cv2.cvtColor(pad_img, cv2.COLOR_BGR2RGB))
plt.imshow(pad_img)
plt.show()
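The padding arithmetic can be sketched without mmcv. The helper below is hypothetical and assumes the padding is added at the bottom and right with a constant value; it is applied to blank arrays with the two resized shapes from this section.

```python
import math
import numpy as np

def pad_to_multiple_sketch(img, divisor=32, pad_val=0):
    """Hypothetical helper: pad bottom/right so h and w are multiples of divisor."""
    h, w = img.shape[:2]
    pad_h = math.ceil(h / divisor) * divisor
    pad_w = math.ceil(w / divisor) * divisor
    # Start from a canvas filled with pad_val, then copy the image into the top-left.
    padded = np.full((pad_h, pad_w) + img.shape[2:], pad_val, dtype=img.dtype)
    padded[:h, :w] = img
    return padded

portrait = np.ones((1133, 800, 3), dtype=np.uint8)
landscape = np.ones((800, 1067, 3), dtype=np.uint8)
print(pad_to_multiple_sketch(portrait).shape)
print(pad_to_multiple_sketch(landscape).shape)
```

A side that is already a multiple of 32 (800 here) is left unchanged; only 1133 and 1067 are rounded up, to 1152 and 1088 respectively.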


2.3. Flipping the image horizontally (RandomFlip)
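A horizontal flip reverses the image along its width axis. As a minimal sketch, the NumPy snippet below shows the effect on a tiny array and how the x-coordinates of a bounding box would be mirrored to match. Both helpers are hypothetical illustrations (mmcv offers `imflip` for the image part), and the [x1, y1, x2, y2] box layout is an assumption.

```python
import numpy as np

def hflip_image(img):
    """Reverse the width axis (axis 1) of an (h, w[, c]) array."""
    return np.flip(img, axis=1)

def hflip_boxes(boxes, img_w):
    """Mirror [x1, y1, x2, y2] boxes; x1 and x2 swap roles after the flip."""
    flipped = boxes.copy()
    flipped[:, 0] = img_w - boxes[:, 2]
    flipped[:, 2] = img_w - boxes[:, 0]
    return flipped

img = np.arange(6).reshape(2, 3)           # 2 rows, 3 columns
print(hflip_image(img))

boxes = np.array([[10., 20., 50., 80.]])   # one box in a 100-pixel-wide image
print(hflip_boxes(boxes, img_w=100))
```

The y-coordinates are untouched, and the box keeps its width because the old right edge becomes the new left edge.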

Summary

I will add an explanation of the relevant mmcv source code when I have time.
