Can YOLOv5 be trained on grayscale images? YOLOv5 grayscale training and detection from scratch
Contents
- 1 Preface
- 2 Modify the source code to enable grayscale training
- 2.1 Modify the image reading mode
- 2.2 Modify the number of input channels passed in the source code
- 2.3 Run train.py
- 2.4 Modify the source code in utils/general.py
- 2.5 Modify the flags of all cv2 image reads in dataloaders.py
- 2.6 Running 'train.py' now trains normally
- 3 Model testing
- 4 Conclusion
- 5 Test RGB to Gray conversion time
- Attachment 1: Training directly on grayscale images with unmodified yolov5_6.2 source code
- Appendix 2: Dataset labels used during training
1 Preface
Object detection is usually done on RGB images. When several video streams must be processed at the same time, this consumes a lot of GPU compute and inference time. This article compares grayscale (GRAY) and RGB images experimentally, for both training and testing.
Here, yolov5_6.2 is used to train and test object detection on grayscale images.
As a first attempt, the color images are simply converted to grayscale with OpenCV and 'python train.py' is run for training:
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
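For reference, OpenCV documents 'cv2.COLOR_BGR2GRAY' as the weighted sum Y = 0.299R + 0.587G + 0.114B. The NumPy sketch below reproduces that formula on a BGR array so it is clear what the conversion keeps. The helper name `bgr_to_gray` is hypothetical, and because OpenCV uses fixed-point arithmetic internally, its output may differ from this floating-point version by about one gray level.

```python
import numpy as np

def bgr_to_gray(img_bgr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: BGR uint8 image -> gray, using the weights documented for cv2.COLOR_BGR2GRAY."""
    b = img_bgr[..., 0].astype(np.float32)
    g = img_bgr[..., 1].astype(np.float32)
    r = img_bgr[..., 2].astype(np.float32)
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    # Round to the nearest integer and clip back into the uint8 range
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# 1x3 test image in BGR order: pure blue, pure green, pure red
img = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
print(bgr_to_gray(img).tolist())  # [[29, 150, 76]]
```

The result has shape (H, W), with no channel axis. That missing axis is what causes several of the errors below.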
When train.py is run, no error is reported, but the first two lines of the printed model structure are as follows:
[Table 1-1 Excerpt of the printed model structure]
layer | from | n | params | module | arguments |
---|---|---|---|---|---|
0 | -1 | 1 | 3520 | models.common.Conv | [3, 32, 6, 2, 2] |
1 | -1 | 1 | 18560 | models.common.Conv | [32, 64, 3, 2] |
The table shows that the model input is fixed at 3 channels. Even when the training set consists of grayscale images, they are expanded to 3 channels when they are read.
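The parameter counts in Table 1-1 can be checked by hand. YOLOv5's models.common.Conv is a bias-free Conv2d followed by BatchNorm2d and an activation, so its trainable parameters are c_out·c_in·k·k convolution weights plus 2·c_out BatchNorm weights and biases. The pure-arithmetic sketch below (no YOLOv5 import; `conv_bn_params` is a hypothetical name) reproduces 3520 and 18560. It also shows what layer 0 shrinks to when ch=1.

```python
def conv_bn_params(c_in: int, c_out: int, k: int) -> int:
    """Trainable parameters of a bias-free Conv2d followed by BatchNorm2d."""
    conv_weights = c_out * c_in * k * k
    bn_affine = 2 * c_out  # BatchNorm gamma (weight) and beta (bias); running stats are buffers
    return conv_weights + bn_affine

print(conv_bn_params(3, 32, 6))   # layer 0, RGB input  -> 3520
print(conv_bn_params(32, 64, 3))  # layer 1             -> 18560
print(conv_bn_params(1, 32, 6))   # layer 0, gray input -> 1216
```

Only layer 0 depends on the number of input channels. Every later layer is unchanged by switching to ch=1.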
Look at line 248 of utils/dataloaders.py:
245 else:
246 # Read image
247 self.count += 1
248 img0 = cv2.imread(path) # BGR
249 assert img0 is not None, f'Image Not Found {path}'
250 s = f'image {self.count}/{self.nf} {path}: '
OpenCV's imread call and its flag values are:
img = cv2.imread(path, flag)  # flag: how the image should be read
# flag: default value is cv2.IMREAD_COLOR
# cv2.IMREAD_COLOR (1): 3-channel BGR
# cv2.IMREAD_GRAYSCALE (0): single-channel GRAY
# cv2.IMREAD_UNCHANGED (-1): as stored in the file, including any alpha channel
By default, 'cv2.imread(path)' returns a 3-channel BGR image. If the file is grayscale, its single channel is copied into the B, G and R channels, so the loaded image still has three channels.
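The channel replication described above is easy to see with NumPy. A grayscale file loaded with the default flag is equivalent to the same 2-D array stacked three times, so the two extra channels carry no new information. The sketch below simulates this without reading a file. The `np.stack` call stands in for the result of 'cv2.imread(path)' on a gray file; it does not reproduce OpenCV's internal code path.

```python
import numpy as np

# A synthetic 4x5 grayscale image, shaped like the result of cv2.imread(path, 0): (H, W)
gray = np.arange(20, dtype=np.uint8).reshape(4, 5)

# Shaped like the default cv2.IMREAD_COLOR result for a gray file: (H, W, 3)
as_bgr = np.stack([gray, gray, gray], axis=-1)

print(gray.shape, as_bgr.shape)  # (4, 5) (4, 5, 3)
# All three channels are identical copies of the gray plane
print(all(np.array_equal(as_bgr[..., c], gray) for c in range(3)))  # True
# Three times the memory for the same content
print(as_bgr.nbytes // gray.nbytes)  # 3
```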
2 Modify the source code to enable grayscale training
Every modified line in the source code is marked with the comment 'Hlj2308'.
2.1 Modify the image reading mode
In line 248 of the utils/dataloaders.py file, the original
img0 = cv2.imread(path) # BGR
Change to
img0 = cv2.imread(path, 0) # Hlj2308_GRAY
2.2 Modify the number of input channels passed in the source code
2.2.1 Change the parameter ch=3 in lines 122 and 130 of the train.py file to ch=1.
model = Model(cfg or ckpt['model'].yaml, ch=1, nc=nc, anchors=hyp.get('anchors')).to(device) # Hlj2308
model = Model(cfg, ch=1, nc=nc, anchors=hyp.get('anchors')).to(device) # Hlj2308 create
2.2.2 Change ch=3 to ch=1 in line 151 of the models/yolo.py file.
149 class DetectionModel(BaseModel):
150 # YOLOv5 detection model
151 def __init__(self, cfg='yolov5s.yaml', ch=1, nc=None, anchors=None):
152 super().__init__()
153 if isinstance(cfg, dict):
154 self.yaml = cfg # model dict
2.3 Run train.py
Running train.py now raises a series of errors. Fixing them one by one eventually makes normal training and testing possible.
[Error report 1]
File "....packages\torch\nn\modules\conv.py", line 442, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [32, 1, 6, 6], expected input[8, 3, 640, 640] to have 1 channels, but got 3 channels instead
The input data has the wrong number of channels. The model expects 1 channel, but the loaded images have 3.
The cause might be the image 'mode' used when PIL's 'Image.open()' opens the image. Modify line 612 of models/common.py:
im, f = Image.open(requests.get(im, stream=True).raw if str(im).startswith('http') else im), im
Add the following two lines after it:
if im.mode != 'L':
    im = im.convert('L')
[Error 1] still occurs.
2.4 Modify the source code in utils/general.py
In utils/general.py, lines 1031-1032 originally read:
def imread(path, flags=cv2.IMREAD_COLOR):
    return cv2.imdecode(np.fromfile(path, np.uint8), flags)
Change to
def imread(path, flags=cv2.IMREAD_GRAYSCALE):
    return cv2.imdecode(np.fromfile(path, np.uint8), cv2.IMREAD_GRAYSCALE)
[Error 1] still occurs.
2.5 Modify the flags of all cv2 image reads in dataloaders.py
In lines 677, 691, 881, 1042, 1119, 1122, and 1125 of the utils/dataloaders.py file, the original
cv2.imread(f)
Change it to the following (Ctrl+F search and replace is the quickest way):
cv2.imread(f, 0)
When train.py is run now, [Error 1] no longer occurs.
[Error 2] is as follows
File "..../yolov5_6.2_gray/utils/dataloaders.py", line 706, in load_mosaic
img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8) # base image with 4 tiles
IndexError: tuple index out of range
Solution: img.shape[2] does not exist because a single-channel image is a 2-D array, not a 3-D one. The original line 706 of utils/dataloaders.py is:
img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)
Change to
img4 = np.full((s * 2, s * 2, img.shape[2] if len(img.shape)==3 else 1), 114, dtype=np.uint8)
[Error 3] follows. It is caused by an incorrect fix for [Error 2]:
File "..../yolov5_6.2_gray/utils/dataloaders.py", line 721, in load_mosaic
img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b] # img4[ymin:ymax, xmin:xmax]
ValueError: could not broadcast input array from shape (360,640) into shape (360,640,1)
Solution: Printing the shapes at the point of failure gives 'img4.shape, img.shape = (1280, 1280, 1) (360, 640)'. The canvas img4 is 3-D with a trailing channel axis of size 1, while the tile img is 2-D, so NumPy cannot broadcast one into the other. Change the fix for [Error 2] to:
img4 = np.full((s * 2, s * 2), 114, dtype=np.uint8) # base image with 4 tiles
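The fix above hard-codes a 2-D canvas, so it only works for grayscale input. A channel-agnostic alternative (a sketch, not YOLOv5's own code) is to append 'img.shape[2:]' to the canvas shape. This is an empty tuple for an (H, W) image and '(3,)' for an (H, W, 3) image, so the same line serves both cases. The helper name `mosaic_canvas` is hypothetical.

```python
import numpy as np

def mosaic_canvas(img: np.ndarray, s: int) -> np.ndarray:
    """Hypothetical helper: a 2s x 2s canvas filled with 114, matching img's channel layout."""
    # img.shape[2:] is () for a gray (H, W) image and (C,) for an (H, W, C) image
    return np.full((s * 2, s * 2) + img.shape[2:], 114, dtype=np.uint8)

gray_tile = np.zeros((360, 640), dtype=np.uint8)
color_tile = np.zeros((360, 640, 3), dtype=np.uint8)

canvas = mosaic_canvas(gray_tile, 640)
print(canvas.shape)                          # (1280, 1280)
print(mosaic_canvas(color_tile, 640).shape)  # (1280, 1280, 3)

# Pasting a 2-D tile into a 2-D canvas now works without a broadcasting error
canvas[0:360, 0:640] = gray_tile
```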
[Error 4] is as follows
File "D:\yolov5train\yolov5_6.2_grayTrain\utils\dataloaders.py", line 642, in __getitem__
augment_hsv(img, hgain=hyp['hsv_h'], sgain=hyp['hsv_s'], vgain=hyp['hsv_v'])
File "D:\yolov5train\yolov5_6.2_grayTrain\utils\augmentations.py", line 69, in augment_hsv
hue, sat, val = cv2.split(cv2.cvtColor(im, cv2.COLOR_BGR2HSV))
cv2.error: OpenCV(4.2.0) c:\projects\opencv-python\opencv\modules\imgproc\src\color.simd_helpers.hpp:92: error: (-2:Unspecified error) in function '__cdecl cv::impl::`anonymous-namespace'::CvtHelper<struct cv::impl::`anonymous namespace'::Set<3,4,-1>,struct cv::impl::A0x3b52564f::Set<3,-1,-1>,struct cv::impl::A0x3b52564f::Set<0,5,-1>,2>::CvtHelper(const class cv::_InputArray &,const class cv::_OutputArray &,int)'
> Invalid number of channels in input image:
> 'VScn::contains(scn)'
> where
> 'scn' is 1
Solution: The error comes from the BGR-to-HSV conversion, which requires a 3-channel input. Hue and saturation have no meaning for a single-channel image, so here the HSV augmentation is simply disabled by commenting out line 642 of utils/dataloaders.py:
# augment_hsv(img, hgain=hyp['hsv_h'], sgain=hyp['hsv_s'], vgain=hyp['hsv_v'])
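Commenting out augment_hsv removes all color-space augmentation, including brightness, which still makes sense for a gray image. One option is to keep only the value (brightness) gain. The sketch below is a hypothetical `augment_gray` that follows the lookup-table style YOLOv5's augment_hsv uses for its V channel. The 0.4 default is an assumption taken from hsv_v in YOLOv5's default hyperparameter file; check it against your own hyp yaml.

```python
import numpy as np

def augment_gray(im: np.ndarray, vgain: float = 0.4, rng=None) -> None:
    """Hypothetical brightness-only augmentation for a uint8 gray image, applied in place."""
    if not vgain:
        return
    rng = np.random.default_rng() if rng is None else rng
    r = rng.uniform(-1, 1) * vgain + 1                 # random gain in [1 - vgain, 1 + vgain]
    x = np.arange(0, 256, dtype=np.float64)
    lut_val = np.clip(x * r, 0, 255).astype(im.dtype)  # 256-entry lookup table
    im[...] = lut_val[im]                              # fancy indexing plays the role of cv2.LUT

img = np.arange(256, dtype=np.uint8).reshape(16, 16)
augment_gray(img, vgain=0.4, rng=np.random.default_rng(0))
print(img.shape, img.dtype)  # (16, 16) uint8
```

Like augment_hsv, it modifies the image in place, so it could replace the commented-out line as 'augment_gray(img, vgain=hyp['hsv_v'])'.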
[Error 5] is as follows
File "D:\yolov5train\yolov5_6.2_grayTrain\utils\dataloaders.py", line 665, in __getitem__
img = img.transpose((2, 0, 1))[::-1] # HWC to CHW, BGR to RGB
ValueError: axes don't match array
Solution: My first guess was that the BGR-to-RGB flip caused the problem. Line 665 of utils/dataloaders.py reads:
img = img.transpose((2, 0, 1))[::-1] # HWC to CHW, BGR to RGB
Changing it to the following is not a correct fix:
img = img.transpose((2, 0, 1)) # HWC to CHW
OpenCV loads images in HWC order, while PyTorch expects CHW, so the image is normally converted with 'transpose((2, 0, 1))'. Here, however, 'img' has shape (640, 640). It is a 2-D array, so a 3-axis transpose is impossible. [Error 5] is fixed by adding the channel axis with a reshape instead:
img = img.reshape(1, img.shape[0], img.shape[1])
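The reshape works because a gray image only needs a leading channel axis. If the same dataloader should handle both gray and color input, a branch on 'img.ndim' keeps the original BGR-to-RGB flip for 3-channel images. `to_chw` below is a hypothetical helper. 'np.ascontiguousarray' is included because the '[::-1]' flip produces a negatively strided view; YOLOv5 6.2 also calls it right after this line.

```python
import numpy as np

def to_chw(img: np.ndarray) -> np.ndarray:
    """Hypothetical helper: HW (gray) or HWC (BGR) uint8 image -> contiguous CHW array."""
    if img.ndim == 2:
        chw = img[None, :, :]                 # (H, W) -> (1, H, W), same as the reshape above
    else:
        chw = img.transpose((2, 0, 1))[::-1]  # (H, W, 3) -> (3, H, W), BGR -> RGB
    return np.ascontiguousarray(chw)

gray = np.zeros((640, 640), dtype=np.uint8)
bgr = np.zeros((640, 640, 3), dtype=np.uint8)
bgr[..., 0] = 255  # blue channel in BGR order

print(to_chw(gray).shape)       # (1, 640, 640)
out = to_chw(bgr)
print(out.shape, out[2].max())  # (3, 640, 640) 255: blue is now the last (RGB) channel
```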
2.6 Running 'train.py' now trains normally
The model cfg file does not need to be modified. The train.py arguments are set as follows:
'--weights', type=str, default='',
'--cfg', type=str, default='./models/yolov5s.yaml',
'--data', type=str, default= './data/my_yolo5.yaml',
'--epochs', type=int, default=50
'--batch-size', type=int, default=8,
'--imgsz', '--img', '--img-size', type=int, default= 640,
'./models/yolov5s.yaml' contains the following parameters:
# Parameters
nc: 11 # number of classes
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.50 # layer channel multiple
'./data/my_yolo5.yaml' contains the following parameters:
train: ../datasTrain4_LQTest_gray/images/train/
val: ../datasTrain4_LQTest_gray/images/val/
# number of classes
nc: 11
# class names
names: [ 'pedes', 'car', 'bus', 'truck', 'bike', 'elec', 'tricycle','coni', 'warm', 'tralight', 'speVeh']
3 Model testing
Both the RGB and GRAY models are trained with yolov5_6.2 for 50 epochs without pre-trained weights. The two datasets contain the same images in RGB and GRAY format, with identical label files. P, R and mAP are relatively low because the dataset is small (2491 train and 360 val images), no pre-trained model is used, and training runs for only 50 epochs.
3.1 Testing the model trained on RGB images
Training time
Training takes 4:24 to 4:34 per epoch, and validation takes 0:13 to 0:15 per epoch.
Comparison of accuracy
End of training, all: P(0.869) R(0.653) mAP(0.717)@.5 mAP(0.48)@.5:.95
val: all:P(0.712) R(0.651) mAP(0.71)@.5 mAP(0.512)@.5:.95
Comparison of val inference time
Speed: 0.3ms pre-process, 7.1ms inference, 1.8ms NMS per image at shape (8, 3, 640, 640)
3.2 Testing the model trained on GRAY images
Training time
Training takes 2:48 to 2:54 per epoch, and validation takes 0:08 to 0:09 per epoch.
Comparison of accuracy
End of training, all: P(0.905) R(0.617) mAP(0.705)@.5 mAP(0.464)@.5:.95
val: all:P(0.728) R(0.619) mAP(0.696)@.5 mAP(0.497)@.5:.95
Comparison of val inference time
Speed: 0.1ms pre-process, 4ms inference, 2.3ms NMS per image at shape (8, 1, 640, 640)
While running val.py for the test above, [Error 6] occurred:
...
File "D:/yolov5train/yolov5_6.2_grayTrain/val.py", line 169, in run
model.warmup(imgsz=(1 if pt else batch_size, 3, imgsz, imgsz)) # warmup
...
File "D:\AppData\anaconda3.8\envs\yolov5t\lib\site-packages\torch\nn\modules\conv.py", line 442, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [32, 1, 6, 6], expected input[1, 3, 640, 640] to have 1 channels, but got 3 channels instead
Solution: Line 169 of val.py reads:
model.warmup(imgsz=(1 if pt else batch_size, 3, imgsz, imgsz)) # warmup
Changing it to the following resolves [Error 6]:
model.warmup(imgsz=(1 if pt else batch_size, 1, imgsz, imgsz)) # warmup
3.3 Comparison of RGB and GRAY training and testing
Under the same setup (yolov5_6.2, 50 epochs, no pre-trained weights, identical images and labels, 2491 train and 360 val images), the results compare as follows:
[Table 3-1 Comparison of RGB and GRAY model conclusions]
4 Conclusion
Based on Table 3-1, the following conclusions can be drawn:
(1) Compared with RGB, both training and inference on grayscale images are indeed faster. The inference time, which matters most here, dropped from 7.1 ms to 4.1 ms, a reduction of 3 / 7.1 ≈ 42%.
(2) The measured P, R and mAP values differ, but after 50 epochs the differences are small. Since no pre-trained model is used, 50 epochs of training are required.
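The percentages above can be reproduced from the reported numbers. The inference times are the ones quoted in (1). For training time, I take the midpoint of each per-epoch range from sections 3.1 and 3.2; that midpoint is my own simplification, not a measured value.

```python
def reduction(before: float, after: float) -> float:
    """Relative reduction (before - after) / before."""
    return (before - after) / before

def mmss(t: str) -> int:
    """Convert an 'm:ss' string to seconds."""
    m, s = t.split(":")
    return int(m) * 60 + int(s)

# Inference time per image (ms), RGB vs GRAY, as quoted in conclusion (1)
print(f"inference: {reduction(7.1, 4.1):.0%}")  # 42%

# Per-epoch training time: midpoint of each reported range (a simplification)
rgb_epoch = (mmss("4:24") + mmss("4:34")) / 2   # 269.0 s
gray_epoch = (mmss("2:48") + mmss("2:54")) / 2  # 171.0 s
print(f"training per epoch: {reduction(rgb_epoch, gray_epoch):.0%}")  # 36%
```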
5 Test RGB to Gray conversion time
Test with 'detect.py': change the ch=3 argument passed to the model to 1, and point '--source' at a folder that contains only RGB images. After detection, the results are saved to the 'runs/val' directory in grayscale. This happens because the 'def __next__(self):' method of 'class LoadImages:' in 'utils/dataloaders.py' was changed to 'img0 = cv2.imread(path, 0)', so the RGB images are converted to grayscale as they are read. I therefore measured the time taken by 'cv2.imread()':
Time to read the RGB image (s): 0.02003192901611328
Time to read the grayscale image (s): 0.011004924774169922
In practice, timing 'cv2.imread()' is beside the point. In deployment, frames usually come from an RTSP stream or a camera SDK. These frames are in YUV format, and the Y channel already is the grayscale data. Obtaining RGB, on the other hand, requires an extra YUV-to-RGB conversion.
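To make the YUV point concrete: in planar 4:2:0 layouts such as I420 and NV12, a frame buffer starts with the full-resolution Y plane (width × height bytes), followed by the subsampled chroma data. The grayscale image can therefore be taken directly from the first width × height bytes, with no color arithmetic at all. The sketch below assumes a tightly packed buffer with no row padding (stride), which real camera SDKs do not always guarantee. `y_plane` is a hypothetical name.

```python
import numpy as np

def y_plane(frame: bytes, width: int, height: int) -> np.ndarray:
    """Return the Y (luma) plane of a tightly packed I420/NV12 frame as an (H, W) uint8 array."""
    expected = width * height * 3 // 2  # Y plane + quarter-size U and V data
    if len(frame) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(frame)}")
    buf = np.frombuffer(frame, dtype=np.uint8)
    # The first width*height bytes are luma: only a slice and a reshape are needed
    return buf[: width * height].reshape(height, width)

# Synthetic 4x2 frame: Y bytes 0..7, then 4 chroma bytes
frame = bytes(range(8)) + bytes([128] * 4)
gray = y_plane(frame, 4, 2)
print(gray.shape)     # (2, 4)
print(gray.tolist())  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

'np.frombuffer' returns a read-only view, so call '.copy()' on the result before any in-place augmentation.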
Questions that still need study: ① How are camera frames read when object detection is deployed? ② What data format does the camera SDK provide? ③ Does detection pre-processing operate on RGB images, or on another format (or form of data) streamed from the video? Must frames be converted to RGB before the other pre-processing steps? Answering this requires understanding the camera's encoding and decoding.
[If anyone can recommend good resources on the three questions above, thank you~]
The conversion between the R, G, B and Y, U, V components of each pixel in the image is:
$$
\begin{cases}
Y = 0.299R + 0.587G + 0.114B \\
U = -0.147R - 0.289G + 0.436B \\
V = 0.615R - 0.515G - 0.100B
\end{cases}
$$

$$
\begin{cases}
R = Y + 1.14V \\
G = Y - 0.39U - 0.58V \\
B = Y + 2.03U
\end{cases}
$$
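As a sanity check on these coefficients, the short script below converts a few colors to YUV with the first set of equations and back with the second. The inverse coefficients are rounded, so the round trip is only approximate, but for these test colors the error stays below one gray level. The Y row uses the same weights OpenCV documents for its RGB-to-gray conversion.

```python
def rgb_to_yuv(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.147 * r - 0.289 * g + 0.436 * b
    v = 0.615 * r - 0.515 * g - 0.100 * b
    return y, u, v

def yuv_to_rgb(y, u, v):
    r = y + 1.14 * v
    g = y - 0.39 * u - 0.58 * v
    b = y + 2.03 * u
    return r, g, b

colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255), (128, 128, 128), (0, 0, 0)]
for rgb in colors:
    back = yuv_to_rgb(*rgb_to_yuv(*rgb))
    err = max(abs(orig - rec) for orig, rec in zip(rgb, back))
    print(rgb, [round(c, 2) for c in back], f"max error {err:.2f}")
```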
Attachment 1: Training directly on grayscale images with unmodified yolov5_6.2 source code
The model parameters are shown in Figure 1
The setup in Figure 1 also trains without errors. As I understand it, training still runs on 3 channels, each holding the same GRAY values. In YUV terms, the Y channel is the grayscale channel, so this is equivalent to feeding the Y value into all three channels.
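The "three identical GRAY channels" view also explains why the unmodified model can still train. A convolution is linear, so applying a 3-channel kernel to three copies of the same plane gives the same result as applying a single kernel whose weights are summed over the input channels. The NumPy sketch below checks this with a hand-written "valid" cross-correlation. That function stands in for Conv2d and is not PyTorch's implementation. In effect, the 3-channel model learns a 1-channel first layer while still reading and convolving three channels of input.

```python
import numpy as np

def correlate2d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Multi-channel 'valid' cross-correlation: x is (C, H, W), w is (C, k, k); returns (H-k+1, W-k+1)."""
    _, h, wd = x.shape
    k = w.shape[1]
    out = np.zeros((h - k + 1, wd - k + 1))
    for i in range(k):
        for j in range(k):
            # Add the contribution of kernel tap (i, j), summed over all input channels
            out += np.tensordot(w[:, i, j], x[:, i:i + h - k + 1, j:j + wd - k + 1], axes=1)
    return out

rng = np.random.default_rng(0)
gray = rng.random((8, 8))
x3 = np.stack([gray, gray, gray])   # 3 identical channels, as in Figure 1
w3 = rng.random((3, 3, 3))          # a 3-channel 3x3 kernel
w1 = w3.sum(axis=0, keepdims=True)  # equivalent 1-channel kernel

out3 = correlate2d_valid(x3, w3)
out1 = correlate2d_valid(gray[None], w1)
print(out3.shape, np.allclose(out3, out1))  # (6, 6) True
```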
Appendix 2: Dataset labels used during training
Table 1 Target label details
ID | label | description | remark |
---|---|---|---|
0 | pedes | Pedestrians (people riding balance bikes or flatbed carts are also included in this category) | |
1 | car | Cars (including SUV, MPV, pickup and van) | |
2 | bus | Bus, coach | |
3 | truck | Truck, van | |
4 | bike | Bicycle | |
5 | elec | Motorcycle (electric motorcycle) | |
6 | tricycle | Tricycle (electric tricycle, gas tricycle) | |
7 | coni | Traffic cone | |
8 | warm | Warning post | |
9 | tralight | Traffic light | |
10 | speVeh | Emergency or special vehicles (ambulances, fire trucks, and engineering vehicles such as cranes, excavators and muck trucks) | |