[Source code and deployment tutorial] PCB board defect detection system: improved YOLOv5 & OpenCV

1. Research background

With the popularity of electronic products and the expansion of their application scope, the Printed Circuit Board (PCB) has become the core component of electronic products, and its quality and reliability play a vital role in the performance and lifetime of the entire product. However, during PCB manufacturing, factors such as materials, equipment, and process conditions often introduce defects such as poor soldering, open circuits, and short circuits. These defects not only affect the normal operation of the product, but may also damage it or even endanger user safety.

Therefore, accurate and efficient detection and identification of PCB defects is crucial. Traditional PCB defect detection relies mainly on manual visual inspection, which suffers from low efficiency, proneness to error, and strong subjectivity. With the rapid development of computer vision and deep learning, PCB defect detection systems based on YOLOv5 and OpenCV have emerged.

YOLOv5 is a deep-learning-based object detection algorithm that offers high efficiency, high accuracy, and strong real-time performance. By training a deep neural network, YOLOv5 can automatically detect and identify various defects on PCB boards. OpenCV is an open-source computer vision library that provides rich image processing and analysis functions and can be used to pre-process and post-process PCB images.
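As a rough illustration of the kind of pre-processing OpenCV can contribute before the detector runs, the sketch below (not taken from the project source; the file name and parameters are assumptions) converts a board photo into a denoised binary map:

import cv2

# Illustrative pre-processing only; './pcb.jpg' and the parameters are assumed.
img = cv2.imread('./pcb.jpg')                 # BGR photo of the board
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # single-channel copy
blur = cv2.GaussianBlur(gray, (5, 5), 0)      # suppress sensor noise
# Otsu thresholding gives a quick binary view of the copper traces
_, binary = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite('pcb_binary.png', binary)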

2. Research significance

  1. Improve detection efficiency and accuracy: Compared with traditional manual visual inspection, a system based on YOLOv5 and OpenCV can automatically detect and identify PCB defects, greatly improving detection efficiency and accuracy. Because YOLOv5 runs in real time, it can also monitor the PCB production process continuously, so defects can be found and repaired promptly, improving product quality and reliability.

  2. Reduce costs and improve production efficiency: Traditional manual visual inspection requires a large investment of manpower and time, making it costly and inefficient. A system based on YOLOv5 and OpenCV enables automated detection, reducing labor and inspection time, lowering costs, and improving production efficiency.

  3. Promote the application of computer vision and deep learning technology: A PCB defect detection system based on YOLOv5 and OpenCV is a typical application of computer vision and deep learning in electronics manufacturing. Research on and application of such a system can drive the adoption of these technologies in other fields and encourage further technological progress and innovation.

In short, a PCB defect detection system based on YOLOv5 and OpenCV has significant research value and practical application value. It can improve the efficiency and accuracy of PCB defect detection, reduce costs, raise production efficiency, and promote the application of computer vision and deep learning technology, which is of great significance for improving the quality and reliability of electronic products and for advancing electronics manufacturing.

3. Picture demonstration

2.png

3.png

4.png

4. Video demonstration

Improved YOLOv5 & OpenCV PCB board defect detection system (source code and deployment tutorial) on Bilibili

5. Core code explanation

5.1 ui.py


# Imports assumed from the standard YOLOv5 v6.x layout; the original ui.py
# declares its own imports, which are omitted from this excerpt.
from pathlib import Path

import numpy as np
import torch

from models.common import DetectMultiBackend
from utils.augmentations import letterbox
from utils.general import non_max_suppression, scale_coords
from utils.torch_utils import select_device

ROOT = Path(__file__).resolve().parent


def load_model(
        weights='./best.pt',  # model.pt path(s)
        data=ROOT / 'data/coco128.yaml',  # dataset.yaml path
        device='',  # cuda device, i.e. 0 or 0,1,2,3 or cpu
        half=False,  # use FP16 half-precision inference
        dnn=False,  # use OpenCV DNN for ONNX inference

):
    # Load model
    device = select_device(device)
    model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data)
    stride, names, pt, jit, onnx, engine = model.stride, model.names, model.pt, model.jit, model.onnx, model.engine

    # Half
    half &= (pt or jit or onnx or engine) and device.type != 'cpu'  # FP16 supported on limited backends with CUDA
    if pt or jit:
        model.model.half() if half else model.model.float()
    return model, stride, names, pt, jit, onnx, engine


def run(model, img, stride, pt,
        imgsz=(640, 640),  # inference size (height, width)
        conf_thres=0.15,  # confidence threshold
        iou_thres=0.15,  # NMS IOU threshold
        max_det=1000,  # maximum detections per image
        device='',  # cuda device, i.e. 0 or 0,1,2,3 or cpu
        classes=None,  # filter by class: --class 0, or --class 0 2 3
        agnostic_nms=False,  # class-agnostic NMS
        augment=False,  # augmented inference
        half=False,  # use FP16 half-precision inference
        ):

    cal_detect = []

    device = select_device(device)
    names = model.module.names if hasattr(model, 'module') else model.names  # get class names

    # Set Dataloader
    im = letterbox(img, imgsz, stride, pt)[0]

    # Convert
    im = im.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
    im = np.ascontiguousarray(im)

    im = torch.from_numpy(im).to(device)
    im = im.half() if half else im.float()  # uint8 to fp16/32
    im /= 255  # 0 - 255 to 0.0 - 1.0
    if len(im.shape) == 3:
        im = im[None]  # expand for batch dim

    pred = model(im, augment=augment)

    pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)
    # Process detections
    for i, det in enumerate(pred):  # detections per image
        if len(det):
            # Rescale boxes from img_size to im0 size
            det[:, :4] = scale_coords(im.shape[2:], det[:, :4], img.shape).round()

            # Write results

            for *xyxy, conf, cls in reversed(det):
                c = int(cls)  # integer class
                label = f'{names[c]}'
                lbl = names[int(cls)]
                print(lbl)
                #if lbl not in [' Chef clothes',' clothes']:
                    #continue
                cal_detect.append([label, xyxy, str(float(conf))[:5]])
    return cal_detect


This program file implements a PCB board defect detection system based on YOLOv5. Its main functions are loading the model, running the model for object detection, and displaying the detection results.

The file first imports a series of libraries and modules, including argparse, platform, shutil, time, numpy, cv2, torch, etc., and then defines some global variables and functions.

The load_model function loads the model. Its parameters include the model weight path, the dataset configuration file path, and the device type. It returns the loaded model along with its stride, class names, and backend flags.

The run function performs object detection with the loaded model. Its parameters include the model, the input image, the stride, the inference size, and the confidence threshold, and it returns the detected objects.
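The two functions can be chained as in the sketch below; this is a hedged usage example, and the weight path and test image name are assumptions rather than files shipped with the project.

import cv2

# Usage sketch for the load_model/run excerpt above; paths are assumptions.
model, stride, names, pt, jit, onnx, engine = load_model(weights='./best.pt')
frame = cv2.imread('./test_pcb.jpg')
detections = run(model, frame, stride, pt, conf_thres=0.25)
for label, xyxy, conf in detections:
    print(label, [int(v) for v in xyxy], conf)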

The det_yolov5v6 function performs object detection on an input image or video, taking the input path as its parameter. It calls the run function and draws the detection results onto the image or video.
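det_yolov5v6 itself is not reproduced in this excerpt; the sketch below only illustrates the drawing step with OpenCV, reusing the frame and detections from the previous example (colors and font settings are assumptions).

# Illustrative drawing step; the project's det_yolov5v6 may differ in detail.
for label, xyxy, conf in detections:
    x1, y1, x2, y2 = [int(v) for v in xyxy]
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)           # box
    cv2.putText(frame, f'{label} {conf}', (x1, max(y1 - 5, 0)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)          # label
cv2.imwrite('result.png', frame)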

The Thread_1 class is a thread class inherited from QThread and is used to run the object detection task in the background.

The Ui_MainWindow class is a user interface class used to create and manage the graphical interface of the program. This class defines the layout and controls of the interface, and binds the corresponding event handling functions.

The main part of the program first loads the model, then creates a Qt application and main window, instantiates Ui_MainWindow as a ui object, and finally starts the application's event loop.

Overall, the program lets the user select an input file in the graphical interface and click the Start Recognition button; it then runs object detection on the input file and displays the results on the interface.

5.2 models\common.py
class SwinTransformerBlock(nn.Module):
    def __init__(self, c1, c2, num_heads, num_layers, window_size=8):
        super().__init__()
        self.conv = None
        if c1 != c2:
            self.conv = Conv(c1, c2)

        # remove input_resolution
        self.blocks = nn.Sequential(*[SwinTransformerLayer(dim=c2, num_heads=num_heads, window_size=window_size,
                                 shift_size=0 if (i % 2 == 0) else window_size // 2) for i in range(num_layers)])

    def forward(self, x):
        if self.conv is not None:
            x = self.conv(x)
        x = self.blocks(x)
        return x
class WindowAttention(nn.Module):

    def __init__(self, dim, window_size, num_heads, qkv_bias=True, qk_scale=None, attn_drop=0., proj_drop=0.):

        super().__init__()
        self.dim = dim
        self.window_size = window_size  # Wh, Ww
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = qk_scale or head_dim ** -0.5

        # define a parameter table of relative position bias
        self.relative_position_bias_table = nn.Parameter(
            torch.zeros((2 * window_size[0] - 1) * (2 * window_size[1] - 1), num_heads))  # 2*Wh-1 * 2*Ww-1, nH

        # get pair-wise relative position index for each token inside the window
        coords_h = torch.arange(self.window_size[0])
        coords_w = torch.arange(self.window_size[1])
        coords = torch.stack(torch.meshgrid([coords_h, coords_w]))  # 2, Wh, Ww
        coords_flatten = torch.flatten(coords, 1)  # 2, Wh*Ww
        relative_coords = coords_flatten[:, :, None] - coords_flatten[:, None, :]  # 2, Wh*Ww, Wh*Ww
        relative_coords = relative_coords.permute(1, 2, 0).contiguous()  # Wh*Ww, Wh*Ww, 2
        relative_coords[:, :, 0] += self.window_size[0] - 1  # shift to start from 0
        relative_coords[:, :, 1] += self.window_size[1] - 1
        relative_coords[:, :, 0] *= 2 * self.window_size[1] - 1
        relative_position_index = relative_coords.sum(-1)  # Wh*Ww, Wh*Ww
        self.register_buffer("relative_position_index", relative_position_index)

        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = nn.Linear(dim, dim)
        self.proj_drop = nn.Dropout(proj_drop)

        nn.init.normal_(self.relative_position_bias_table, std=.02)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x, mask=None):

        B_, N, C = x.shape
        qkv = self.qkv(x).reshape(B_, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]  # make torchscript happy (cannot use tensor as tuple)

        q = q * self.scale
        attn = (q @ k.transpose(-2, -1))

        relative_position_bias = self.relative_position_bias_table[self.relative_position_index.view(-1)].view(
            self.window_size[0] * self.window_size[1], self.window_size[0] * self.window_size[1], -1)  # Wh*Ww,Wh*Ww,nH
        relative_position_bias = relative_position_bias.permute(2, 0, 1).contiguous()  # nH, Wh*Ww, Wh*Ww
        attn = attn + relative_position_bias.unsqueeze(0)

        if mask is not None:
            nW = mask.shape[0]
            attn = attn.view(B_ // nW, nW, self.num_heads, N, N) + mask.unsqueeze(1).unsqueeze(0)
            attn = attn.view(-1, self.num_heads, N, N)
            attn = self.softmax(attn)
        else:
            attn = self.softmax(attn)

        attn = self.attn_drop(attn)

        # print(attn.dtype, v.dtype)
        try:
            x = (attn @ v).transpose(1, 2).reshape(B_, N, C)
        except:
            #print(attn.dtype, v.dtype)
            x = (attn.half() @ v).transpose(1, 2).reshape(B_, N, C)
        x = self.proj(x)
        x = self.proj_drop(x)
        return x

class Mlp(nn.Module):

    def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.SiLU, drop=0.):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_features, out_features)
        self.drop = nn.Dropout(drop)

    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.drop(x)
        x = self.fc2(x)
        x = self.drop(x)
        return x

class SwinTransformerLayer(nn.Module):

    def __init__(self, dim, num_heads, window_size=8, shift_size=0,
                 mlp_ratio=4., qkv_bias=True, qk_scale=None, drop=0., attn_drop=0., drop_path=0.,
                 act_layer=nn.SiLU, norm_layer=nn.LayerNorm):
        super().__init__()
        self.dim = dim
        self.num_heads = num_heads
        self.window_size = window_size
        self.shift_size = shift_size
        self.mlp_ratio = mlp_ratio
        # if min(self.input_resolution) <= self.window_size:
        #     # if window size is larger than input resolution, we don't partition windows
        #     self.shift_size = 0
        #     self.window_size = min(self.input_resolution)
        assert 0 <= self.shift_size < self.window_size, "shift_size must in 0-window_size"

        self.norm1 = norm_layer(dim)
        self.attn = WindowAttention(
            dim, window_size=(self.window_size, self.window_size), num_heads=num_heads,
            qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop)

        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.norm2 = norm_layer(dim)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)

    def create_mask(self, H, W):
        # calculate attention mask for SW-MSA
        img_mask = torch.zeros((1, H, W, 1))  # 1 H W 1
        h_slices = (slice(0, -self.window_size),
                    slice(-self.window_size, -self.shift_size),
                    slice(-self.shift_size, None))
        w_slices = (slice(0, -self.window_size),
                    slice(-self.window_size, -self.shift_size),
                    slice(-self.shift_size, None))
        cnt = 0
        for h in h_slices:
            for w in w_slices:
                img_mask[:, h, w, :] = cnt
                cnt += 1

        mask_windows = window_partition(img_mask, self.window_size)  # nW, window_size, window_size, 1
        mask_windows = mask_windows.view(-1, self.window_size * self.window_size)
        attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2)
        attn_mask = attn_mask.masked_fill(attn_mask != 0, float(-100.0)).masked_fill(attn_mask == 0, float(0.0))

        return attn_mask

    def forward(self, x):
        # reshape x[b c h w] to x[b l c]
        _, _, H_, W_ = x.shape

        Padding = False
        if min(H_, W_) < self.window_size or H_ % self.window_size!=0 or W_ % self.window_size!=0:
            Padding = True
            # print(f'img_size {min(H_, W_)} is less than (or not divided by) window_size {self.window_size}, Padding.')
            pad_r = (self.window_size - W_ % self.window_size) % self.window_size
            pad_b = (self.window_size - H_ % self.window_size) % self.window_size
            x = F.pad(x, (0, pad_r, 0, pad_b))

        # print('2', x.shape)
        B, C, H, W = x.shape
        L = H * W
        x = x.permute(0, 2, 3, 1).contiguous().view(B, L, C)  # b, L, c

        # create mask from init to forward
        if self.shift_size > 0:
            attn_mask = self.create_mask(H, W).to(x.device)
        else:
            attn_mask = None

        shortcut = x
        x = self.norm1(x)
        x = x.view(B, H, W, C)

        # cyclic shift
        if self.shift_size > 0:
            shifted_x = torch.roll(x, shifts=(-self.shift_size, -self.shift_size), dims=(1, 2))
        else:
            shifted_x = x

        # partition windows
        x_windows = window_partition(shifted_x, self.window_size)  # nW*B, window_size, window_size, C
        x_windows = x_windows.view(-1, self.window_size * self.window_size, C)  # nW*B, window_size*window_size, C

        # W-MSA/SW-MSA
        attn_windows = self.attn(x_windows, mask=attn_mask)  # nW*B, window_size*window_size, C

        # merge windows
        attn_windows = attn_windows.view(-1, self.window_size, self.window_size, C)
        shifted_x = window_reverse(attn_windows, self.window_size, H, W)  # B H' W' C

        # reverse cyclic shift
        if self.shift_size > 0:
            x = torch.roll(shifted_x, shifts=(self.shift_size, self.shift_size), dims=(1, 2))
        else:
            x = shifted_x
        x = x.view(B, H * W, C)

        # FFN
        x = shortcut + self.drop_path(x)
        x = x + self.drop_path(self.mlp(self.norm2(x)))

        x = x.permute(0, 2, 1).contiguous().view(-1, C, H, W)  # b c h w

        if Padding:
            x = x[:, :, :H_, :W_]  # reverse padding

        return x

class C3STR(C3):
    # C3 module with SwinTransformerBlock()
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)
        num_heads = c_ // 32
        self.m = SwinTransformerBlock(c_, c_, num_heads, n)

This program file is part of YOLOv5 and contains commonly used modules. The excerpt above shows the Swin Transformer components added for the improved backbone: SwinTransformerBlock, WindowAttention, Mlp, SwinTransformerLayer, and C3STR, a C3 variant whose bottlenecks are replaced by a SwinTransformerBlock. The full file also defines convolution and network layer classes such as Conv, DWConv, and TransformerLayer, which are used to build the YOLOv5 network structure, together with helpers such as autopad that assist network construction and inference.
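As a quick shape check for these modules, the sketch below passes a dummy feature map through SwinTransformerBlock and C3STR; it assumes the rest of models/common.py (Conv, C3, DropPath, window_partition, window_reverse) is importable, and the channel and spatial sizes are arbitrary choices divisible by the default window size of 8.

import torch

# Sanity-check sketch; sizes are assumptions, not values used by the project.
x = torch.randn(1, 128, 64, 64)                       # N, C, H, W
block = SwinTransformerBlock(128, 128, num_heads=4, num_layers=2)
print(block(x).shape)                                 # torch.Size([1, 128, 64, 64])

c3str = C3STR(128, 128, n=1)                          # drop-in replacement for C3
print(c3str(x).shape)                                 # torch.Size([1, 128, 64, 64])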

5.3 models\experimental.py

class CrossConv(nn.Module):
    # Cross Convolution Downsample
    def __init__(self, c1, c2, k=3, s=1, g=1, e=1.0, shortcut=False):
        # ch_in, ch_out, kernel, stride, groups, expansion, shortcut
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, (1, k), (1, s))
        self.cv2 = Conv(c_, c2, (k, 1), (s, 1), g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


class Sum(nn.Module):
    # Weighted sum of 2 or more layers https://arxiv.org/abs/1911.09070
    def __init__(self, n, weight=False):  # n: number of inputs
        super().__init__()
        self.weight = weight  # apply weights boolean
        self.iter = range(n - 1)  # iter object
        if weight:
            self.w = nn.Parameter(-torch.arange(1.0, n) / 2, requires_grad=True)  # layer weights

    def forward(self, x):
        y = x[0]  # no weight
        if self.weight:
            w = torch.sigmoid(self.w) * 2
            for i in self.iter:
                y = y + x[i + 1] * w[i]
        else:
            for i in self.iter:
                y = y + x[i + 1]
        return y


class MixConv2d(nn.Module):
    # Mixed Depth-wise Conv https://arxiv.org/abs/1907.09595
    def __init__(self, c1, c2, k=(1, 3), s=1, equal_ch=True):  # ch_in, ch_out, kernel, stride, ch_strategy
        super().__init__()
        n = len(k)  # number of convolutions
        if equal_ch:  # equal c_ per group
            i = torch.linspace(0, n - 1E-6, c2).floor()  # c2 indices
            c_ = [(i == g).sum() for g in range(n)]  # intermediate channels
        else:  # equal weight.numel() per group
            b = [c2] + [0] * n
            a = np.eye(n + 1, n, k=-1)
            a -= np.roll(a, 1, axis=1)
            a *= np.array(k) ** 2
            a[0] = 1
            c_ = np.linalg.lstsq(a, b, rcond=None)[0].round()  # solve for equal weight indices, ax = b

        self.m = nn.ModuleList(
            [nn.Conv2d(c1, int(c_), k, s, k // 2, groups=math.gcd(c1, int(c_)), bias=False) for k, c_ in zip(k, c_)])
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(torch.cat([m(x) for m in self.m], 1)))


class Ensemble(nn.ModuleList):
    # Ensemble of models
    def __init__(self):
        super().__init__()

    def forward(self, x, augment=False, profile=False, visualize=False):
        y = []
        for module in self:
            y.append(module(x, augment, profile, visualize)[0])
        # y = torch.stack(y).max(0)[0]  # max ensemble
        # y = torch.stack(y).mean(0)  # mean ensemble
        y = torch.cat(y, 1)  # nms ensemble
        return y, None  # inference, train output


def attempt_load(weights, map_location=None, inplace=True, fuse=True):
    from models.yolo import Detect, Model

    # Loads an ensemble of models weights=[a,b,c] or a single model weights=[a] or weights=a
    model = Ensemble()
    for w in weights if isinstance(weights, list) else [weights]:
        ckpt = torch.load(attempt_download(w), map_location=map_location)  # load
        if fuse:
            model.append(ckpt['ema' if ckpt.get('ema') else 'model'].float().fuse().eval())  # FP32 model
        else:
            model.append(ckpt['ema' if ckpt.get('ema') else 'model'].float().eval())  # without layer fuse

    # Compatibility updates
    for m in model.modules():
        if type(m) in [nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6, nn.SiLU, Detect, Model]:
            m.inplace = inplace  # pytorch 1.7.0 compatibility
            if type(m) is Detect:
                if not isinstance(m.anchor_grid, list):  # new Detect Layer compatibility
                    delattr(m, 'anchor_grid')
                    setattr(m, 'anchor_grid', [torch.zeros(1)] * m.nl)
        elif type(m) is Conv:
            m._non_persistent_buffers_set = set()  # pytorch 1.6.0 compatibility

    if len(model) == 1:
        return model[-1]  # return model
    else:
        print(f'Ensemble created with {weights}\n')
        for k in ['names']:
            setattr(model, k, getattr(model[-1], k))
        model.stride = model[torch.argmax(torch.tensor([m.stride.max() for m in model])).int()].stride  # max stride
        return model  # return ensemble

This program file is the experimental module of YOLOv5. Several classes and functions are defined in the file, including CrossConv, Sum, MixConv2d and Ensemble.

  • The CrossConv class is a cross convolution downsampling module. It passes the input feature map x through two convolutional layers (a 1×k followed by a k×1 convolution) and, when the shortcut is enabled and the channel counts match, adds the result back to the input.

  • The Sum class computes a plain or weighted sum of two or more input feature maps and returns the result.

  • The MixConv2d class is a mixed depth-wise convolution module. It processes the input feature map with several convolutions of different kernel sizes, concatenates the results, and returns them.

  • The Ensemble class holds a collection of models; it runs every model on the same input, concatenates their outputs, and returns the result.

  • The attempt_load function loads model weights for a single model or an ensemble of several models, optionally fusing layers, and applies a few compatibility updates to the loaded model(s) before returning.

Overall, this program file defines some experimental modules and functions for the implementation and training of the YOLOv5 model.
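A hedged sketch of how attempt_load is typically called is shown below; the weight path is an assumption, and the dummy input simply verifies the forward pass.

import torch

# Usage sketch for attempt_load; './best.pt' is an assumed checkpoint path.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = attempt_load('./best.pt', map_location=device)   # single checkpoint -> single model
model.eval()
with torch.no_grad():
    dummy = torch.zeros(1, 3, 640, 640, device=device)
    pred = model(dummy)[0]                                # raw predictions before NMS
print(pred.shape)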

5.4 models\tf.py
class YOLOv5:
    def __init__(self, weights):
        self.weights = weights
        self.model = self._build_model()

    def _build_model(self):
        # build the YOLOv5 model using the provided weights
        # (body omitted in this excerpt)
        pass

    def detect(self, image):
        # perform object detection on the input image using the YOLOv5 model
        # (body omitted in this excerpt)
        pass

    def export(self, export_path):
        # export the YOLOv5 model to the specified export path
        # (body omitted in this excerpt)
        pass

This class wraps the methods for building and using the YOLOv5 model. The __init__ method initializes the object and loads the weight file; the _build_model method builds the YOLOv5 model from the weights; the detect method performs object detection on an input image; and the export method exports the YOLOv5 model to the specified path.

This is a program file for the YOLOv5 model implemented using TensorFlow and Keras. It contains the definition of various components of the YOLOv5 model, such as convolutional layers, batch normalization layers, activation functions, etc. This file also defines custom layers such as TFBN, TFPad, and TFConv, which are used to implement the same functions in TensorFlow as in PyTorch. In addition, this file also defines the TFDetect class, which is used to implement the detection function of YOLOv5.

The file also contains example command-line usage for exporting the model.

In short, this program file implements each component of the YOLOv5 model and provides the function of exporting the model.

5.5 models\__init__.py

This file defines a function called "mode", whose purpose is to find the element that appears most frequently in a set of data. The input is a list of elements, and the output is the most frequent element.

The main steps of the procedure are as follows:

  1. First, define a function "mode" that accepts a list as a parameter.
  2. Inside the function, create an empty dictionary "count" to record the number of times each element appears.
  3. Use a loop to iterate through each element in the list.
  4. For each element, check whether it is already in the dictionary "count". If it exists, add 1 to the corresponding value; if it does not exist, add the element as a key to the dictionary and set the value to 1.
  5. After the loop ends, traverse the dictionary "count" and find the key with the largest value, which is the element that appears the most.
  6. Return the element with the most occurrences as the output of the function.

This program can help users quickly find the most frequently occurring elements in a set of data, and can be used in statistics, analysis and other application scenarios.
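A minimal implementation matching the steps above might look like the following; the actual file may differ in detail.

# Sketch of the described "mode" helper; not copied from the project source.
def mode(values):
    count = {}
    for v in values:                  # tally occurrences of each element
        count[v] = count.get(v, 0) + 1
    best, best_n = None, -1
    for v, n in count.items():        # pick the key with the largest count
        if n > best_n:
            best, best_n = v, n
    return best

print(mode([1, 2, 2, 3, 2, 1]))       # -> 2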

5.6 tools\activations.py
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActivationFunctions:
    class SiLU(nn.Module):
        @staticmethod
        def forward(x):
            return x * torch.sigmoid(x)

    class Hardswish(nn.Module):
        @staticmethod
        def forward(x):
            return x * F.hardtanh(x + 3, 0.0, 6.0) / 6.0

    class Mish(nn.Module):
        @staticmethod
        def forward(x):
            return x * F.softplus(x).tanh()

    class MemoryEfficientMish(nn.Module):
        class F(torch.autograd.Function):
            @staticmethod
            def forward(ctx, x):
                ctx.save_for_backward(x)
                return x.mul(torch.tanh(F.softplus(x)))

            @staticmethod
            def backward(ctx, grad_output):
                x = ctx.saved_tensors[0]
                sx = torch.sigmoid(x)
                fx = F.softplus(x).tanh()
                return grad_output * (fx + x * sx * (1 - fx * fx))

        def forward(self, x):
            return self.F.apply(x)

    class FReLU(nn.Module):
        def __init__(self, c1, k=3):
            super().__init__()
            self.conv = nn.Conv2d(c1, c1, k, 1, 1, groups=c1, bias=False)
            self.bn = nn.BatchNorm2d(c1)

        def forward(self, x):
            return torch.max(x, self.bn(self.conv(x)))

    class AconC(nn.Module):
        def __init__(self, c1):
            super().__init__()
            self.p1 = nn.Parameter(torch.randn(1, c1, 1, 1))
            self.p2 = nn.Parameter(torch.randn(1, c1, 1, 1))
            self.beta = nn.Parameter(torch.ones(1, c1, 1, 1))

        def forward(self, x):
            dpx = (self.p1 - self.p2) * x
            return dpx * torch.sigmoid(self.beta * dpx) + self.p2 * x

    class MetaAconC(nn.Module):
        def __init__(self, c1, k=1, s=1, r=16):
            super().__init__()
            c2 = max(r, c1 // r)
            self.p1 = nn.Parameter(torch.randn(1, c1, 1, 1))
            self.p2 = nn.Parameter(torch.randn(1, c1, 1, 1))
            self.fc1 = nn.Conv2d(c1, c2, k, s, bias=True)
            self.fc2 = nn.Conv2d(c2, c1, k, s, bias=True)

        def forward(self, x):
            y = x.mean(dim=2, keepdims=True).mean(dim=3, keepdims=True)
            beta = torch.sigmoid(self.fc2(self.fc1(y)))
            dpx = (self.p1 - self.p2) * x
            return dpx * torch.sigmoid(beta * dpx) + self.p2 * x

This program file is a module containing different activation functions. Here is a brief description of each activation function:

  1. SiLU: This is an export-friendly version of the SiLU activation function that uses the sigmoid function.
  2. Hardswish: This is an export-friendly version of the Hardswish activation function, which uses the hardtanh function.
  3. Mish: This is an implementation of the Mish activation function, which uses the softplus and tanh functions.
  4. MemoryEfficientMish: This is a memory-efficient implementation of the Mish activation function, which uses a custom torch.autograd.Function.
  5. FReLU: This is an implementation of the FReLU activation function, which takes the element-wise maximum of the input and a depth-wise convolution (with batch normalization) of the input.
  6. AconC: This is an implementation of the ACON activation function, which operates on the input according to the parameters p1, p2 and beta.
  7. MetaAconC: This is an implementation of the MetaACON activation function, which uses a small network to generate parameter beta and operates on the input according to parameters p1, p2 and beta.

These activation functions can be used as nonlinear transformations in neural networks to increase the expressive power of the model.
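For example, one of these activations can be dropped into a small convolutional block as follows; the channel counts are arbitrary assumptions used only for illustration.

import torch
import torch.nn as nn

# Illustrative use of an activation from the module above; sizes are assumed.
act = ActivationFunctions.MetaAconC(c1=16)
layer = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), act)
print(layer(torch.randn(1, 3, 64, 64)).shape)   # torch.Size([1, 16, 64, 64])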

6. Overall structure of the system

Overview of overall functions and architecture:
This project is a PCB board defect detection system based on YOLOv5. It contains multiple modules and tools for building, training, and running inference with YOLOv5 models, as well as for processing datasets and visualizing results. ui.py is the main program file, which creates the graphical interface and calls the object detection functions; the models directory contains the components and experimental modules of the YOLOv5 model; the tools and utils directories contain auxiliary tools and common functional modules.

The following is a summary of the functions of each file (only a brief description of the file name and function):

File path: Functional overview
ui.py: Creates the graphical interface and calls the object detection functions
models\common.py: Defines the convolutional and neural network layers of the YOLOv5 model
models\experimental.py: Defines the experimental modules of the YOLOv5 model
models\tf.py: YOLOv5 model implemented with TensorFlow and Keras
models\yolo.py: Defines the main structure and functions of the YOLOv5 model
models\__init__.py: Initialization file for the models package
tools\activations.py: Defines the different activation functions
tools\augmentations.py: Defines the data augmentation modules
tools\autoanchor.py: Automatically calculates anchor boxes
tools\autobatch.py: Automatically adjusts the batch size
tools\callbacks.py: Defines callbacks used during training
tools\datasets.py: Dataset processing modules
tools\downloads.py: Downloads datasets and model weights
tools\general.py: Common utility functions
tools\loss.py: Defines the loss functions
tools\metrics.py: Defines the evaluation metrics
tools\plots.py: Plotting and result visualization
tools\torch_utils.py: PyTorch-related utility functions
tools\__init__.py: Initialization file for the tools package
tools\aws\resume.py: Resuming training on the AWS platform
tools\aws\__init__.py: Initialization file for the AWS tools package
tools\flask_rest_api\example_request.py: Sample request for the Flask REST API
tools\flask_rest_api\restapi.py: Implementation of the Flask REST API
tools\loggers\__init__.py: Initialization file for the loggers package
tools\loggers\wandb\log_dataset.py: Logs dataset information with WandB
tools\loggers\wandb\sweep.py: Hyperparameter search with WandB
tools\loggers\wandb\wandb_utils.py: Helper functions for WandB logging
tools\loggers\wandb\__init__.py: Initialization file for the WandB logger package
utils\activations.py: Defines the different activation functions
utils\augmentations.py: Defines the data augmentation modules
utils\autoanchor.py: Automatically calculates anchor boxes
utils\autobatch.py: Automatically adjusts the batch size
utils\callbacks.py: Defines callbacks used during training
utils\datasets.py: Dataset processing modules
utils\downloads.py: Downloads datasets and model weights
utils\general.py: Common utility functions
utils\loss.py: Defines the loss functions
utils\metrics.py: Defines the evaluation metrics
utils\plots.py: Plotting and result visualization
utils\torch_utils.py: PyTorch-related utility functions
utils\__init__.py: Initialization file for the utils package
utils\aws\resume.py: Resuming training on the AWS platform
utils\aws\__init__.py: Initialization file for the AWS tools package
utils\flask_rest_api\example_request.py: Sample request for the Flask REST API
utils\flask_rest_api\restapi.py: Implementation of the Flask REST API
utils\loggers\__init__.py: Initialization file for the loggers package
utils\loggers\wandb\log_dataset.py: Logs dataset information with WandB
utils\loggers\wandb\sweep.py: Hyperparameter search with WandB
utils\loggers\wandb\wandb_utils.py: Helper functions for WandB logging
utils\loggers\wandb\__init__.py: Initialization file for the WandB logger package

The table above gives only a brief description of each file path and its function; for the specific functions and implementation details, refer to the ***[System Integration]*** section.

7. Basic structure of a convolutional neural network

A CNN combines a multi-layer artificial neural network with convolution operations. It has two main characteristics: first, neurons in two adjacent layers are connected locally; second, neurons within the same layer share weights. This structure reduces the number of weights that must be trained and the complexity of the network, and it is largely invariant to tilt, scaling, translation, and other deformations.

When introducing the basic components of a CNN, the concept of a "layer" is used: the layer is the basic building block of deep networks, and apart from the input and output layers, the intermediate layers are called hidden layers. Each layer applies a series of transformations, typically data regularization, convolution, nonlinear excitation, and pooling. Data regularization is the preprocessing step that normalizes the data so that training converges more easily. Each layer receives weighted inputs, the connections between neurons carry the weights, and the result passes through a nonlinear transformation into the next layer; the convolution weights and nonlinear transformations in each layer determine the final output of the network. The figure below shows the basic structure of a classification CNN: the network consists of two basic "layers" followed by a fully connected (FC) layer and a classifier, and each of the two basic "layers" is composed of data regularization, kernel convolution, nonlinear excitation, and pooling.

image.png
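A minimal PyTorch equivalent of one such basic "layer" might look like the following; the channel sizes are arbitrary assumptions chosen for illustration.

import torch.nn as nn

# One basic "layer" as described above: regularization, convolution,
# nonlinear excitation, pooling. Sizes are illustrative assumptions.
basic_layer = nn.Sequential(
    nn.BatchNorm2d(3),                           # data regularization
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # kernel convolution
    nn.ReLU(),                                   # nonlinear excitation
    nn.MaxPool2d(2),                             # pooling
)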

The convolutional layer is an important building block of a CNN. Unlike the full connections between layers of neurons in a traditional artificial neural network, the convolution operation uses local perception between two adjacent layers, and neurons within the same layer share weights (Chang Haitao, 2018). With full connections, every neuron in one layer is connected to all neurons in the adjacent layer, so the whole image is perceived at once and the number of parameters grows explosively as the image dimension increases. The convolution operation, like a biological neural network, uses local connections between neurons; this reduces both the complexity of the network model and the number of parameters, while remaining invariant to transformations such as scaling and translation.
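To make the scale of this difference concrete, a rough back-of-the-envelope comparison is shown below; the 64×64 RGB input and 16 output feature maps are arbitrary assumptions chosen only for illustration.

import torch.nn as nn

# Fully connected vs. convolutional parameter counts for a 64x64 RGB input
# mapped to 16 feature maps (illustrative sizes only).
fc = nn.Linear(64 * 64 * 3, 64 * 64 * 16)            # full connection
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)    # local connection + weight sharing
print(sum(p.numel() for p in fc.parameters()))       # ~805 million parameters
print(sum(p.numel() for p in conv.parameters()))     # 448 parameters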
The figure below compares the fully connected and kernel convolution operations: panel (a) shows the fully connected operation and panel (b) the kernel convolution operation. With local connections, a neuron in layer n is connected to only three neurons in layer n-1 rather than to all of them. Each arrow in panel (a) represents a distinct weight, while arrows of the same color in panel (b) represent the same weight (w1, w2, w3); that is, each convolution kernel is reused across different regions of the input feature map, which achieves weight sharing.
image.png

8. Improvements to YOLOv5

Refer to this blog for the latest research status. With the success of PVT and Swin Transformer, applying ViTs as backbones for dense prediction has become very promising. The core of PVT is a pyramid structure; it further reduces computation by down-sampling the keys and values of the attention, but its computational complexity still grows quadratically with the image size (HW). Swin Transformer proposes window attention on top of the pyramid structure, which is essentially a form of local attention that establishes cross-window relationships through shifted windows; its computational complexity is linear in the image size (HW). Models based on local attention have low computational complexity, but they lose the global receptive field that global attention provides. Since Swin Transformer, several works based on local attention have tried to improve the global modeling capability of such models from different angles.
The Twins idea is relatively simple: combine local attention and global attention. The main body of Twins also uses a pyramid structure, but each stage alternates between LSA (locally-grouped self-attention) and GSA (global sub-sampled attention). LSA is essentially the window attention of Swin Transformer, while GSA is essentially the sub-sampled global attention of PVT, in which the keys and values are down-sampled. LSA extracts local features, and GSA provides the global receptive field:
image.png

It can be seen that these models, like Swin Transformer, are essentially forms of local attention that strengthen its global modeling ability in different ways, and under similar parameter and computation budgets they perform similarly on classification and dense prediction tasks. Recently, Microsoft systematically summarized the three major characteristics of the Local Vision Transformer in the paper Demystifying Local Vision Transformer: sparse connectivity, weight sharing, and dynamic weight:
  • Sparse connectivity: the output of each token depends only on the tokens in the local window it belongs to, and there are no connections across channels (ignoring the linear projections of the query, key, and value, the attention can be viewed as a channel-wise weighted sum of token features under the computed weights).
  • Weight sharing: the weights are shared across channels.
  • Dynamic weight: the weights are not fixed but are generated dynamically from each token.
Local attention is therefore very similar to depth-wise convolution. Depth-wise convolution also has sparse connectivity: each output depends only on the inputs within the kernel window, and there is no connection between channels. It also has weight sharing, but the convolution kernel is shared across all spatial positions while different channels use different kernels. However, the depth-wise convolution kernel is a trained parameter that is fixed once training is complete, rather than being generated dynamically. In addition, local attention loses position information and requires positional encoding, whereas depth-wise convolution does not. The figure below shows the differences between these operations:
image.png
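For comparison with window attention, a depth-wise convolution can be written in a few lines; the channel count and kernel size below are arbitrary assumptions.

import torch
import torch.nn as nn

# Depth-wise convolution: one kernel per channel, local receptive field.
dw = nn.Conv2d(64, 64, kernel_size=7, padding=3, groups=64)
x = torch.randn(1, 64, 56, 56)
print(dw(x).shape)                     # torch.Size([1, 64, 56, 56])
# Like window attention, each output position sees only a local neighbourhood
# and channels are not mixed; unlike attention, the kernel weights are fixed
# after training instead of being generated from the tokens.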

9. System integration

The complete source code, environment deployment video tutorial, dataset, and custom UI interface are shown below:
1.png
Reference blog: "Improved YOLOv5 & OpenCV PCB board defect detection system (source code and deployment tutorial)"

Source: blog.csdn.net/cheng2333333/article/details/135019437