Simple data augmentation, then training, applied to foreign-object detection on the power grid. It seems you can publish a paper if you have your own dataset.

RCNN-based foreign object detection for securing power transmission lines (RCNN4SPTL)

Abstract

  • This paper proposes a new deep learning network, RCNN4SPTL (RCNN-based Foreign Object Detection for Securing Power Transmission Lines), suited to detecting foreign objects on transmission lines. RCNN4SPTL uses an RPN (Region Proposal Network) to generate region proposals with aspect ratios matched to the shapes of foreign objects, and adopts end-to-end training to improve performance. Experimental results show that, compared with the original Faster RCNN, RCNN4SPTL significantly improves detection speed and recognition accuracy.

  • Paper address: RCNN-based foreign object detection for securing power transmission lines (RCNN4SPTL) - ScienceDirect

Introduction

  • Maintaining the safety of transmission lines is critical. Foreign objects such as kites, balloons, and plastic films hanging on transmission lines can disrupt the distribution of high-voltage power and pose a threat to pedestrians and vehicles beneath the lines. Detecting foreign objects promptly is therefore crucial to their timely removal.

  • Currently there are two main methods for detecting foreign objects: manual line inspection and drone inspection. Since transmission lines usually pass through complex geographical environments such as mountains, rivers, highways, and bridges, manual inspection poses great safety risks; it also suffers from low efficiency and poor effectiveness. Drone inspection uses cameras to examine high-voltage transmission lines. Although drone-based detection is not affected by the geographical environment, it still requires substantial manpower to determine whether foreign objects appear in the images and videos the drone returns.

  • There have been studies on detecting foreign objects via image morphology, such as methods for extracting transmission lines from images. The general process is as follows: first, denoise with a Gaussian, median, or bilateral filter; then segment the background and foreground with Otsu's method (maximum inter-class variance); finally, extract the transmission lines with the Hough transform and identify foreign objects. Because of differences in geographical background and the influence of varying weather conditions, it is difficult to select grayscale thresholds appropriate for all images.
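
Below is a minimal sketch of this classical pipeline, assuming OpenCV; the filter sizes, thresholds, and Hough parameters are illustrative guesses, not values from the paper.

```python
import cv2
import numpy as np

def detect_lines_morphology(image_path: str):
    """Classical morphology pipeline: denoise -> Otsu -> Hough."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # 1. Denoise (Gaussian here; a median or bilateral filter also works).
    denoised = cv2.GaussianBlur(gray, (5, 5), 0)

    # 2. Otsu's method picks the grayscale threshold automatically.
    _, binary = cv2.threshold(denoised, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # 3. Edge map + Hough transform to extract line segments.
    edges = cv2.Canny(binary, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 80,
                            minLineLength=100, maxLineGap=10)
    return lines  # candidate transmission-line segments, or None
```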

  • In recent years, deep learning has developed rapidly, raising object detection and classification to a new level. Such neural networks adapt well to geometric transformations and lighting, and can automatically learn feature descriptions from input images. Girshick et al. proposed RCNN, the pioneer of region-proposal-based object detection in deep learning. Among the algorithms for generating region proposals is selective search, proposed by Uijlings et al. A series of RCNN variants followed: SPP-Net, Fast RCNN, and Faster RCNN, whose speed and performance surpass other networks. At this stage, however, Faster RCNN is used to detect common objects such as pedestrians and fruits; no one has tried applying it to foreign-object detection. Since such objects have no fixed shape, it is difficult for Faster RCNN to extract useful features, which increases the difficulty of training and recognition.

  • This paper proposes RCNN4SPTL, a new neural network model based on Faster RCNN, for identifying foreign objects on transmission lines. The model automatically extracts the relevant features of foreign objects on transmission lines to detect them. Compared with other methods, it greatly reduces human intervention and improves work efficiency.

RCNN4SPTL design and implementation

The RCNN4SPTL model

  • The figure below gives an overall view of the RCNN4SPTL model, which consists of three parts. The first is the shared convolutional neural network (SPTL-Net), which extracts image features to produce feature maps. The second is the region proposal network (RPN); its input is an image feature map and its output is candidate regions of different sizes and proportions. The third is the classification-regression network; its inputs are the feature maps and the region proposals, from which it generates a fixed-dimensional feature vector per proposal and then performs classification and localization. Finally, RCNN4SPTL outputs the category and location of the target.

    • [Figure: The RCNN4SPTL model]

SPTL-Net

  • RCNN4SPTL adopts SPTL-Net, which uses smaller convolution kernels to improve the quality of feature extraction, reduces the number of neurons without hurting detection performance, and speeds up training and detection.

  • SPTL-Net is shown in the figure below. It has eight layers: the first five are convolutional layers and the last three are fully connected layers. The first convolutional layer has 96 convolution kernels of size 5 × 5 × 3 with a stride of two pixels, filtering an input image of 223 × 223 × 3; smaller convolution kernels are beneficial to feature fusion and fine feature extraction. The second convolutional layer has 256 convolution kernels of size 5 × 5 × 96, which convolve the pooled output of the first layer. The third convolutional layer does the same using 384 convolution kernels of size 3 × 3 × 256. The fourth and fifth convolutional layers are connected directly, with no pooling layer between them. The first fully connected layer has 4096 neurons, and the second has 1048. A sketch of this architecture appears after the figure.

    • [Figure: SPTL-Net model]
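
Below is a minimal PyTorch sketch of SPTL-Net under the description above. The text specifies conv1-conv3 and the first two fully connected layers; the conv4/conv5 widths, paddings, pooling sizes, and the final output layer are assumptions and may differ from the authors' implementation.

```python
import torch
import torch.nn as nn

class SPTLNet(nn.Module):
    """Sketch of SPTL-Net: five conv layers followed by three FC layers."""

    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            # conv1: 96 kernels of 5x5x3, stride 2, on a 223x223x3 input
            nn.Conv2d(3, 96, kernel_size=5, stride=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),   # pooling size assumed
            # conv2: 256 kernels of 5x5x96 on the pooled conv1 output
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),   # pooling size assumed
            # conv3: 384 kernels of 3x3x256
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            # conv4 and conv5 connect directly, with no pooling in between;
            # their widths are assumptions (AlexNet-style)
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(4096), nn.ReLU(inplace=True),    # fc1: 4096 neurons
            nn.Linear(4096, 1048), nn.ReLU(inplace=True),  # fc2: 1048 neurons
            nn.Linear(1048, num_classes),                  # fc3 (output, assumed)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```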

  • The convolution and pooling operations are performed using formulas (1) and (2) respectively.

    • $$output_{size}=\frac{input_{size}-kernelSize+2\cdot padding}{stride}+1 \quad (1)$$
      $$output_{size}=\frac{input_{size}-kernelSize}{stride}+1 \quad (2)$$

    • where $output_{size}$ is the size of the output feature map, $input_{size}$ the size of the input image, $kernelSize$ the size of the convolution kernel, $padding$ the number of padding pixels, and $stride$ the step size.
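
For example, substituting the first convolutional layer of SPTL-Net into formula (1) (a 223 × 223 input, a 5 × 5 kernel, stride 2, and, assuming here, zero padding):

$$output_{size}=\frac{223-5+2\cdot 0}{2}+1=110$$

so the first convolutional layer outputs a 110 × 110 × 96 feature map.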

Adjusting the size and aspect ratio of region proposals

  • RPN is a convolutional neural network that takes the feature maps generated by SPTL-Net as input and generates rectangular region proposals of different sizes and aspect ratios. RPN first slides a 3 × 3 window over the feature map, projecting each window location onto a 256-dimensional feature vector, which is then fed into two sibling fully connected layers. The fully connected layer with the classification function produces 2 × 9 = 18 scores, two per candidate box, representing the likelihood that the box does or does not contain a given object. The fully connected layer with the regression function yields 4 × 9 = 36 correction parameters, four per candidate region, which RPN uses to correct the region proposals. Centered at the anchor point (the center of the current sliding window, projected onto the original image), RPN generates nine candidate rectangles spanning three scales and three aspect ratios: the scales are $128^2$, $256^2$, and $512^2$ pixels, and the aspect ratios are 1:1, 1:2, and 2:1.
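
The nine anchors per location can be sketched as below; this assumes the usual Faster RCNN convention that each scale $s$ fixes the anchor area at roughly $s^2$ while the ratio fixes height/width. The function name and signature are illustrative.

```python
import numpy as np

def generate_anchors(cx, cy, scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Return the 9 anchors (x1, y1, x2, y2) centered at (cx, cy).

    `ratios` is height/width: 1.0, 0.5, 2.0 correspond to 1:1, 1:2, 2:1.
    RCNN4SPTL instead uses ratios=(1.0, 2.0, 3.0), i.e. 1:1, 2:1, 3:1
    (see the next subsection).
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / np.sqrt(r)   # keep w * h ~= s^2 while h / w = r
            h = s * np.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(anchors)    # shape (9, 4)
```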

  • RPN generates four correction parameters tx, ty, tw, and th for each candidate region and uses them to correct the region proposals. Formulas (3)-(6) give the corrections:

    • $$x = w_a t_x + x_a \quad (3) \qquad y = h_a t_y + y_a \quad (4) \qquad w = w_a e^{t_w} \quad (5) \qquad h = h_a e^{t_h} \quad (6)$$

    • where $x$ and $y$ are the coordinates of the center point, and $w$ and $h$ the width and height, of the corrected candidate region; $x_a$ and $y_a$ are the abscissa and ordinate of the center point, and $w_a$ and $h_a$ the width and height, of the candidate region before correction.
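
Formulas (3)-(6) translate directly into a decoding routine, as sketched below; the exponential form in (5) and (6) matches the log encoding of $t_w^*$ and $t_h^*$ given later in this section.

```python
import numpy as np

def decode_box(anchor, deltas):
    """Apply correction parameters (tx, ty, tw, th) to an anchor.

    Both boxes are in (center_x, center_y, width, height) form.
    """
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = deltas
    x = wa * tx + xa        # (3)
    y = ha * ty + ya        # (4)
    w = wa * np.exp(tw)     # (5)
    h = ha * np.exp(th)     # (6)
    return x, y, w, h
```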

  • RCNN4SPTL adjusts the aspect ratios of the region proposals to the shape characteristics of foreign objects on transmission lines: it changes the aspect ratios from 1:1, 1:2, and 2:1 to 1:1, 2:1, and 3:1, because in the images most of the foreign objects hanging on the transmission lines are long and thin. The RPN loss function combines the classification scores of the candidate boxes with the correction parameters; equation (7) defines it.

    • $$L(\{p_i\},\{t_i\})=\frac{1}{N_{cls}}\sum_i L_{cls}(p_i,p_i^*)+\lambda\frac{1}{N_{reg}}\sum_i p_i^* L_{reg}(t_i,t_i^*) \quad (7)$$

    • where $i$ is the index of the region proposal and $p_i$ is the predicted confidence that the $i$-th candidate region contains the target. $p_i^*=1$ indicates that the $i$-th candidate region contains the object, and $p_i^*=0$ that it does not. $t_i$ denotes the predicted correction parameters of the candidate region, and $t_i^*$ the correction parameters of the region proposal with respect to the ground-truth region. $N_{cls}$ and $N_{reg}$ normalize the two sub-terms of equation (7), and $\lambda$ controls their relative importance. $L_{cls}()$ is the loss function of the prediction confidence, given by:

    • $$L_{cls}(p_i,p_i^*)=-\log(p_i p_i^*)$$

    • $L_{reg}()$ is the loss function of the correction parameters:

    • $$L_{reg}(t_i,t_i^*)=\sum_{i\in\{x,y,w,h\}}\text{smooth}_{L_1}(t_i-t_i^*)$$

    • where $\text{smooth}_{L_1}()$ is defined as:

    • $$\text{smooth}_{L_1}(x)=\begin{cases}0.5x^2, & |x|\le 1\\ |x|-0.5, & |x|>1\end{cases}$$

    • The formulas for computing $t^*_x$, $t^*_y$, $t^*_w$, and $t^*_h$ are:

    • $$t_x^*=\frac{x^*-x_a}{w_a},\qquad t_y^*=\frac{y^*-y_a}{h_a},\qquad t_w^*=\log\frac{w^*}{w_a},\qquad t_h^*=\log\frac{h^*}{h_a}$$

    • where $x^*$ and $y^*$ are the abscissa and ordinate of the center point of the ground-truth region, $w^*$ and $h^*$ its width and height, and $x_a, y_a, w_a, h_a$ the corresponding quantities of the candidate region.
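
The target encoding and regression loss above can be sketched as follows; the box representation and function names are illustrative.

```python
import numpy as np

def encode_targets(anchor, gt):
    """Compute (tx*, ty*, tw*, th*) from an anchor and a ground-truth box,
    both given as (center_x, center_y, width, height)."""
    xa, ya, wa, ha = anchor
    x_gt, y_gt, w_gt, h_gt = gt
    return np.array([(x_gt - xa) / wa, (y_gt - ya) / ha,
                     np.log(w_gt / wa), np.log(h_gt / ha)])

def smooth_l1(x):
    """Piecewise smooth-L1: quadratic near zero, linear in the tails."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1, 0.5 * x ** 2, np.abs(x) - 0.5)

def l_reg(t, t_star):
    """L_reg sums smooth-L1 over the four components x, y, w, h."""
    return smooth_l1(np.asarray(t) - np.asarray(t_star)).sum()
```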

End-to-end joint training

  • Faster RCNN uses alternating training. First, the shared convolutional network is initialized from a model pre-trained on ImageNet, and the RPN is trained. Next, the shared convolutional network is re-initialized from the same ImageNet pre-trained model, and the classification-regression network is trained. The parameters of the shared convolutional network and classification-regression network trained so far are then fixed, and the RPN is trained again. Finally, Faster RCNN initializes the entire network with the parameters from the previous step and, keeping the shared convolutional network and RPN parameters unchanged, trains the classification-regression network.

  • It can be seen that under alternating training the feature sharing is really only a pseudo-sharing, which reduces the performance of the network. RCNN4SPTL therefore uses end-to-end joint training, training the RPN and the classification-regression network simultaneously as a whole.

  • First, the ImageNet pre-trained model is used to initialize the shared convolutional neural network and the first two fully connected layers of the classification-regression network; the remaining layers are randomly initialized from a Gaussian distribution with mean 0 and standard deviation 0.01, and the network is then fine-tuned end to end. In this training, the RPN and the classification-regression network jointly train the shared convolutional neural network, allowing RCNN4SPTL to learn the features both parts require at the same time. This kind of training improves performance and yields a better model.
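
A minimal sketch of this initialization scheme, assuming PyTorch; which layers count as "new" depends on the model definition, so the usage line is hypothetical.

```python
import torch.nn as nn

def init_new_layers(module: nn.Module):
    """Randomly initialize layers not covered by the ImageNet pre-trained
    weights from a Gaussian with mean 0 and standard deviation 0.01."""
    for m in module.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.normal_(m.weight, mean=0.0, std=0.01)
            if m.bias is not None:
                nn.init.zeros_(m.bias)

# Hypothetical usage: apply only to the layers that were not loaded from
# the pre-trained model, e.g. init_new_layers(model.rpn_head)
```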

Image Preprocessing

  • The size of the training set affects the performance of the model: the larger the training set, the better the detection effect of a deep learning model. We therefore need to enlarge the set of training samples. RCNN4SPTL expands the training set through three preprocessing steps: image flipping, scaling, and rotation. This study uses left-right flipping; all images are scaled to 400 × 400 pixels; and each image is rotated counterclockwise by 20, 100, and 220 degrees, improving RCNN4SPTL's rotation invariance. The figure below shows some examples of preprocessed images: (a) is the original image, and (b) shows the results of flipping, 20-degree rotation, and scaling, respectively. A sketch of this augmentation pipeline follows the figure.

    • [Figure: Examples of preprocessed images]
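
A sketch of this augmentation pipeline, assuming Pillow; note that in practice the bounding-box annotations must be transformed alongside the images.

```python
from PIL import Image

def augment(image: Image.Image) -> list:
    """Expand one training image into its preprocessed variants:
    left-right flip, 400x400 scaling, and CCW rotations of 20/100/220 deg."""
    variants = [image.transpose(Image.FLIP_LEFT_RIGHT)]   # left-right flip
    variants.append(image.resize((400, 400)))             # scale to 400x400
    for angle in (20, 100, 220):
        # PIL rotates counterclockwise; expand=True keeps corners in frame
        variants.append(image.rotate(angle, expand=True))
    return variants
```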

Evaluation

  • To evaluate the effectiveness of our approach, we use the following hardware for model training: an NVIDIA GeForce GTX 1080 Ti with an Intel i7 @ 2.40 GHz (6 cores) and 16 GB RAM.

Dataset

  • This experiment uses 5,000 training sample images: 2,000 plastic films, 1,000 balloons, and 2,000 kites. The test dataset has 500 images: 200 films, 100 balloons, and 200 kites. An example from the dataset is shown below. The training set was manually labeled and processed. We fine-tune the hyperparameters of RCNN4SPTL, then feed the training set into the network for a fixed number of training iterations. Finally, we use the test set to evaluate the trained model and present the results in the next section.

    • [Figure: Dataset example]

Experimental results and analysis

  • The table below shows the precision and recall of the test results. The experiments show that RCNN4SPTL delivers better detection performance in terms of detection speed, precision, and recall.

    • [Table: Performance comparison]

  • For detecting foreign objects on transmission lines, RCNN4SPTL is better suited than the original Faster RCNN. The figure below shows the results of RCNN4SPTL and Faster RCNN in detecting balloons, kites, and plastic films. The test pictures are all said to be from real scenes (doubtful).

  • Figure (a) below shows the detection results of RCNN4SPTL, and figure (b) those of Faster RCNN. The results show that RCNN4SPTL recognizes foreign objects with higher confidence.

    • [Figure: Object detection results of RCNN4SPTL and Faster RCNN]

Conclusion

  • Timely detection and removal of foreign objects on transmission lines is of great significance. In this study, we first expand the dataset with specific image augmentation techniques: image flipping, scaling, and rotation. Then, based on the shape characteristics of foreign objects on transmission lines, we propose the RCNN4SPTL network, which optimizes the shared convolutional network and the sizes and aspect ratios of the region proposals. Finally, we train RCNN4SPTL with end-to-end joint training for 20,000 iterations. Experimental results show that RCNN4SPTL is more suitable than the traditional Faster RCNN for accurately identifying foreign objects on transmission lines, with faster detection and better recognition performance.

Origin: blog.csdn.net/weixin_43424450/article/details/132417226