Developing a Chinese Character Detection and Recognition System Based on Lightweight YOLOv5

My previous articles have covered Chinese character detection, letter detection, handwritten digit detection, Tibetan script detection, and oracle bone inscription detection. This article is motivated by the needs of a real project: the earlier Chinese character detection model dates from the YOLOv3 era, and both its detection accuracy and its inference speed now lag behind. Here we develop and build a new object detection model based on the lightweight YOLOv5 model. First, a look at the detection results:

Next, a brief look at the dataset:

A screenshot of the YOLO-format annotation files is as follows:

The content of the example annotation is as follows:

17 0.245192 0.617788 0.038462 0.038462
6 0.102163 0.830529 0.045673 0.045673
16 0.894231 0.096154 0.134615 0.134615
4 0.456731 0.524038 0.134615 0.134615
15 0.367788 0.317308 0.269231 0.269231
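
Each line above holds a class ID followed by the box center x, center y, width, and height, all normalized by the image size. As a rough illustration, the sketch below converts one such line back to pixel corner coordinates. The function name `yolo_to_pixel_box` is hypothetical, and the 416x416 image size is an assumption borrowed from the VOC example later in this article.

```python
def yolo_to_pixel_box(line, img_w, img_h):
    """Convert one YOLO label line to (class_id, xmin, ymin, xmax, ymax) in pixels."""
    cls, cx, cy, w, h = line.split()
    # Scale the normalized center and size back to pixel units.
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    # Corners are the center shifted by half the width/height.
    return (int(cls),
            round(cx - w / 2), round(cy - h / 2),
            round(cx + w / 2), round(cy + h / 2))


# First line of the example annotation; 416x416 is an assumed image size.
box = yolo_to_pixel_box("17 0.245192 0.617788 0.038462 0.038462", 416, 416)
print(box)  # (17, 94, 249, 110, 265)
```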

A screenshot of the VOC format data annotation file is as follows:

The content of the example annotation is as follows:

<annotation>
    <folder>DATASET</folder>
    <filename>0ace8eaf-8e86-488b-9229-95255c69158c.jpg</filename>
    <source>
        <database>The DATASET Database</database>
        <annotation>DATASET</annotation>
        <image>DATASET</image>
    </source>
    <owner>
        <name>YMGZS</name>
    </owner>    
    <size>
        <width>416</width>
        <height>416</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    
    <object>        
        <name>17</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>214</xmin>
            <ymin>302</ymin>
            <xmax>230</xmax>
            <ymax>318</ymax>
        </bndbox>
    </object>
    
    <object>        
        <name>16</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>210</xmin>
            <ymin>67</ymin>
            <xmax>229</xmax>
            <ymax>86</ymax>
        </bndbox>
    </object>
    
    <object>        
        <name>18</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>260</xmin>
            <ymin>7</ymin>
            <xmax>274</xmax>
            <ymax>21</ymax>
        </bndbox>
    </object>
    
    <object>        
        <name>10</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>121</xmin>
            <ymin>103</ymin>
            <xmax>143</xmax>
            <ymax>125</ymax>
        </bndbox>
    </object>
    
    <object>        
        <name>11</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>296</xmin>
            <ymin>289</ymin>
            <xmax>352</xmax>
            <ymax>345</ymax>
        </bndbox>
    </object>
    
    <object>        
        <name>0</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>56</xmin>
            <ymin>132</ymin>
            <xmax>196</xmax>
            <ymax>272</ymax>
        </bndbox>
    </object>
    
    <object>        
        <name>0</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>213</xmin>
            <ymin>142</ymin>
            <xmax>353</xmax>
            <ymax>282</ymax>
        </bndbox>
    </object>
    
</annotation>
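
The two formats carry the same information, so one can be derived from the other. The following is a minimal sketch, not the conversion script actually used for this dataset, that reads a VOC annotation with Python's standard `xml.etree.ElementTree` and emits YOLO-format lines. The function name `voc_to_yolo` is hypothetical, and it assumes, as in the example above, that each `<name>` already holds the numeric class ID.

```python
import xml.etree.ElementTree as ET


def voc_to_yolo(xml_text):
    """Return YOLO label lines for every <object> in a VOC annotation string."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = int(size.find("width").text)
    img_h = int(size.find("height").text)
    lines = []
    # findall("object") only matches direct children of <annotation>.
    for obj in root.findall("object"):
        cls = obj.find("name").text.strip()  # assumed to be a numeric class ID
        bb = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(bb.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # Normalize center and size by the image dimensions.
        cx = (xmin + xmax) / 2 / img_w
        cy = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines


# Trimmed copy of the first object from the example annotation above.
SAMPLE_XML = """
<annotation>
    <size><width>416</width><height>416</height><depth>3</depth></size>
    <object>
        <name>17</name>
        <bndbox><xmin>214</xmin><ymin>302</ymin><xmax>230</xmax><ymax>318</ymax></bndbox>
    </object>
</annotation>
"""

print(voc_to_yolo(SAMPLE_XML))  # ['17 0.533654 0.745192 0.038462 0.038462']
```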

Because the focus is on a lightweight network, the lightest n-series model (YOLOv5n) is selected here. The final model file is smaller than 4 MB. The network structure diagram is as follows:

Training runs for 100 epochs by default, and the results directory is as follows:

【Confusion Matrix】

【F1 Curve】

【PR Curve】

【Training Log Visualization】

【Batch Calculation Examples】

Examples of inference in the visual interface are as follows:

Judging from the evaluation metrics, the detection performance is very good.

Origin blog.csdn.net/Together_CZ/article/details/129401399