Deep learning - dataset processing

Deep learning rests on three elements: data, algorithms, and computing power.
Data plays a central role: the quality of the training dataset largely determines how well the model performs, which is why data preprocessing matters so much.
This article briefly explains how to do preliminary data processing: sorting the data into classes and labeling it.

Example:
Suppose there are tens of thousands of photos, the file name of each one contains the age, gender, and other information, and we need to train a model that recognizes age.
The pictures first have to be preprocessed: sort the photos by age group, attach an age-class label to each one, convert them to tf format (an earlier article covered the format conversion), and finally use the resulting dataset for training.
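
The sorting step comes down to mapping an age to a group name. A minimal sketch, assuming the same age boundaries and folder names as the script in this article; `age_to_bucket` and `parse_record` are hypothetical helper names, and the record layout (file name, age, gender, original folder, separated by spaces) is taken from how the script splits each line:

```python
# Each entry is (upper age bound, bucket name). The names match the folders
# used in this article: "20" holds ages up to 20, "45" holds ages above 45.
AGE_BUCKETS = [
    (20, "20"),
    (25, "21-25"),
    (30, "26-30"),
    (35, "31-35"),
    (40, "36-40"),
    (45, "41-45"),
]
OVER_45_BUCKET = "45"


def age_to_bucket(age):
    """Return the bucket (folder) name for an integer age. Hypothetical helper."""
    for upper_bound, name in AGE_BUCKETS:
        if age <= upper_bound:
            return name
    return OVER_45_BUCKET


def parse_record(line):
    """Split one 'file_name age gender folder' line into typed fields."""
    file_name, age, gender, folder = line.split()[:4]
    return file_name, int(age), gender, folder
```

Keeping the boundaries in one table means a change to the age groups touches a single place instead of a chain of `if` statements.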

# -*- coding: UTF-8 -*- 
import os
import shutil


# Open the input record file and one output txt file per age group
f1=open('F:/TF2/new/txt/1/data-clear-new.txt','r')
txt_path="F:/TF2/new/data-pic/txt/"
f15=open(txt_path+'20.txt','w+')
f20=open(txt_path+'21-25.txt','w+')
f25=open(txt_path+'26-30.txt','w+')
f30=open(txt_path+'31-35.txt','w+')
f35=open(txt_path+'36-40.txt','w+')
f40=open(txt_path+'41-45.txt','w+')
f45=open(txt_path+'45.txt','w+')

# Read the file name and age from each record
for line in f1.readlines():
    file_split=line.split() # split on whitespace to extract the fields

    old_name=file_split[0]   # original file name
    old_back=os.path.splitext(old_name)[0]  # file name without its extension
    # print(old_back)

    age=int(file_split[1])   # age
    # print(age)

    sub=file_split[2]   # gender
    # print(sub)

    folder=file_split[3]   # name of the original folder
    # print(folder)
    root_path="F:/TF2/new/tf2-data-clear/"
    save_path = "F:/TF2/new/data-pic/"
    if age <= 20:
        print(line)
        f15.write(line)
        shutil.move(root_path + old_name, save_path + "20/" + old_name)

    elif 21 <= age <= 25:
        f20.write(line)
        shutil.move(root_path + old_name, save_path + "21-25/" + old_name)

    elif 26 <= age <= 30:
        f25.write(line)
        shutil.move(root_path + old_name, save_path + "26-30/" + old_name)

    elif 31 <= age <= 35:
        f30.write(line)
        shutil.move(root_path + old_name, save_path + "31-35/" + old_name)

    elif 36 <= age <= 40:
        f35.write(line)
        shutil.move(root_path + old_name, save_path + "36-40/" + old_name)

    elif 41 <= age <= 45:
        f40.write(line)
        shutil.move(root_path + old_name, save_path + "41-45/" + old_name)

    else:  # age > 45
        f45.write(line)
        shutil.move(root_path + old_name, save_path + "45/" + old_name)

f1.close()
f15.close()
f20.close()
f25.close()
f30.close()
f35.close()
f40.close()
f45.close()
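
The script above expects the destination folders to exist already, since `shutil.move` does not create missing parent folders. A minimal sketch of creating them first, assuming the folder layout used in this article; `ensure_bucket_dirs` is a hypothetical helper name:

```python
import os

# Folder names as used by the script in this article.
BUCKET_NAMES = ["20", "21-25", "26-30", "31-35", "36-40", "41-45", "45"]


def ensure_bucket_dirs(save_path, bucket_names=BUCKET_NAMES):
    """Create one sub-folder per age group under save_path if it is missing."""
    created = []
    for name in bucket_names:
        folder = os.path.join(save_path, name)
        os.makedirs(folder, exist_ok=True)  # no error if it already exists
        created.append(folder)
    # The per-group record files are written to a "txt" sub-folder.
    os.makedirs(os.path.join(save_path, "txt"), exist_ok=True)
    return created


# Example call with the path from this article (run it before moving files):
# ensure_bucket_dirs("F:/TF2/new/data-pic/")
```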

The result is seven datasets, one per age group. Each group has a txt file that records the file name and age of every photo in it, which serves as the label.
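
Before the format conversion, the per-group txt files can be read back into a list of (image path, class id) pairs. A minimal sketch, assuming the folder and txt names produced by the script; the integer class ids (0 for the youngest group upward) and the name `load_labeled_files` are assumptions for illustration, not part of the original workflow:

```python
import os

# Group names as written by the script; the integer ids are an assumption.
BUCKET_NAMES = ["20", "21-25", "26-30", "31-35", "36-40", "41-45", "45"]
BUCKET_TO_ID = {name: index for index, name in enumerate(BUCKET_NAMES)}


def load_labeled_files(txt_dir, pic_dir):
    """Return (image path, class id) pairs read from the per-group txt files."""
    samples = []
    for name in BUCKET_NAMES:
        txt_file = os.path.join(txt_dir, name + ".txt")
        if not os.path.isfile(txt_file):
            continue  # a group without a record file is skipped
        with open(txt_file, "r") as records:
            for line in records:
                fields = line.split()
                if not fields:
                    continue  # ignore blank lines
                # fields[0] is the file name; the photo now sits in the group folder
                image_path = os.path.join(pic_dir, name, fields[0])
                samples.append((image_path, BUCKET_TO_ID[name]))
    return samples


# Example call with the paths from this article:
# samples = load_labeled_files("F:/TF2/new/data-pic/txt/", "F:/TF2/new/data-pic/")
```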

Origin blog.csdn.net/gm_Ergou/article/details/92842366