Using the Baidu AI Open Platform to process a face dataset

I have a large face dataset to sort, and doing it by hand would be far too tedious.

So I used the face recognition service of Baidu's AI Open Platform:

https://cloud.baidu.com/product/face

It is very simple to use. First create an application on the platform; the application is issued an API Key (AK) and a Secret Key (SK).

These are used to obtain an access token, and with the token you can call the API directly.

Code to get the token:

Remember to fill in your own AK and SK.

# encoding:utf-8
import requests

# client_id is the AK and client_secret is the SK obtained from the Baidu console
host = 'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=XXXXXX&client_secret=XXXXXXX'
response = requests.get(host)
if response:
    # The token is in the "access_token" field of the JSON response
    print(response.json())

The token is a long string; copy it and save it.
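Instead of copying the token by hand, you can extract it from the JSON response in code. Below is a minimal standard-library sketch; the helper names `token_url`, `extract_access_token` and `fetch_access_token` are hypothetical, and it assumes the token comes back in an `access_token` field, with failures reported OAuth-style through `error`/`error_description` fields.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://aip.baidubce.com/oauth/2.0/token"


def token_url(ak: str, sk: str) -> str:
    """Build the token request URL from the application's AK and SK."""
    query = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": ak,      # AK from the console
        "client_secret": sk,  # SK from the console
    })
    return TOKEN_URL + "?" + query


def extract_access_token(payload: dict) -> str:
    """Return the access token, or raise if the response has none."""
    if "access_token" not in payload:
        # Assumption: error responses carry OAuth-style error fields
        msg = payload.get("error_description") or payload.get("error") or "no access_token in response"
        raise RuntimeError(msg)
    return payload["access_token"]


def fetch_access_token(ak: str, sk: str) -> str:
    """Request a token over the network (GET, like the script above) and return it."""
    with urllib.request.urlopen(token_url(ak, sk)) as resp:
        return extract_access_token(json.loads(resp.read().decode("utf-8")))
```

For example, `token = fetch_access_token("your AK", "your SK")` returns the string to paste into the scripts below. Tokens expire (the response also carries an `expires_in` value in seconds), so you may need to fetch a new one later.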

Next comes my actual requirement: get each face's gender and whether it wears glasses, split the images into four groups accordingly, and move each file into the matching one of four folders. If your needs differ, modify the code yourself.

import base64
import os
import time
from shutil import move

import requests

token = "XXXXXXXXXXXXXXXXX"
_url = "https://aip.baidubce.com/rest/2.0/face/v3/detect" + "?access_token=" + token
headers = {'content-type': 'application/json'}
trs = 'A'
tr_ts = ['train', 'test']
path = 'F:\\XXX\\' + tr_ts[0] + trs  # trainA, trainB, testA, testB
ls = os.listdir(path)
ls.sort()

# The four destination folders (e.g. A\man_glasses) must already exist
for i in ls:
    start = time.time()
    with open(os.path.join(path, i), 'rb') as f:
        # Base64-encode the image and decode to str so it can be sent as JSON
        imgBase64 = base64.b64encode(f.read()).decode('utf-8')
    data = {"image": imgBase64, "image_type": "BASE64", "face_field": "age,gender,glasses"}
    # json= serializes the body as JSON, matching the content-type header
    res = requests.post(_url, json=data, headers=headers).json()['result']['face_list'][0]
    # print(res)
    has_glasses = res['glasses']['type'] != 'none'
    gender = res['gender']['type']
    if has_glasses and gender == 'male':
        move(os.path.join(path, i), trs + '\\man_glasses\\')
    elif has_glasses and gender == 'female':
        move(os.path.join(path, i), trs + '\\woman_glasses\\')
    elif not has_glasses and gender == 'male':
        move(os.path.join(path, i), trs + '\\man_noglasses\\')
    else:
        move(os.path.join(path, i), trs + '\\woman_noglasses\\')
    print(time.time() - start)
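The indexing `['result']['face_list'][0]` in the loop above assumes every image contains a face. If one does not, the response presumably carries a non-zero error code and no face list, so the indexing raises and the whole loop stops. A defensive sketch, assuming the v3 response format with `error_code` (0 on success) and `result.face_list`; `first_face` is a hypothetical helper name.

```python
from typing import Optional


def first_face(resp: dict) -> Optional[dict]:
    """Return the first detected face from a detect response, or None.

    Assumes the v3 format: error_code 0 on success and
    result.face_list holding one dict per detected face.
    """
    if resp.get("error_code", 0) != 0:
        return None  # the API reported an error (e.g. no face found)
    result = resp.get("result") or {}
    faces = result.get("face_list") or []
    return faces[0] if faces else None
```

In the loop, `res = first_face(requests.post(...).json())` followed by `if res is None: continue` lets one bad image be skipped instead of aborting the run.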

The API can recognize many more attributes than these; see the face detection API documentation for the full list. Here I request only age, gender and glasses through the face_field parameter, which you can change to request other attributes.

The documentation lists many attributes, so choose whichever ones you need.
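The request body is a small JSON object, and `face_field` is a comma-separated list of attribute names. A sketch of building it from a list of attributes; `build_detect_payload` is a hypothetical helper, and any attribute names beyond `age`, `gender` and `glasses` should be checked against the documentation.

```python
import base64
from typing import Iterable


def build_detect_payload(image_bytes: bytes, fields: Iterable[str]) -> dict:
    """Build the JSON body for a face detect request."""
    return {
        # Base64 must be sent as text, so decode the encoded bytes
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "image_type": "BASE64",
        # face_field is a comma-separated list of attributes to return
        "face_field": ",".join(fields),
    }
```

For example, `build_detect_payload(data, ["age", "gender", "glasses"])` yields a `face_field` of `"age,gender,glasses"`, and the result can be sent with `requests.post(_url, json=payload)`.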

However, the code above is very slow: the free tier allows only 2 requests per second, and throughput is further limited by network speed.

So here is a multi-process version:

# encoding:utf-8
import base64
import multiprocessing as mp
import os
import time

import requests


class MSP():
    def __init__(self):
        self.h = 'trainB'
        self.token = "XXXXX"
        self._url = "https://aip.baidubce.com/rest/2.0/face/v3/detect" + "?access_token=" + self.token
        self.headers = {'content-type': 'application/json'}
        self.path = 'F:\\XXX\\' + self.h
        self.ls = os.listdir(self.path)
        self.ls.sort()
        self.mp_lst = []  # results, collected in the parent process

    def post_func(self, i):
        # Runs in a worker process: detect one image and return its attributes
        with open(os.path.join(self.path, i), 'rb') as f:
            img_base64 = base64.b64encode(f.read()).decode('utf-8')
        data = {"image": img_base64, "image_type": "BASE64", "face_field": "age,gender,glasses"}
        res = requests.post(self._url, json=data, headers=self.headers).json()['result']['face_list'][0]
        res['name'] = i
        time.sleep(0.1)
        print(i)
        return res

    def flow(self):
        pool = mp.Pool(10)
        jobs = [(i, pool.apply_async(self.post_func, args=(i,))) for i in self.ls]
        pool.close()
        pool.join()
        for i, job in jobs:
            # get() returns the worker's result and re-raises its exception,
            # which apply_async would otherwise swallow silently
            try:
                self.mp_lst.append(job.get())
            except Exception as e:
                print('failed:', i, e)


if __name__ == '__main__':
    start_time = time.time()
    msp = MSP()
    msp.flow()
    with open('XXX.txt', 'w') as f:
        f.write(str(msp.mp_lst))
    print(time.time() - start_time)

In my tests it was about ten times faster than the single-process version.
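Ten workers can send requests faster than the free quota of about 2 per second mentioned earlier; requests beyond the quota are likely to be rejected rather than answered (an assumption; check your quota in the console). A minimal sketch of a limiter that spaces calls at least `1/qps` seconds apart within a single process; `RateLimiter` is a hypothetical name, and with several processes each one would need its own share of the quota.

```python
import time
from typing import Callable


class RateLimiter:
    """Space calls at least 1/qps seconds apart (single process, single thread)."""

    def __init__(self, qps: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.interval = 1.0 / qps
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = float("-inf")  # earliest time the next call may start

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Usage: create `limiter = RateLimiter(2)` once and call `limiter.wait()` right before each `requests.post`. The clock and sleep functions are injectable so the spacing logic can be checked without real waiting.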

The attributes are saved to a txt file, so the files are moved by a separate script that reads that file:

# encoding:utf-8
import ast
import os
import time
from shutil import move

# trainB
path = 'F:\\XXX\\trainB'
trs = 'B'
with open('XXXX.txt') as f:
    lres_trainB = f.read()
# literal_eval parses the saved list of dicts without executing arbitrary code
for res in ast.literal_eval(lres_trainB):
    start = time.time()
    i = res['name']
    has_glasses = res['glasses']['type'] != 'none'
    gender = res['gender']['type']
    if has_glasses and gender == 'male':
        move(os.path.join(path, i), trs + '\\man_glasses\\')
    elif has_glasses and gender == 'female':
        move(os.path.join(path, i), trs + '\\woman_glasses\\')
    elif not has_glasses and gender == 'male':
        move(os.path.join(path, i), trs + '\\man_noglasses\\')
    else:
        move(os.path.join(path, i), trs + '\\woman_noglasses\\')
    print(time.time() - start)
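`shutil.move` treats the destination as a folder only if that folder already exists; otherwise it tries to use the destination as the new file name, which fails or produces a wrongly named file. So the four folders must be created beforehand. A sketch that picks the folder with a lookup table instead of an if/elif chain and creates it on demand; `target_folder` and `move_to_group` are hypothetical helpers, and `os.path.join` replaces the hard-coded `'\\'` separators.

```python
import os
import shutil

# (has glasses, gender) -> folder name, matching the four groups above
FOLDERS = {
    (True, "male"): "man_glasses",
    (True, "female"): "woman_glasses",
    (False, "male"): "man_noglasses",
    (False, "female"): "woman_noglasses",
}


def target_folder(res: dict) -> str:
    """Map one face's attributes to its group folder name."""
    has_glasses = res["glasses"]["type"] != "none"
    return FOLDERS[(has_glasses, res["gender"]["type"])]


def move_to_group(src_dir: str, dst_root: str, res: dict) -> str:
    """Move res['name'] from src_dir into dst_root/<group>, creating the folder if needed."""
    dst_dir = os.path.join(dst_root, target_folder(res))
    os.makedirs(dst_dir, exist_ok=True)
    # shutil.move returns the final path of the moved file
    return shutil.move(os.path.join(src_dir, res["name"]), dst_dir)
```

Unlike the `else` branch above, an unexpected gender value raises `KeyError` here instead of silently landing in a woman folder.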

That completes the process.


Source: blog.csdn.net/zhou_438/article/details/111315977