Python crawler: see who ranks at the "top" among Huya's female streamers


Crawler

Web page link: https://www.huya.com/g/4079
The main steps are the same as in our earlier analyses, so a brief look at the page structure is enough here; the focus of this post is the second part.
With the page structure analyzed, I again use XPath, as before, to extract the data we need.

import parsel

# Collect the avatar URL, nickname and popularity of every streamer on the page
def getDatas(html):
    datalist = []
    parse = parsel.Selector(html)
    # Keep each <li> as a Selector so its fields can be queried relative to it
    lis = parse.xpath('//li[@class="game-live-item"]')
    for li in lis:
        # ".//" limits the search to the current <li>; "data" is the fallback value
        img_src = li.xpath('.//img[@class="pic"]/@data-original').get(default="data")
        title = li.xpath('.//i[@class="nick"]/@title').get(default="data")
        # "redu" is the popularity figure shown for the streamer
        redu = li.xpath('.//i[@class="js-num"]/text()').get(default="data")
        datalist.append([img_src, title, redu])
    return datalist
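
To make the three lookups in getDatas concrete, here is a minimal sketch that runs them on a hand-written fragment. It rests on several assumptions: the markup below is invented to match the class names used above (it is not copied from the Huya page), extract_items is a hypothetical helper, and the code uses the standard library's xml.etree.ElementTree, which supports only a small XPath subset and needs well-formed markup, instead of parsel.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment that mimics the class names used above
SAMPLE = """
<ul>
  <li class="game-live-item">
    <img class="pic" data-original="https://example.com/a.jpg" />
    <i class="nick" title="streamer_a"></i>
    <i class="js-num">125000</i>
  </li>
  <li class="game-live-item">
    <img class="pic" data-original="https://example.com/b.jpg" />
    <i class="nick" title="streamer_b"></i>
  </li>
</ul>
"""

def extract_items(markup, default="data"):
    """Return [avatar_url, nickname, popularity] for every list item."""
    root = ET.fromstring(markup)
    rows = []
    for li in root.findall(".//li[@class='game-live-item']"):
        # Every lookup is relative to the current <li>
        img = li.find(".//img[@class='pic']")
        nick = li.find(".//i[@class='nick']")
        num = li.find(".//i[@class='js-num']")
        rows.append([
            img.get("data-original", default) if img is not None else default,
            nick.get("title", default) if nick is not None else default,
            num.text if num is not None and num.text else default,
        ])
    return rows

rows = extract_items(SAMPLE)
for row in rows:
    print(row)
```

Because each lookup is relative to its own list item, the second item, which has no popularity element, falls back to the default value instead of picking up a value from another item.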

In this way we get all the data we need, and the next step is to save the pictures. There are two ways to download a file: fetch the bytes and write them to a file opened with open, or call urllib.request.urlretrieve(data, path). Some sources online say the second method is slightly faster and that it deduplicates like a set, downloading a file only when it is not already in the folder. The speed claim is unverified here, and the second claim is not accurate: urlretrieve overwrites an existing file, so skipping files that are already present requires an explicit check such as os.path.exists.

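import urllib.request

SAVE_DIR = "D:/software/python/python爬虫/虎牙颜值主播排名"

# Save the streamers' avatars
def download(datalist):
    for data in datalist:
        path = SAVE_DIR + "/" + data[1] + ".jpg"
        # Method 1: fetch the bytes yourself and write them with open()
        # with urllib.request.urlopen(data[0]) as resp, open(path, 'wb') as f:
        #     f.write(resp.read())
        # Method 2: let urlretrieve fetch and save in one call
        urllib.request.urlretrieve(data[0], path)
        print(data[1] + " downloaded")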
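
Since urllib.request.urlretrieve writes to the target path whether or not a file is already there, skipping avatars that were downloaded earlier needs an explicit check. The sketch below shows one way to do that. The names download_once and safe_name, the fetch parameter, and the file-name cleanup rule are illustrative assumptions, not part of the original script.

```python
import os
import re
import urllib.request

def safe_name(nickname):
    """Replace characters that Windows does not allow in file names."""
    return re.sub(r'[\\/:*?"<>|]', "_", nickname).strip() or "unnamed"

def download_once(url, nickname, folder, fetch=urllib.request.urlretrieve):
    """Download url to folder/<nickname>.jpg unless that file already exists.

    Returns (target_path, downloaded), where downloaded is False when the
    file was already present. `fetch` is injectable so the logic can be
    exercised without network access.
    """
    os.makedirs(folder, exist_ok=True)
    target = os.path.join(folder, safe_name(nickname) + ".jpg")
    if os.path.exists(target):
        return target, False
    fetch(url, target)
    return target, True
```

Inside the loop of download, a call such as download_once(data[0], data[1], SAVE_DIR) would take the place of the direct urlretrieve call.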

Baidu face recognition interface

Baidu AI open platform link: https://ai.baidu.com/
Create an application on the platform by entering an application name and a short description. Once the application is created, its credentials (the App ID, API Key and Secret Key used in the code that follows) are what we need next.
Next, read the SDK documentation and its usage instructions. There is no need to download the SDK manually; the module can be installed directly in PyCharm.
The basic flow is simple. First create the client. Then call the interface to analyze the picture; because we want the beauty score returned, the request must carry the corresponding parameter, otherwise the score is not included in the response.
That covers the flow of the beauty-score detection interface. The code is as follows:

import base64
from aip import AipFace  # Baidu AI SDK (pip install baidu-aip)

def face_rg(file_path):
    # Your APPID / AK / SK
    APP_ID = 'your App ID'
    API_KEY = 'your Api Key'
    SECRET_KEY = 'your Secret Key'

    client = AipFace(APP_ID, API_KEY, SECRET_KEY)

    # Send the image as a Base64 string
    with open(file_path, 'rb') as file:
        data = base64.b64encode(file.read())
    image = data.decode()
    imageType = "BASE64"

    # Optional parameters: explicitly request the beauty score
    options = {"face_field": "beauty"}

    # Call face detection with the parameters
    result = client.detect(image, imageType, options)
    # print(result)
    return result['result']['face_list'][0]['beauty']
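
face_rg indexes straight into result['result']['face_list'][0], which raises an exception if the response contains no face. A more defensive variant might look like the sketch below. The helper name extract_beauty is hypothetical, and the error_code field and the shape of a failed response are assumptions about the service's reply; only the success path is taken from the code above.

```python
def extract_beauty(response):
    """Return the beauty score from a detect() response, or None if absent.

    Assumption: a failed call carries a non-zero "error_code" and may have
    "result" set to None. Only the success path comes from the article's code.
    """
    if not isinstance(response, dict):
        return None
    if response.get("error_code", 0) != 0:
        return None
    result = response.get("result") or {}
    faces = result.get("face_list") or []
    if not faces:
        return None
    return faces[0].get("beauty")

# Hand-written sample responses (not real API output)
ok = {"error_code": 0, "result": {"face_list": [{"beauty": 52.0}]}}
failed = {"error_code": 1, "result": None}  # illustrative non-zero code

print(extract_beauty(ok))
print(extract_beauty(failed))
```

Returning None instead of raising lets the calling loop decide whether to skip the image or report it.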

After that, we only need to traverse the pictures in the folder, run the detection on each one, and sort the results in descending order by beauty score:

import os

path = r"D:\software\python\python爬虫\虎牙颜值主播排名"
image_list = os.listdir(path)
name_score = {}
for image in image_list:
    name = image.split(".")[0]
    try:
        # Call the interface once per image and reuse the score
        score = face_rg(path + "/" + image)
        print("{} beauty score: {}".format(name, score))
        name_score[name] = score
    except Exception:
        # Skip images the interface could not score
        pass
second_score = sorted(name_score.items(), key=lambda x: x[1], reverse=True)
print("------------------------- Detection finished -------------------------")
print("------------------------- Ranking -------------------------")
for rank, (name, score) in enumerate(second_score, start=1):
    print("{} beauty score: {}, rank {}".format(name, score, rank))
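
The ranking step does not depend on the API, so it can be tried on made-up scores. The sketch below assumes a name_score dictionary like the one built above; the names and values are invented for illustration, and rank_scores is a hypothetical helper.

```python
def rank_scores(name_score):
    """Return (rank, name, score) tuples sorted by score, highest first."""
    ordered = sorted(name_score.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, score)
            for rank, (name, score) in enumerate(ordered, start=1)]

# Invented sample data, not real detection results
sample = {"streamer_a": 71.3, "streamer_b": 88.0, "streamer_c": 52.0}
ranking = rank_scores(sample)
for rank, name, score in ranking:
    print("{} beauty score: {}, rank {}".format(name, score, rank))
```

Python's sort is stable, so streamers with equal scores keep the order in which they were added to the dictionary.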

For the record, my own photo scored 52 in testing, which does not even reach the passing line. Feel free to share your score in the comments.



Origin blog.csdn.net/lovely__RR/article/details/108179028