Crawling thousands of young women's avatars from a dating website

Visit a certain dating website (Century Jiayuan) and run a search, and you can view the avatars of many young women (and young men). These avatars can be viewed without restriction; only the full-size photos require registering and logging in. This article is only interested in the avatars, so no registration is needed.

With a Python crawler, thousands of user avatars can be downloaded easily.

Without further ado, here is the code. Note that the headers and cookies below come from the author's own browser session; replace them with the values your browser actually sends.

```python
import re
import time
import urllib.request  # a bare `import urllib` does not guarantee urllib.request is loaded

import requests

my_url = "http://search.jiayuan.com/v2/search_v2.php"
my_header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"}
my_cookies = {"guider_quick_search": "on", "accessID": "20181213094914916362", "SESSION_HASH": "b1ef27371485bbf7ea817f39c404c23fd82f3d39", "PHPSESSID": "8f12d200f751a3f108aeb4ed3f50192b", "is_searchv2": "1"}

def get_img_url(img_url):
    # The response escapes slashes as "\/"; strip the backslashes to get a usable URL
    return img_url.replace("\\", "")

def downImage(url):
    # Save the image under its original file name in the current directory
    outname = url.split("/")[-1]
    urllib.request.urlretrieve(url, outname)

def search_page(my_select, page):
    # Request one page of search results and return the image URLs found in it
    my_data = {"sex": "f", "key": "", "stc": my_select, "sn": "default", "sv": "1", "p": str(page), "f": "search", "listStyle": "bigPhoto", "pri_uid": "0", "jsversion": "v5"}
    rr = requests.post(url=my_url, headers=my_header, cookies=my_cookies, params=my_data, timeout=10)
    return re.findall('image":"(http:.*?jpg)', rr.text)

def main(loc=32, age="18.33"):  # search location (province code) and age range
    my_select = "1:" + str(loc) + ",2:" + str(age) + ",23:1"
    page = 1
    img_list = search_page(my_select, page)
    # Continue while the page has results and every image is a public avatar (avatar_p.jpg).
    # The `img_list and` check stops the loop on an empty page; all([]) alone is True.
    while img_list and all(x.endswith("avatar_p.jpg") for x in img_list):
        for img in img_list:
            i_url = get_img_url(img)
            downImage(i_url)
            print(i_url)
        page = page + 1
        time.sleep(1)  # pause between pages to avoid hammering the server
        img_list = search_page(my_select, page)

if __name__ == "__main__":
    main()
```
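Following the note above about taking the cookies from your own browser: developer tools usually show cookies as one `Cookie:` header string, while the code expects a dict. The sketch below converts one into the other. `cookie_header_to_dict` is a hypothetical helper written for this illustration (it is not part of requests), and the sample input uses only the non-secret flags from my_cookies.

```python
def cookie_header_to_dict(cookie_header):
    """Turn a raw 'Cookie:' header value (as copied from browser dev tools)
    into the dict form accepted by the `cookies=` argument of requests."""
    cookies = {}
    for part in cookie_header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:  # skip empty or malformed fragments that contain no '='
            cookies[name] = value
    return cookies


if __name__ == "__main__":
    raw = "guider_quick_search=on; is_searchv2=1"
    print(cookie_header_to_dict(raw))
    # {'guider_quick_search': 'on', 'is_searchv2': '1'}
```

Splitting with `partition` on the first `=` keeps any further `=` characters inside the value intact.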


The search criteria are submitted in a POST request, and the code above builds them in my_data. The key parameters are:
"sex": "f" - gender: female
"stc": "1:32,2:18.30" - field 1 is the province (32 means Jiangsu; other province codes work too), and field 2 is the age range (18.30 means between 18 and 30 years old).
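To make the stc format concrete, here is a minimal sketch that assembles it from a province code and an age range. `build_stc` is a hypothetical helper; the trailing `23:1` field is copied unchanged from main(), since the article does not explain what it means.

```python
def build_stc(province, age_min, age_max):
    """Compose the `stc` search string: field 1 = province code,
    field 2 = age range written as "min.max". The trailing "23:1" is copied
    as-is from the original code; its meaning is not documented."""
    return "1:{},2:{}.{},23:1".format(province, age_min, age_max)


if __name__ == "__main__":
    print(build_stc(32, 18, 30))  # Jiangsu, ages 18-30 -> 1:32,2:18.30,23:1
    print(build_stc(99, 18, 30))  # province code 99 (overseas)
```

The result matches what `main(loc=32, age="18.30")` builds internally.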
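Not every avatar is viewable, however: past a certain page number, registration is required to see them. The while loop in the code reflects this by stopping as soon as a page returns an image URL that does not end in avatar_p.jpg. Still, plenty of images are viewable without registering.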
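The viewability cutoff is exactly what the crawler's while loop checks. The sketch below pulls the regex extraction, the backslash cleanup done by get_img_url, and the stopping rule out into two small functions so they can be tried offline. The function names and the sample text (including the example.com URLs) are hypothetical; the response format is assumed only from the regex in the original code. Unlike a bare all() check, `page_is_public` also returns False for an empty page.

```python
import re

# Same pattern the crawler uses: capture "http:...jpg" after an `image":"` key
IMAGE_RE = re.compile(r'image":"(http:.*?jpg)')


def extract_image_urls(response_text):
    """Pull image URLs out of the response text and undo the escaped
    slashes ("\\/" -> "/"), as get_img_url does."""
    return [m.replace("\\", "") for m in IMAGE_RE.findall(response_text)]


def page_is_public(urls):
    """Stopping rule made explicit: continue only if the page returned at
    least one image and every image is a public avatar_p.jpg."""
    return bool(urls) and all(u.endswith("avatar_p.jpg") for u in urls)


if __name__ == "__main__":
    # Hypothetical response fragment, invented for illustration only
    sample = r'{"image":"http:\/\/example.com\/a\/123_avatar_p.jpg","image":"http:\/\/example.com\/a\/456_avatar_p.jpg"}'
    urls = extract_image_urls(sample)
    print(urls)
    print(page_is_public(urls))  # True
    print(page_is_public([]))    # False: an empty page ends the crawl
```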
Setting the province to 99, which means overseas, downloaded 2,000 avatars in just a short time. The result is shown below:

[Figure: the downloaded avatars]

==== THE END ====
(for learning only)


Origin blog.51cto.com/15069450/2577362