Oldboy Practical Web Scraping Intensive Course, Season 1, June 2018: Intro to Scraping, Exercise 2 - Automatic Login to Chouti

New compared with the previous exercise: automatic login to Chouti, automatic upvoting, and automatic pagination.

1. Logging in to the site automatically

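Deliberately enter a wrong password when logging in through the browser. A `login` request then stays visible in the developer tools' Network panel (a successful login would reload the page). Inspect that request and copy the fields listed under Form Data.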
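As an illustration of this step: DevTools can usually show the Form Data as the raw URL-encoded body (the "view source" toggle). The sketch below, using only the standard library, turns such a string into the dict that `requests.post(data=...)` expects. The helper name `form_data_from_source` and the placeholder values are hypothetical; the field names `phone`, `password` and `oneMonth` are the ones the script below sends.

```python
from urllib.parse import parse_qs, urlencode


def form_data_from_source(raw: str) -> dict:
    """Convert a URL-encoded body (as copied from DevTools' Form Data
    'view source') into a flat dict usable as requests.post(data=...)."""
    parsed = parse_qs(raw, keep_blank_values=True)
    # parse_qs returns a list per key; a login form normally sends each field once
    return {key: values[0] for key, values in parsed.items()}


# Placeholder values, not real credentials
raw_body = "phone=86YOUR_PHONE&password=YOUR_PASSWORD&oneMonth=1"
payload = form_data_from_source(raw_body)
print(payload)             # {'phone': '86YOUR_PHONE', 'password': 'YOUR_PASSWORD', 'oneMonth': '1'}
print(urlencode(payload))  # re-encodes to the same string
```

Keeping only the first value per key matches how a simple login form is submitted; a form with repeated fields would need the full lists.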

2. Using cookies

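The site scraped here uses cookie-based authorization: you first visit the main site and are assigned a cookie that is not yet authorized, then log in while sending that same cookie, which the server then authorizes.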
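The flow above can be modeled with a toy in-memory server. This is a hypothetical sketch: `ToyCookieServer` and its methods are invented for illustration and are not Chouti's actual implementation. It only shows why the script can keep sending the very cookie it received on the first visit: logging in marks that existing cookie as authorized instead of issuing a new one.

```python
import secrets


class ToyCookieServer:
    """Hypothetical in-memory model of cookie authorization (not Chouti's code)."""

    def __init__(self, phone: str, password: str):
        self._credentials = (phone, password)
        self._sessions = {}  # cookie value -> authorized flag

    def visit(self) -> dict:
        # First visit: hand out a fresh, unauthorized cookie
        token = secrets.token_hex(8)
        self._sessions[token] = False
        return {"session": token}

    def login(self, phone: str, password: str, cookies: dict) -> bool:
        token = cookies.get("session")
        if token not in self._sessions:
            return False  # unknown cookie: nothing to authorize
        if (phone, password) != self._credentials:
            return False  # wrong credentials: cookie stays unauthorized
        self._sessions[token] = True  # the existing cookie becomes authorized
        return True

    def vote(self, cookies: dict) -> str:
        token = cookies.get("session")
        return "ok" if self._sessions.get(token) else "login required"


server = ToyCookieServer("86YOUR_PHONE", "YOUR_PASSWORD")
cookies = server.visit()                                # step 1: unauthorized cookie
print(server.vote(cookies))                             # login required
server.login("86YOUR_PHONE", "YOUR_PASSWORD", cookies)  # step 2: authorize that cookie
print(server.vote(cookies))                             # ok
print(server.vote(server.visit()))                      # a fresh cookie: login required
```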

Code:

import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent sent with every request
HEADERS = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36'
}

# 1. Visit Chouti first to obtain a cookie (not yet authorized);
#    a real user would load this page before upvoting anyway
r1 = requests.get(
    url='https://dig.chouti.com/all/hot/recent/1',
    headers=HEADERS
)
r1_cookie_dict = r1.cookies.get_dict()

# 2. Send the phone number and password for authentication, together with
#    the unauthorized cookie; on success the server authorizes that cookie
response_login = requests.post(
    url='https://dig.chouti.com/login',
    data={
        'phone': '86XXXXXXXXXXX',     # placeholder: country code 86 + your phone number
        'password': 'YOUR_PASSWORD',  # placeholder: never publish real credentials
        'oneMonth': '1'
    },
    headers=HEADERS,
    cookies=r1_cookie_dict
)
print(response_login.text)  # check the login result before voting

# 3. Walk through pages 1 and 2, collect each post's ID and upvote it
for page_num in range(1, 3):
    response_index = requests.get(
        url='https://dig.chouti.com/all/hot/recent/%s' % page_num,
        headers=HEADERS
    )
    # print(response_index.text)  # uncomment to inspect the raw HTML
    soup = BeautifulSoup(response_index.text, "html.parser")
    div = soup.find(attrs={'id': 'content-list'})
    if div is None:
        continue  # layout changed or request blocked: skip this page
    items = div.find_all(attrs={'class': 'item'})

    for item in items:
        tag = item.find(attrs={'class': 'part2'})
        if not tag:
            continue
        nid = tag.get('share-linkid')
        print(nid)
        # Upvote, sending the now-authorized cookie
        response_vote = requests.post(
            url='https://dig.chouti.com/link/vote?linksId=%s' % nid,
            headers=HEADERS,
            cookies=r1_cookie_dict
        )
        print(response_vote.text)
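The ID-extraction step can be tried offline. The sketch below swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` so it runs without extra packages, and it simplifies the search: it collects `share-linkid` from every tag whose class list contains `part2`, without checking that the tag sits inside `#content-list` / `.item`. The sample HTML and the names `LinkIdParser`, `extract_link_ids` and `VOTE_URL` are hypothetical, modeled only on the names the script uses; the real page markup may differ.

```python
from html.parser import HTMLParser


class LinkIdParser(HTMLParser):
    """Collect the share-linkid attribute of every tag whose class list
    contains 'part2' (simplified: no #content-list / .item scoping)."""

    def __init__(self):
        super().__init__()
        self.link_ids = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # class may be missing or valueless (None); split into individual classes
        classes = (attr_map.get("class") or "").split()
        link_id = attr_map.get("share-linkid")
        if "part2" in classes and link_id:
            self.link_ids.append(link_id)


def extract_link_ids(html: str) -> list:
    parser = LinkIdParser()
    parser.feed(html)
    parser.close()
    return parser.link_ids


# Hypothetical markup modeled on the class/attribute names used in the script
sample = """
<div id="content-list">
  <div class="item"><div class="part2" share-linkid="1001"></div></div>
  <div class="item"><div class="part1">no id here</div></div>
  <div class="item"><div class="part2 extra" share-linkid="1002"></div></div>
</div>
"""
print(extract_link_ids(sample))  # ['1001', '1002']

VOTE_URL = "https://dig.chouti.com/link/vote?linksId=%s"
print([VOTE_URL % nid for nid in extract_link_ids(sample)])
```

Splitting the class attribute mirrors how BeautifulSoup treats `class` as a multi-valued attribute, so `part2 extra` still matches `part2`.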