Automatically log in to Chouti and upvote multiple pages (3)

First, fetch the entire page:

import requests

response_index = requests.get(
    url='https://dig.chouti.com/',
    headers={
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
    }
)

print(response_index.text)

The print call outputs the raw HTML of the page.

A first look at the Chouti hot-list page shows that all of the titles sit inside the div with id content-list.

First, parse the page and locate the content-list div:

from bs4 import BeautifulSoup

soup = BeautifulSoup(response_index.text, 'html.parser')
div = soup.find(attrs={'id': 'content-list'})

Then find all of the li tags under it (located by their class, item):

items = div.find_all(attrs={'class': 'item'})

Next, analyze where each title (and its id) sits in the markup.

Print the id of every title:

for item in items:
    tag = item.find(attrs={'class': 'part2'})
    nid = tag.get('share-linkid')
    print(nid)

At this point, print outputs the id of every title.
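As an aside, the same ids can be pulled out with a single CSS selector. The sketch below is not from the original post; it only assumes the same id, class, and attribute names (content-list, item, part2, share-linkid) used above:

from bs4 import BeautifulSoup

soup = BeautifulSoup(response_index.text, 'html.parser')
# One selector replaces the nested find / find_all calls above
for tag in soup.select('#content-list .item .part2'):
    print(tag.get('share-linkid'))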

Then simply wrap the single-upvote code from the previous post in a for loop. The complete code is as follows:

import requests
from bs4 import BeautifulSoup

# First visit the Chouti hot list to get a cookie (not yet authorized)
r1 = requests.get(
    url='https://dig.chouti.com/',
    headers={
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
    }
)
r1_cookie_dict = r1.cookies.get_dict()

# Send the username and password for authentication, along with the unauthorized cookie
# Note: anti-crawling countermeasure: the cookie from the first visit is the one the server authorizes on login
response_login = requests.post(
    url='https://dig.chouti.com/login',
    data={
        'phone': '8615921302790',
        'password': 'a123456!',
        'oneMonth': '1'
    },
    headers={
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
    },
    cookies=r1_cookie_dict
)


response_index = requests.get(
    url='https://dig.chouti.com/',
    headers={
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
    }
)


soup = BeautifulSoup(response_index.text, 'html.parser')
div = soup.find(attrs={'id': 'content-list'})
items = div.find_all(attrs={'class': 'item'})
for item in items:
    tag = (item.find(attrs={'class': 'part2'}))
    nid = tag.get('share-linkid')
    print(nid)


    # Upvote each item by its news id
    r1 = requests.post(
        url='https://dig.chouti.com/link/vote?linksId=%s' % nid,
        headers={
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
        },
        cookies=r1_cookie_dict
    )
    print(r1.text)  
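If you want to tell at a glance whether each vote was accepted, the plain print(r1.text) inside the loop could be replaced with a check like this (a sketch; it relies only on the standard requests attributes status_code and text and assumes nothing about the format of the response body):

    # Sketch: replace print(r1.text) with a status check inside the loop
    if r1.status_code == 200:
        print('vote for %s accepted, server replied: %s' % (nid, r1.text))
    else:
        print('vote for %s failed with HTTP status %s' % (nid, r1.status_code))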

Log in to Chouti and look at the page: the upvotes for this single page have been applied automatically.

Now let's look at paging.

 

Notice the pattern? Most sites expose paging in the URL like this. When we go back to the first page, the URL becomes https://dig.chouti.com/all/hot/recent/1, so we can request that URL instead of the home page.

for page_num in range(1, 3):     # Upvote pages 1 and 2 (range(1, 3) yields 1 and 2)
    response_index = requests.get(
        url='https://dig.chouti.com/all/hot/recent/%s' % page_num,
        headers={
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
        }
    )


    soup = BeautifulSoup(response_index.text, 'html.parser')
    div = soup.find(attrs={'id': 'content-list'})
    items = div.find_all(attrs={'class': 'item'})
    for item in items:
        tag = (item.find(attrs={'class': 'part2'}))
        nid = tag.get('share-linkid')
        print(nid)


        # Upvote each item by its news id
        r1 = requests.post(
            url='https://dig.chouti.com/link/vote?linksId=%s' % nid,
            headers={
                'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
            },
            cookies=r1_cookie_dict
        )
        print(r1.text)  

Running this prints the ids and vote responses for both pages.

There is plenty of room to improve this code; I won't go into the details here, but the sketch below shows one direction.
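For example, one common cleanup (a sketch under my own assumptions, not the author's code) is to let requests.Session() carry the cookie and the User-Agent header automatically, and to pause briefly between votes. The phone number and password below are placeholders:

import time

import requests
from bs4 import BeautifulSoup

UA = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'

session = requests.Session()
session.headers['user-agent'] = UA

# The first visit sets the unauthorized cookie; the login call then authorizes it,
# and the session keeps sending it with every later request.
session.get('https://dig.chouti.com/')
session.post(
    'https://dig.chouti.com/login',
    data={'phone': '86xxxxxxxxxxx', 'password': 'xxxxxx', 'oneMonth': '1'}  # placeholders
)

for page_num in range(1, 3):   # pages 1 and 2, same as above
    r = session.get('https://dig.chouti.com/all/hot/recent/%s' % page_num)
    soup = BeautifulSoup(r.text, 'html.parser')
    for tag in soup.select('#content-list .item .part2'):
        nid = tag.get('share-linkid')
        vote = session.post('https://dig.chouti.com/link/vote?linksId=%s' % nid)
        print(nid, vote.status_code)
        time.sleep(1)   # small pause between votes to be gentle on the site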


Reposted from www.cnblogs.com/Black-rainbow/p/9216164.html