Python 3 simple web crawler: environment configuration + source code (so that beginners can write a working crawler)

Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA license. If you reproduce it, please attach the original source link and this statement.
Original link: https://blog.csdn.net/gudada010/article/details/95508630

1. PyCharm crawler environment configuration

I. First, open PyCharm.
II. Install the required modules

1. In PyCharm, open the File menu and click Settings.

2. Find and open Project Interpreter, then click the + button on the right.

3. Search for the pip module and install it.

In the same way, search for the requests and lxml modules and install each of them.
Note: when installing lxml, choose version 3.7.2. The author found that etree could not be used from lxml in versions after 3.7.2, which affects fetching data with XPath. (lxml.etree still ships in later releases as a compiled extension module, so an IDE may flag the import as unresolved even though it works at run time.)
The environment is now configured!
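
Before moving on, it can help to confirm that the interpreter PyCharm is using can actually see the installed packages. The following is a minimal sketch, not part of the original article: the helper name check_modules is hypothetical, and it relies only on importlib.util.find_spec from the standard library.

```python
import importlib.util


def check_modules(names):
    """Return a dict mapping each module name to True if it can be found."""
    result = {}
    for name in names:
        # find_spec returns None when a top-level module is not installed
        result[name] = importlib.util.find_spec(name) is not None
    return result


if __name__ == "__main__":
    for name, found in check_modules(["requests", "lxml"]).items():
        print(name, "OK" if found else "missing")
```

If either module is reported as missing, the package was probably installed into a different interpreter than the one selected for the project.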

2. A small crawler: crawling Maoyan movie data

Source code (can be copied and used directly)

This crawler fetches the top 10 list from the Maoyan movie board.
URL: https://maoyan.com/board
You can open the site yourself to check that the crawled data is correct.

import requests
from lxml import etree

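# Fetch the data for one web page
def getonepage():

    # Target URL
    url = 'https://maoyan.com/board'

    # Pretend to be a browser
    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}

    # Request the site and get its data; the timeout keeps the call from hanging forever
    r = requests.get(url, headers=header, timeout=10)

    # Return the page text
    return r.text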


# Parse the page data and print it
def parse(text):

    # Parse the page text into an element tree
    html = etree.HTML(text)

    # Extract the data at the given positions in the page
    names = html.xpath('//div[@class="movie-item-info"]/p[@class="name"]/a/@title')  # movie names

    releasetimes = html.xpath('//div[@class="movie-item-info"]/p[@class="releasetime"]/text()')  # release times

    # Pair each movie name with its release time and print them
    for name, releasetime in zip(names, releasetimes):
        print(name, releasetime)


# Fetch the page and store its text in text
text = getonepage()


# Parse the page data and print it
parse(text)
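
To see what the two XPath expressions in parse() select without touching the network, they can be tried on a small static fragment. This is only a sketch: the SAMPLE markup and the movie values are made up to imitate the structure the expressions expect (the real page may differ), and the helper name extract is hypothetical.

```python
from lxml import etree

# Made-up fragment imitating the structure the XPath expressions expect;
# the real page markup may differ.
SAMPLE = """
<div class="movie-item-info">
  <p class="name"><a title="Movie A">Movie A</a></p>
  <p class="releasetime">Release: 2000-01-01</p>
</div>
<div class="movie-item-info">
  <p class="name"><a title="Movie B">Movie B</a></p>
  <p class="releasetime">Release: 2001-02-03</p>
</div>
"""


def extract(text):
    """Return (title, release line) pairs using the same XPath as parse()."""
    html = etree.HTML(text)
    # @title reads the attribute on <a>; text() reads the text inside <p>
    names = html.xpath('//div[@class="movie-item-info"]/p[@class="name"]/a/@title')
    times = html.xpath('//div[@class="movie-item-info"]/p[@class="releasetime"]/text()')
    return list(zip(names, times))


if __name__ == "__main__":
    for name, releasetime in extract(SAMPLE):
        print(name, releasetime)
```

Returning the pairs instead of printing them inside the function makes the result easier to check or to save later.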

Crawling results: the script prints one line per movie, with the movie name followed by its release time.
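
One thing to keep in mind when checking the output: the script pairs names and release times with zip, which stops at the shorter list. If one XPath query returns fewer items than the other, the extra entries disappear without any error. The sketch below uses made-up values to show this, and uses itertools.zip_longest as one possible way to make the gap visible; it is not part of the original script.

```python
from itertools import zip_longest

names = ["Movie A", "Movie B", "Movie C"]    # made-up values
releasetimes = ["2000-01-01", "2001-02-03"]  # one entry missing on purpose

# zip stops at the shorter list, so "Movie C" is dropped without any error
print(list(zip(names, releasetimes)))

# zip_longest pads the shorter list, so the missing value shows up
print(list(zip_longest(names, releasetimes, fillvalue="(unknown)")))
```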

3. Conclusion

You have now written a small crawler.
A full-featured crawler project actually needs much more environment configuration; only a small part is covered here.
I am a beginner myself, and I wrote this article for two reasons:
first, so that I can come back and look at it later if I forget;
second, to share my learning experience with other beginners.
I hope this article helps you. Experts, please go easy on me;
I am still learning.
