A simple crawler environment setup for Python 3
1. Configuring the crawler environment in PyCharm
First, open PyCharm.
Second, install the required modules:
1. In PyCharm, click File, then click Settings.
2. Find and open Project Interpreter, then click the + button on the right.
3. Search for pip and install it.
Search for the requests and lxml modules in the same way and install each of them.
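When installing, pay attention to the version: I chose lxml 3.7.2,
because in my setup etree could not be found in versions after 3.7.2, and without etree the XPath queries used to extract the data do not work. (Newer lxml releases do include lxml.etree; if PyCharm only flags the import but the script still runs, a newer version is fine too.)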
At this point the environment is configured successfully!
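To confirm that both modules are visible to the interpreter PyCharm uses, a short check like the one below can be run. This is only a minimal sketch and not part of the original setup: the helper name check_modules is my own invention, and it relies only on the standard library (Python 3.8 or later for importlib.metadata).

```python
import importlib.util
from importlib import metadata


def check_modules(names):
    """Return {module name: installed version, or None if the module is missing}.

    check_modules is a hypothetical helper written for this article.
    """
    report = {}
    for name in names:
        # find_spec returns None when the module cannot be imported
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            # Look up the version recorded by pip for this package
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no package metadata (e.g. a standard-library module)
            report[name] = "unknown"
    return report


if __name__ == "__main__":
    for module, version in check_modules(["requests", "lxml"]).items():
        print(module, version if version else "NOT INSTALLED")
```

If either line shows NOT INSTALLED, the module was probably installed into a different interpreter than the one selected under Project Interpreter.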
2. A small crawler: scraping Maoyan movie data
Source code (can be copied and used directly)
The script fetches the top 10 movies on the Maoyan board.
URL: https://maoyan.com/board
You can open the site yourself to check that the scraped data is correct.
import requests
from lxml import etree


# Fetch the HTML of one page
def getonepage():
    # Target URL
    url = 'https://maoyan.com/board'
    # Pretend to be a browser
    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}
    # Request the page (the timeout keeps the script from hanging forever)
    r = requests.get(url, headers=header, timeout=10)
    # Return the page content as text
    return r.text


# Parse the page and print the extracted data
def parse(text):
    # Build an element tree from the HTML text
    html = etree.HTML(text)
    # Extract the data at the given locations
    names = html.xpath('//div[@class="movie-item-info"]/p[@class="name"]/a/@title')  # movie titles
    releasetimes = html.xpath('//div[@class="movie-item-info"]/p[@class="releasetime"]/text()')  # release times
    # Pair each title with its release time and print them
    for name, releasetime in zip(names, releasetimes):
        print(name, releasetime)


# Fetch the page and store its text
text = getonepage()
# Parse the page and print the results
parse(text)
Crawling results: the script prints one line per movie, with the movie title followed by its release-time text.
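The XPath expressions can also be tried without touching the network, which helps when the live page is unavailable. The sketch below is an assumption-laden illustration: SAMPLE_HTML is made-up markup that merely mirrors the class names the script expects (it is not real Maoyan data), and extract_movies is a hypothetical variant of parse that returns the pairs instead of printing them.

```python
from lxml import etree

# Made-up sample that mirrors the class names used in the XPath expressions.
# The titles and dates are placeholders, not real Maoyan data.
SAMPLE_HTML = """
<html><body>
  <div class="movie-item-info">
    <p class="name"><a href="/films/1" title="Movie A">Movie A</a></p>
    <p class="releasetime">Release time: 2019-01-01</p>
  </div>
  <div class="movie-item-info">
    <p class="name"><a href="/films/2" title="Movie B">Movie B</a></p>
    <p class="releasetime">Release time: 2019-02-02</p>
  </div>
</body></html>
"""


def extract_movies(text):
    """Return a list of (title, release time) tuples (hypothetical helper)."""
    html = etree.HTML(text)
    # Same XPath expressions as in the script above
    names = html.xpath('//div[@class="movie-item-info"]/p[@class="name"]/a/@title')
    times = html.xpath('//div[@class="movie-item-info"]/p[@class="releasetime"]/text()')
    # Strip surrounding whitespace and pair each title with its release time
    return [(n.strip(), t.strip()) for n, t in zip(names, times)]


if __name__ == "__main__":
    for name, releasetime in extract_movies(SAMPLE_HTML):
        print(name, releasetime)
```

If the same function returns an empty list for the live page, the site may have changed its markup or served a different page than expected, so printing part of the fetched text is a reasonable next step.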
3. Conclusion
You have now written a small crawler.
A full-featured crawler project actually requires much more environment configuration; this article covers only a small part of it.
I am a beginner myself, and I wrote this article for two reasons:
first, so that I can come back and look at it if I forget something;
second, to share my learning experience with other beginners.
I hope this article helps you. Experts, please go easy on
me; I am still learning.