Web Scraping Day 1: Learning GET and POST Requests

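Tasks

[Task 1: Learn GET and POST requests] (1 day)

1. Learn GET and POST requests. Use requests or urllib to send a GET request to https://www.baidu.com/ and print the response.

2. Disconnect from the network and send the request again. What is the result this time? Learn about the status codes a request returns.

3. Learn what request headers are and how to add them.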
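
As background for the tasks, the main practical difference between the two methods is where the parameters travel: a GET request carries them in the URL's query string, while a POST request carries them in the request body. The sketch below builds both kinds of request with urllib without sending anything. The httpbin.org endpoints and the parameter names are assumptions chosen for illustration, not something the tasks prescribe.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields, used only for illustration.
params = {"wd": "python", "ie": "utf-8"}
query = urlencode(params)

# GET: the parameters are appended to the URL as a query string.
get_req = Request("https://httpbin.org/get?" + query)

# POST: the same parameters are sent as bytes in the request body.
post_req = Request("https://httpbin.org/post", data=query.encode("ascii"))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)

# Nothing has been sent yet. urllib.request.urlopen(get_req) would send it;
# with requests the rough equivalents are
#   requests.get("https://httpbin.org/get", params=params)
#   requests.post("https://httpbin.org/post", data=params)
```

urllib picks the method from the presence of a body: a Request with data set is sent as POST, one without as GET.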

Task 1

Send a GET request to https://www.baidu.com/ and print the response

import requests
r = requests.get('https://www.baidu.com')
print(f'status code: {r.status_code}')
print(f'encoding: {r.encoding}')
print(f'headers: {r.headers}')
print(f'text: {r.text[:1000]}')
# Output:
status code: 200
encoding: ISO-8859-1
headers: {'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform', 'Connection': 'Keep-Alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html', 'Date': 'Thu, 28 Feb 2019 14:59:00 GMT', 'Last-Modified': 'Mon, 23 Jan 2017 13:24:57 GMT', 'Pragma': 'no-cache', 'Server': 'bfe/1.0.8.18', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Transfer-Encoding': 'chunked'}
text: <!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/bdorz/baidu.min.css><title>ç™¾åº¦ä¸€ä¸‹ï¼Œä½ å°±çŸ¥é“</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus=autofocus></span><s
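
The garbled <title> in the output above is a decoding issue rather than a network one: the response's Content-Type is text/html with no charset, so requests falls back to ISO-8859-1 even though the page itself declares utf-8. A minimal sketch of the effect, using the page title as sample data and no network access:

```python
# The page title as the UTF-8 bytes the server sends.
raw = "百度一下，你就知道".encode("utf-8")

# Decoding those bytes with the wrong codec produces mojibake.
garbled = raw.decode("iso-8859-1")

# ISO-8859-1 maps every byte to exactly one character, so the mistake
# can be undone by re-encoding and decoding with the right codec.
fixed = garbled.encode("iso-8859-1").decode("utf-8")

print(fixed)

# With requests, the usual fix is to set the encoding before reading r.text:
#   r.encoding = "utf-8"    # or: r.encoding = r.apparent_encoding
```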

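Task 2

Disconnect from the network, send the request again, and see what happens. Learn about the status codes a request returns.
The status code seen most often is 404, but simply cutting the network connection does not give us the result we want:

1: Turn off Wi-Fi so the computer has no network access

# With no network connection, the result is an error message, not a status code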
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/connection.py", line 159, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/util/connection.py", line 57, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known
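
Because a failed connection raises an exception instead of returning a response, a script usually wraps the call in try/except. A minimal sketch, assuming a hypothetical helper name fetch_status and an arbitrary 5-second timeout:

```python
import requests


def fetch_status(url, timeout=5):
    """Return the HTTP status code, or None if no response was received."""
    try:
        r = requests.get(url, timeout=timeout)
    except requests.exceptions.ConnectionError as exc:
        # DNS failure, refused connection, or no network at all.
        print(f'connection failed: {exc.__class__.__name__}')
        return None
    except requests.exceptions.Timeout:
        # The server did not answer within `timeout` seconds.
        print('request timed out')
        return None
    return r.status_code
```

With Wi-Fi off, fetch_status('https://www.baidu.com') would return None rather than ending the program with a traceback.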

2: Request a 404 test endpoint

This is how to get the status code we are after:

import requests
r = requests.get('https://httpbin.org/status/404')
print(f'status code: {r.status_code}')
print(f'encoding: {r.encoding}')
print(f'headers: {r.headers}')
print(f'text: {r.text[:1000]}')

status code: 404
encoding: utf-8
headers: {'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Origin': '*', 'Content-Type': 'text/html; charset=utf-8', 'Date': 'Thu, 28 Feb 2019 15:12:59 GMT', 'Server': 'nginx', 'Content-Length': '0', 'Connection': 'keep-alive'}
text:

Status code reference:

http://www.runoob.com/http/http-status-codes.html
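
The standard library also ships these codes in http.HTTPStatus, so the meaning of a code can be looked up without leaving Python. A small sketch; the helper name describe and the CLASSES table are made up for this example:

```python
from http import HTTPStatus

# First digit of the status code -> response class.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return e.g. '404 Not Found (client error)' for a standard status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    return f"{status.value} {status.phrase} ({CLASSES[code // 100]})"


for code in (200, 301, 404, 500):
    print(describe(code))
```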

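Task 3

Learn what request headers are and how to add them:
The request headers a browser sends can be viewed in Chrome (developer tools, Network tab).
Headers can be added as follows: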

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36'
headers = {'User-Agent': user_agent}

For example, copying the User-Agent from Chrome into the code and then sending the request makes the visit look like it comes from a Chrome browser:

import requests
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36'
headers = {'User-Agent': user_agent}
r = requests.get('https://www.baidu.com', headers=headers)

print(f'status code: {r.status_code}')
print(f'encoding: {r.encoding}')
print(f'headers: {r.headers}')
print(f'text: {r.text[:1000]}')
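
One reason to set User-Agent at all is that requests otherwise announces itself as python-requests/<version>, which a site can easily tell apart from a browser. The sketch below checks both values without sending anything. It uses a Session only to build the outgoing request, which is a choice made for this example rather than something the code above does.

```python
import requests

user_agent = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/72.0.3626.119 Safari/537.36')

# What requests sends when no User-Agent is supplied.
default_ua = requests.utils.default_headers()['User-Agent']
print(f'default: {default_ua}')

# Build (but do not send) a request that carries the custom header.
session = requests.Session()
req = requests.Request('GET', 'https://www.baidu.com',
                       headers={'User-Agent': user_agent})
prepared = session.prepare_request(req)
print(f'custom:  {prepared.headers["User-Agent"]}')
```

After a real request, the headers that were actually sent are available as r.request.headers.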

Reposted from blog.csdn.net/weixin_43720396/article/details/88047396