1. Web crawler: sending requests with the requests module

The requests module simulates a browser making HTTP requests in Python; sending a request returns the page's HTML source.

Simulated browser requests fall into two categories: requests that do not require user login or authentication, and requests that do.

First: requests that do not require login or authentication

These are relatively simple: use the requests module to send the request directly and get the HTML source.

#!/usr/bin/env python
# -*- coding:utf8 -*-
import requests     # import the module that simulates browser requests

http = requests.get(url="http://www.iqiyi.com/")     # send an http request
http.encoding = "utf-8"                              # set the response encoding
neir = http.text                                     # get the html source as a string
print(neir)
<!DOCTYPE html>
<html>
<head>
<title>Chouti New Hot List - daily popular, funny and interesting news</title>
        <meta charset="utf-8" />
        <meta name="keywords" content="Chouti new hot list, news, jokes, pictures, technology, news, funny" />
        <meta name="description" content="Chouti New Hot List brings together daily funny jokes, popular pictures and interesting news. It aggregates massive content from Weibo, portals, communities, bbs and social networking sites, producing a user-recommended list of each day's hottest and most interesting news at a glance." />
        <meta name="robots" content="index,follow" />
        <meta name="GOOGLEBOT" content="index,follow" />
        <meta name="Author" content="搞笑" />
        <meta http-equiv="X-UA-Compatible" content="IE=EmulateIE8">
        <link type="image/x-icon" href="/images/chouti.ico" rel="icon"/>
        <link type="image/x-icon" href="/images/chouti.ico" rel="Shortcut Icon"/>
        <link type="image/x-icon" href="/images/chouti.ico" rel="bookmark"/>
    <link type="application/opensearchdescription+xml"

Second: requests that require user login or authentication

To scrape such a page we first need to understand the login flow. Typically, on the user's first visit the server sets a cookie in the browser. When the user submits login credentials, the request carries that cookie; if the credentials are correct, the server authorizes the cookie.

Once authorized, pages that require login can be accessed by carrying that cookie.
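The three-step flow above can be sketched end to end against a small local test server, so it runs without touching a real site. Everything here is invented for illustration: the server, the paths `/`, `/login` and `/private`, the cookie name `token`, and the credentials.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

AUTHORIZED = set()   # cookie values the fake backend has authorized

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/":
            self.send_response(200)                      # first visit: hand out a cookie
            self.send_header("Set-Cookie", "token=abc123")
            self.end_headers()
            self.wfile.write(b"home")
        else:                                            # /private: login-only page
            sent = self.headers.get("Cookie", "")
            if "token=abc123" in sent and "abc123" in AUTHORIZED:
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"secret page")
            else:
                self.send_response(403)
                self.end_headers()
                self.wfile.write(b"forbidden")

    def do_POST(self):                                   # /login: authorize the carried cookie
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode()
        if "password=secret" in body and "token=abc123" in self.headers.get("Cookie", ""):
            AUTHORIZED.add("abc123")
            self.send_response(200)
        else:
            self.send_response(401)
        self.end_headers()

    def log_message(self, *args):                        # silence request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

# step 1: the first visit generates a cookie
cookie = requests.get(base + "/").cookies.get_dict()

# step 2: log in carrying that cookie; correct credentials get it authorized
requests.post(base + "/login", data={"phone": "123", "password": "secret"}, cookies=cookie)

# step 3: the authorized cookie now opens the login-only page
i3 = requests.get(base + "/private", cookies=cookie)
print(i3.status_code, i3.text)

server.shutdown()
```

Real sites add CSRF tokens, redirects and expiry on top of this, but the cookie hand-off is the same shape.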

1. First visit the home page and check whether a cookie is generated automatically

#!/usr/bin/env python
# -*- coding:utf8 -*-
import requests     # import the module that simulates browser requests

### 1. Visit the home page before logging in, to get a cookie
i1 = requests.get(
    url="http://dig.chouti.com/",
    headers={'Referer': 'http://dig.chouti.com/'}
)
i1.encoding = "utf-8"                               # set the response encoding
i1_cookie = i1.cookies.get_dict()
print(i1_cookie)                                    # print the cookie we got
# Returns: {'JSESSIONID': 'aaaTztKP-KaGLbX-T6R0v', 'gpsd': 'c227f059746c839a28ab136060fe6ebe', 'route': 'f8b4f4a95eeeb2efcff5fd5e417b8319'}

We can see that a cookie is generated. If the login credentials are correct, the backend will authorize this cookie; from then on, pages that require login are accessed by carrying it.
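The `cookies.get_dict()` call used above flattens the response's cookie jar into a plain dict that can be passed back via the `cookies=` argument of later requests. A minimal illustration with a hand-built jar, reusing the sample values printed above:

```python
import requests

# build a jar by hand with the sample values from the output above
jar = requests.cookies.RequestsCookieJar()
jar.set("JSESSIONID", "aaaTztKP-KaGLbX-T6R0v")
jar.set("route", "f8b4f4a95eeeb2efcff5fd5e417b8319")

d = jar.get_dict()        # flatten to a plain dict, usable as cookies= later
print(d)
```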

2. Have the program log in automatically so the cookie gets authorized

First, log in through the browser with an arbitrary account and password, and use the developer tools to find the login URL and the fields the login request requires.

Log in carrying the cookie so it gets authorized

#!/usr/bin/env python
# -*- coding:utf8 -*-
import requests     # import the module that simulates browser requests

### 1. Visit the home page before logging in, to get a cookie
i1 = requests.get(
    url="http://dig.chouti.com/",
    headers={'Referer': 'http://dig.chouti.com/'}
)
i1.encoding = "utf-8"                               # set the response encoding
i1_cookie = i1.cookies.get_dict()
print(i1_cookie)                                    # print the cookie we got
# Returns: {'JSESSIONID': 'aaaTztKP-KaGLbX-T6R0v', 'gpsd': 'c227f059746c839a28ab136060fe6ebe', 'route': 'f8b4f4a95eeeb2efcff5fd5e417b8319'}

### 2. Log in carrying the previous cookie; the backend authorizes the random string in the cookie
i2 = requests.post(
    url="http://dig.chouti.com/login",              # login url
    data={                                          # login fields
        'phone': "8615284816568",
        'password': "279819",
        'oneMonth': ""
    },
    headers={'Referer': 'http://dig.chouti.com/'},
    cookies=i1_cookie                               # carry the cookie
)
i2.encoding = "utf-8"
dluxxi = i2.text
print(dluxxi)                                       # inspect the server response after login
# Returns: {"result":{"code":"9999", "message":"", "data":{"complateReg":"0","destJid":"cdu_50072007463"}}}  login succeeded

3. After a successful login the backend has authorized the cookie, so pages that require login can be visited simply by carrying it, for example the personal center

#!/usr/bin/env python
# -*- coding:utf8 -*-
import requests     # import the module that simulates browser requests

### 1. Visit the home page before logging in, to get a cookie
i1 = requests.get(
    url="http://dig.chouti.com/",
    headers={'Referer': 'http://dig.chouti.com/'}
)
i1.encoding = "utf-8"                               # set the response encoding
i1_cookie = i1.cookies.get_dict()
print(i1_cookie)                                    # print the cookie we got
# Returns: {'JSESSIONID': 'aaaTztKP-KaGLbX-T6R0v', 'gpsd': 'c227f059746c839a28ab136060fe6ebe', 'route': 'f8b4f4a95eeeb2efcff5fd5e417b8319'}

### 2. Log in carrying the previous cookie; the backend authorizes the random string in the cookie
i2 = requests.post(
    url="http://dig.chouti.com/login",              # login url
    data={                                          # login fields
        'phone': "8615284816568",
        'password': "279819",
        'oneMonth': ""
    },
    headers={'Referer': 'http://dig.chouti.com/'},
    cookies=i1_cookie                               # carry the cookie
)
i2.encoding = "utf-8"
dluxxi = i2.text
print(dluxxi)                                       # inspect the server response after login
# Returns: {"result":{"code":"9999", "message":"", "data":{"complateReg":"0","destJid":"cdu_50072007463"}}}  login succeeded

### 3. Access a page that can only be viewed after logging in, carrying the authorized cookie
shouquan_cookie = i1_cookie
i3 = requests.get(
    url="http://dig.chouti.com/user/link/saved/1",
    headers={'Referer': 'http://dig.chouti.com/'},
    cookies=shouquan_cookie                        # carry the authorized cookie
)
i3.encoding = "utf-8"
print(i3.text)                                     # print the login-only page

Fetching the HTML source of the login-required page succeeded.

Recap of the requests API used in the full code:

get() method: send a GET request
encoding attribute: set the response encoding
cookies.get_dict(): get the cookies as a dict
post() method: send a POST request
text attribute: get the server's response body
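As a final note, requests also provides `Session`, which stores cookies between requests automatically, so the manual `cookies=` plumbing in the scripts above becomes unnecessary. A hedged sketch, again using an invented local test server (the paths and the `sid` cookie are made up) in place of the real site:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/":
            self.send_response(200)
            self.send_header("Set-Cookie", "sid=42")     # server hands out a cookie
            self.end_headers()
        else:                                            # /echo: report what the client sent
            self.send_response(200)
            self.end_headers()
            self.wfile.write(self.headers.get("Cookie", "none").encode())

    def log_message(self, *args):                        # silence request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

s = requests.Session()
s.get(base + "/")                    # the session stores sid=42 automatically
echoed = s.get(base + "/echo").text  # ...and sends it back without cookies=
print(echoed)

server.shutdown()
```

With a `Session`, steps 1 through 3 of the tutorial collapse into three plain calls: `s.get(home)`, `s.post(login, data=...)`, `s.get(private_page)`.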


Origin www.cnblogs.com/pypypy/p/12003843.html