Simulating a login to a certain *SDN site with Python and parsing the returned data

#!/usr/bin/python
# -*- coding: UTF-8 -*- 

# CSDN 一口仨馍 Python3.5
from urllib import request, parse
from http import cookiejar
from bs4 import BeautifulSoup
import re

CODE_MODE = 'UTF-8'
# Login page of a certain *SDN site
LOGIN_URL = 'https://passport.*sdn.net'
# Personal home page of a certain *SDN site
PERSON_HOME_URL = 'http://my.*sdn.net/my/my*sdn'
LOGIN_FILE_NAME = 'login.html'
PERSON_HOME_FILE_NAME = 'personHome.html'


def writeFile(contentStr, fileName):
    with open(fileName, 'w', encoding=CODE_MODE) as file:
        file.write(contentStr)


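def getValueByName(name):
    # Reads the module-level loginResponseStr, which is assigned below before the first call
    return re.compile(r'name="' + re.escape(name) + r'" value="(.*?)"').search(loginResponseStr).group(1)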


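def openWeb(opener, url, postData=None):
    # Encode postData as bytes; with no form data, pass None so urllib sends a GET instead of an empty POST
    post_data = parse.urlencode(postData).encode(encoding=CODE_MODE) if postData else None
    # Open the URL, then read and decode the response body
    response = opener.open(url, data=post_data).read().decode(CODE_MODE)
    print(response)
    return response

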
# Build a cookie-aware opener
cookieProc = request.HTTPCookieProcessor(cookiejar.CookieJar())
opener = request.build_opener(cookieProc)
# Fetch the login page; the cookies it returns are kept in the CookieJar
loginResponseStr = openWeb(opener, LOGIN_URL)
# Save the response locally
writeFile(loginResponseStr, LOGIN_FILE_NAME)
# Extract the verification fields returned by the *SDN site with a regex
ltValue = getValueByName('lt')
executionValue = getValueByName('execution')
_eventIdValue = getValueByName('_eventId')
# Build the POST parameters
postData = {
    'username': '******',
    'password': '******',
    'lt': ltValue,
    'execution': executionValue,
    '_eventId': _eventIdValue,
}
# Build the request headers
opener.addheaders = [
    # Page the request originates from
    ('Origin', LOGIN_URL),
    # Browser identification
    ('User-Agent',
     'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'),
]
# The actual simulated login (with form parameters)
openWeb(opener, LOGIN_URL, postData)
# Open the personal home page
homeResponseStr = openWeb(opener, PERSON_HOME_URL)
# Save the response locally
writeFile(homeResponseStr, PERSON_HOME_FILE_NAME)
# Parse with the html5lib parser
soup = BeautifulSoup(homeResponseStr, "html5lib")
nickName = soup.find('div', class_='phrf_name').contents[0].contents[0].string
# nickName = soup.select("[class~=phrf_name]")[0].contents[0].contents[0].string
print(nickName)
print(soup.prettify())

Some notes

The process has three steps:

  1. Open the login page to obtain the cookie and the verification fields (lt, execution and _eventId, matched with a regular expression).
  2. Simulate clicking the login button. This step requires constructing the complete login form data and adding a User-Agent header.
  3. Once verification passes, you can open any other page~
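
Steps 1 and 2 can be tried offline before touching the real site. The sketch below is only an illustration: SAMPLE_LOGIN_HTML, its field values, the credentials and the helper names get_value_by_name and build_login_form are all made up for the example. It assumes the login page exposes the fields as name="..." value="..." pairs, which is what the regex in the script expects.

```python
import re
from urllib import parse

# Hypothetical login-page fragment; the real values change on every request.
SAMPLE_LOGIN_HTML = '''
<form id="fm1" method="post">
  <input type="hidden" name="lt" value="LT-12345-abcdef" />
  <input type="hidden" name="execution" value="e1s1" />
  <input type="hidden" name="_eventId" value="submit" />
</form>
'''


def get_value_by_name(html, name):
    """Return the value of the input called `name`, or None if it is absent."""
    match = re.search(r'name="' + re.escape(name) + r'" value="(.*?)"', html)
    return match.group(1) if match else None


def build_login_form(html, username, password):
    """Merge the credentials with the hidden verification fields."""
    form = {'username': username, 'password': password}
    for field in ('lt', 'execution', '_eventId'):
        form[field] = get_value_by_name(html, field)
    return form


form = build_login_form(SAMPLE_LOGIN_HTML, 'alice', 'secret')
# urlencode + encode gives the bytes that opener.open(url, data=...) expects
body = parse.urlencode(form).encode('UTF-8')
print(form)
```

Unlike the one-liner in the script, this helper returns None instead of raising when a field is missing, which makes it easier to notice when the page layout has changed.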

The last few lines use BeautifulSoup to parse the data; getting the nickname serves as an example. The target HTML snippet is as follows:

<div class="phrf_name"><span><a href="/******" target=_blank>一口仨馍</a></span></div>

BeautifulSoup provides a jQuery-like syntax for parsing HTML documents. On the Android (Java) side there is a counterpart, JSoup. I don't know which one came first, but the two feel very similar, from name to function. If you already know jQuery or JSoup, BeautifulSoup should be relatively easy to pick up. If you don't, it doesn't matter; it's pretty simple anyway. More BeautifulSoup syntax: https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/
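
To make the comparison concrete, here is a small sketch that pulls the nickname out of the snippet in two ways: by walking the tree, as the script does, and with a CSS selector in the jQuery style. It assumes the bs4 package is installed and uses the built-in "html.parser" backend rather than html5lib so that nothing else is needed; the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

# The nickname snippet, with a well-formed closing </span> tag.
html = '<div class="phrf_name"><span><a href="/******" target=_blank>一口仨馍</a></span></div>'

# "html.parser" ships with Python; the script above uses html5lib instead.
soup = BeautifulSoup(html, 'html.parser')

# Tree navigation, as in the script: div -> span -> a -> text.
by_navigation = soup.find('div', class_='phrf_name').contents[0].contents[0].string

# CSS selector, the jQuery-like style: any <a> inside div.phrf_name.
by_selector = soup.select_one('div.phrf_name a').get_text()

print(by_navigation, by_selector)
```

The selector version keeps working if an extra wrapper element appears between the div and the link, whereas the .contents chain depends on the exact nesting.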

Closing remarks

Recently I keep hearing the word Python everywhere, so I spent a few days learning it with a give-it-a-try mindset. Python feels really concise; a few dozen lines of code can do so much. That's very convenient for a lazy person like me! Still, I'm a Python novice just starting out, so there may be inaccurate or even wrong views in this text. Experienced readers, please bear with me and give me some pointers~
