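Web scraping practice: logging in to GitHub
Goal: log in to GitHub using the Requests package.
## 1. Determine the form parameters
Capture the login request several times and compare the submitted form parameters.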
First capture:

    commit: Sign in
    utf8: ✓
    authenticity_token: sO34KvtovZgqSKQsVIkEdWbwX6ykeuzCMxuZbWul6wUmlpz/3Hc4SaeuRB5WEWbL1JbkgYL3r9Na1ivFxM+o+w==
    ga_id: 1192443032.1565138303
    login: username
    password: plaintext password
    webauthn-support: supported
    webauthn-iuvpaa-support: unsupported
    required_field_34aa:
    timestamp: 1573029556609
    timestamp_secret: bc3d494a0b7f36c58e7b3dc07c52fcd3e149456f46aff70797e3709c766434c7

Second capture:

    commit: Sign in
    utf8: ✓
    authenticity_token: M0Xosj8ILvss0InDr0iNNiVylyczk06WBKmc6mfRbjKefRzgRUiPVzOKmu3CeVu4rAbQd7mj1EC99oP5yLCNoQ==
    ga_id: 1192443032.1565138303
    login: username
    password: plaintext password
    webauthn-support: supported
    webauthn-iuvpaa-support: unsupported
    required_field_90c0:
    timestamp: 1573029878918
    timestamp_secret: 68be517605bc020dbc20be18cb90323267ac88ff650b912fbe087df1be9fe117

Comparing the two captures gives:
Fixed values:
1. commit: Sign in
2. utf8: ✓
3. login: username
4. password: plaintext password
5. webauthn-support: supported
6. webauthn-iuvpaa-support: unsupported
7. ga_id: 1192443032.1565138303
Values that change:
1. authenticity_token: M0Xosj8ILvss0InDr0iNNiVylyczk06WBKmc6mfRbjKefRzgRUiPVzOKmu3CeVu4rAbQd7mj1EC99oP5yLCNoQ==
2. required_field_90c0: (empty value; the suffix of the field name differs between captures)
3. timestamp: 1573029878918
4. timestamp_secret: 68be517605bc020dbc20be18cb90323267ac88ff650b912fbe087df1be9fe117
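
The comparison can also be done mechanically. The following is a minimal sketch, assuming each capture has been copied into a Python dict; `split_fixed_and_changing` is a hypothetical helper name, and only a subset of the captured fields is used to keep the example short.

```python
def split_fixed_and_changing(first, second):
    """Return (fixed, changing): fields equal in both captures, and all other field names."""
    fixed = {k: v for k, v in first.items() if k in second and second[k] == v}
    changing = sorted((set(first) | set(second)) - set(fixed))
    return fixed, changing


# A subset of the fields from the two captures.
capture_1 = {
    "commit": "Sign in",
    "ga_id": "1192443032.1565138303",
    "webauthn-support": "supported",
    "required_field_34aa": "",
    "timestamp": "1573029556609",
}
capture_2 = {
    "commit": "Sign in",
    "ga_id": "1192443032.1565138303",
    "webauthn-support": "supported",
    "required_field_90c0": "",
    "timestamp": "1573029878918",
}

fixed, changing = split_fixed_and_changing(capture_1, capture_2)
print("fixed:", sorted(fixed))
print("changing:", changing)
```

Because the `required_field_*` name itself changes, both names show up in the changing list even though their values are empty.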
## 2. Analyze the form parameters
Inspecting the source of the login page shows the following inputs:

    <input type="hidden" name="authenticity_token" value="K4gFC3qrPOfJVi8kLoPtjJg2dUp6Yisz4YG2sHktnw8Yu1nAo2n7vVVlupbmMQyTt5iKRLTJZb/+wA6FqPPV4g==">
    <input type="hidden" name="ga_id" class="js-octo-ga-id-input" value="1192443032.1565138303">
    <input type="text" name="required_field_01fb" id="required_field_01fb" hidden="hidden" class="form-control">
    <input type="hidden" name="timestamp" value="1573029624696" class="form-control">
    <input type="hidden" name="timestamp_secret" value="25e1caaf1d72b3184f9b551d96750d01eb6871d0adf2e795fb211e58c82f9958" class="form-control">

So we can request the login page first and extract these changing form parameters from it.
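
As an alternative to matching each field with a regular expression, the inputs could be collected with the standard-library HTML parser. This is only a sketch under assumptions: `InputCollector` and `extract_inputs` are hypothetical names, and the sample markup is two of the inputs from the login page source.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name -> value for every <input> tag that has a name attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is not None:
            # Inputs without a value attribute (such as required_field_*) map to "".
            self.fields[name] = attributes.get("value") or ""


def extract_inputs(html):
    """Parse the HTML and return a dict of input names to values."""
    parser = InputCollector()
    parser.feed(html)
    return parser.fields


sample = """
<input type="hidden" name="timestamp" value="1573029624696" class="form-control">
<input type="text" name="required_field_01fb" id="required_field_01fb" hidden="hidden" class="form-control">
"""
fields = extract_inputs(sample)
required_names = [name for name in fields if name.startswith("required_field_")]
print(fields)
print(required_names)
```

A real page may contain inputs that belong to other forms, so the collected dict would likely need filtering down to the login form's fields before it is submitted.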
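## 3. Outline of the approach
- Request the GitHub login page first
- Extract the values from the login page source and build the form
- Submit the form
- Verify the login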
## 4. Write the code

    import re

    import requests


    class Github(object):
        def __init__(self, username, password):
            # Request headers that make the client look like a browser
            self.headers = {
                'Referer': 'https://github.com/',
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36',
                'Host': 'github.com'
            }
            # Create a session object (it keeps cookies across requests)
            self.session = requests.Session()
            self.username = username
            self.password = password

        def get_login_sources(self):
            # Fetch the login page so the changing form fields can be extracted
            login_url = "https://github.com/login"
            responses = self.session.get(url=login_url, headers=self.headers)
            return responses

        def get_form_data(self, responses):
            html = responses.content.decode()  # decode once and reuse
            form_data = {}
            form_data["commit"] = "Sign in"
            form_data["utf8"] = "✓"
            # re.findall returns a list, so take the first match
            form_data["authenticity_token"] = re.findall('name="authenticity_token" value="(.*?)"', html)[0]
            form_data["ga_id"] = "1192443032.1565138303"
            form_data["login"] = self.username
            form_data["password"] = self.password
            form_data["webauthn-support"] = "supported"
            form_data["webauthn-iuvpaa-support"] = "unsupported"
            # The name of the required_field_* input changes; its value is empty
            required_field_name = re.findall(r'name="(required_field_\w+)"', html)[0]
            form_data[required_field_name] = ""
            form_data["timestamp"] = re.findall('name="timestamp" value="(.*?)"', html)[0]
            form_data["timestamp_secret"] = re.findall('name="timestamp_secret" value="(.*?)"', html)[0]
            return form_data

        def post_form_data(self, form_data):
            post_url = "https://github.com/session"
            responses_2 = self.session.post(url=post_url, data=form_data, headers=self.headers)
            return responses_2

        def are_you_logged_in(self):
            logged_url = "https://github.com/" + self.username
            response_3 = self.session.get(logged_url)
            # Save the user page
            with open("GitHubasd.html", "wb") as f:
                f.write(response_3.content)
            # Verify the login by comparing the page title with the username
            title = re.findall('<title>(.*?)</title>', response_3.content.decode())[0]
            print(title)
            if title == self.username:
                print("Login succeeded")
            else:
                print("Login failed")

        def run(self):
            # 1. Request the login page first
            responses = self.get_login_sources()
            # 2. Extract the values and build the form data
            form_data = self.get_form_data(responses)
            # 3. Submit the form
            self.post_form_data(form_data)
            # 4. Verify the login
            self.are_you_logged_in()


    if __name__ == '__main__':
        username = input("Enter username: ")
        password = input("Enter password: ")
        user = Github(username, password)
        user.run()
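
The verification step relies on the page title being exactly the username. If the title carries extra text (an assumption, for example a display name or a site suffix), a looser comparison might be more forgiving. The sketch below is not part of the original code; `extract_title` and `title_matches_user` are hypothetical helpers, and the sample titles are made up for illustration.

```python
import re


def extract_title(html):
    """Return the stripped text of the first <title> element, or None if absent."""
    match = re.search(r"<title>(.*?)</title>", html, flags=re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else None


def title_matches_user(title, username):
    """Hypothetical check: the first whitespace-separated token of the title
    equals the username, compared case-insensitively."""
    if not title:
        return False
    first_token = title.split()[0]
    return first_token.lower() == username.lower()


# Made-up sample titles, used only to exercise the helpers.
profile_html = "<html><head><title>\n  octocat (The Octocat)  </title></head></html>"
other_html = "<html><head><title>Sign in</title></head></html>"

print(title_matches_user(extract_title(profile_html), "octocat"))
print(title_matches_user(extract_title(other_html), "octocat"))
```

Using `re.search` with a `None` check also avoids the `IndexError` that indexing an empty `re.findall` result would raise when the page has no title.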