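The code below shows how to log in to GitHub with a Python script. To log in to a different website, use Chrome's Inspect feature to locate the relevant elements (such as the login form's input fields) and substitute them in the code.
The script first logs in to GitHub, then opens the Discover page within the logged-in session and counts how many projects that page loads.
Without further ado, here is the code.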
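```python
from requests import session
from bs4 import BeautifulSoup as bs

USER = 'InputYourEmail'                # placeholder: your GitHub login
PASSWORD = 'InputYourPassword(^_^)'    # placeholder: your GitHub password
URL1 = 'https://github.com/session'    # login form page and form target
URL2 = 'https://github.com/discover'   # page to open after logging in

with session() as s:
    # Fetch the login form and read the hidden CSRF token and the submit
    # button's value, which are sent back together with the credentials.
    req = s.get(URL1).text
    html = bs(req, "lxml")
    token = html.find("input", {"name": "authenticity_token"}).attrs['value']
    com_val = html.find("input", {"name": "commit"}).attrs['value']

    login_data = {'login': USER,
                  'password': PASSWORD,
                  'commit': com_val,
                  'authenticity_token': token}
    # Submit the form; the session object keeps the cookies it receives.
    r1 = s.post(URL1, data=login_data)
    # Raises on HTTP error codes; it does not by itself prove the login worked.
    r1.raise_for_status()

    # Reuse the logged-in session to open the Discover page.
    r2 = s.get(URL2)
    page_soup = bs(r2.content, "html.parser")

    # Each listed project sits in a <div> whose class includes "mb-1".
    containers = page_soup.find_all("div", {"class": "mb-1"})
    print("Number of projects listed on this page:")
    print(len(containers))
```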
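The final step of the script counts `div` elements whose `class` includes `mb-1`. BeautifulSoup treats `class` as a multi-valued attribute, so `find_all("div", {"class": "mb-1"})` also matches `<div class="mb-1 text-bold">`, but not `<div class="mb-10">`. The sketch below reproduces that matching rule with the standard library's `html.parser`, so it can be tried without network access or bs4; `ClassCounter`, `count_elements_with_class` and the sample markup are made up for illustration and are not GitHub's real page.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags named `tag` whose class list contains `css_class`.

    Mirrors BeautifulSoup's class matching: the class attribute is split
    on whitespace and a match on any single token counts.
    """

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        for name, value in attrs:
            # class="a b c" holds several classes separated by whitespace
            if name == "class" and value and self.css_class in value.split():
                self.count += 1
                break


def count_elements_with_class(html, tag, css_class):
    parser = ClassCounter(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical markup for illustration only.
SAMPLE = """
<div class="mb-1">project-a</div>
<div class="mb-1 text-bold">project-b</div>
<div class="mb-10">not a match</div>
<span class="mb-1">a span, not a div</span>
"""

print(count_elements_with_class(SAMPLE, "div", "mb-1"))  # → 2
```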
The login script above was tested and ran successfully on Python 3.6.5.
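For another site, the opening paragraph suggests finding the right fields with Chrome's Inspect tool and editing the code by hand. One possible way to cut down on that hard-coding, which is an assumption and not something the original script does, is to collect every named `<input>` inside the login form automatically and then overwrite the credential fields. The sketch below does this with the standard library's `html.parser`; `FormFieldCollector`, `build_login_payload` and the sample form are hypothetical, and it deliberately ignores unchecked checkboxes, `<select>` elements, and pages with several forms (their inputs would be merged).

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name -> value for every <input> that sits inside a <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.in_form = True
        elif tag == "input" and self.in_form:
            attrs = dict(attrs)
            name = attrs.get("name")
            if name:
                # An input without a value attribute submits an empty string
                self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_login_payload(html, credentials):
    """Hypothetical helper: pre-filled form fields, overridden by credentials."""
    collector = FormFieldCollector()
    collector.feed(html)
    collector.close()
    payload = dict(collector.fields)
    payload.update(credentials)
    return payload


# Hypothetical login form modeled on the fields the script reads.
SAMPLE_FORM = """
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="login">
  <input type="password" name="password">
  <input type="submit" name="commit" value="Sign in">
</form>
"""

payload = build_login_payload(SAMPLE_FORM, {"login": "me", "password": "secret"})
print(payload)
# → {'authenticity_token': 'abc123', 'login': 'me', 'password': 'secret', 'commit': 'Sign in'}
```

The returned dict has the same shape as `login_data` in the login script and could be passed as `data=` to `s.post(URL1, ...)`. The credential field names (`login` and `password` here) still have to be read off the target site with Inspect.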
References
================
Intro to Web Scraping with Python and Beautiful Soup
https://www.youtube.com/watch?v=XQgXKtPSzUI&t=1507s