Using your browser's cookies in a Python crawler: browsercookie

Many people have probably written a web crawler in Python. Collecting data from the web automatically is a satisfying thing to do, and Python makes it easy. However, crawlers often run into logins and verification steps that block them and discourage their authors (site owners, for their part, are just as frustrated by the many crawlers that hit their sites every day). Crawling and anti-crawling is a cat-and-mouse game in which each side keeps trying to stay one step ahead of the other.

Because HTTP is a stateless protocol, login state is maintained by passing cookies. After you log in once, the browser saves the cookie that carries the login information. The next time you open the site, the browser automatically sends the saved cookie along with the request. As long as the cookie has not expired, the site still treats you as logged in.
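To make the mechanism concrete, here is a minimal sketch that uses only the Python 3 standard library. The cookie names (sessionid, old_token), their values, and the example.com domain are made up for illustration, and make_cookie is a hypothetical helper; a real site chooses its own cookie names. The sketch stores two cookies in a cookie jar, one of them already expired, and then asks the jar which Cookie header it would attach to a request. No network access is needed.

```python
import http.cookiejar
import urllib.request


def make_cookie(name, value, domain, expires=None):
    """Build a cookie by hand (hypothetical helper); a browser would store this after login."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=expires, discard=expires is None,
        comment=None, comment_url=None, rest={},
    )


jar = http.cookiejar.CookieJar()
# A session cookie with no expiry time: still valid.
jar.set_cookie(make_cookie("sessionid", "abc123", "example.com"))
# A cookie whose expiry time (1 second after the epoch) is long past.
jar.set_cookie(make_cookie("old_token", "xyz", "example.com", expires=1))

# Ask the jar to attach the matching, unexpired cookies to a request.
request = urllib.request.Request("http://example.com/")
jar.add_cookie_header(request)
cookie_header = request.get_header("Cookie")
print(cookie_header)  # sessionid=abc123
```

Only the unexpired cookie ends up in the header, which is the same rule a browser applies when it decides whether you are still logged in.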

The browsercookie module is a tool for extracting the saved cookies from your browser. It is very useful for crawlers: it loads the browser's cookies into a cookiejar object, so you can easily download content that requires a login.

Installation

pip install browsercookie

On Windows, the built-in sqlite module throws an error when it loads the Firefox database. You need to update the sqlite version:
pip install pysqlite

Usage

Here is a helper that extracts the title from a page:

>>> import re
>>> get_title = lambda html: re.findall('<title>(.*?)</title>', html, flags=re.DOTALL)[0].strip()
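Under Python 3, urlopen(...).read() returns bytes, and a str regex pattern cannot be applied to bytes. The lambda above also raises IndexError when a page has no title. The following is a minimal sketch of a more forgiving variant; the name get_title_safe and the default UTF-8 encoding are assumptions made for this example, not part of browsercookie.

```python
import re

# Match <title>...</title> in any letter case, across line breaks.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", flags=re.DOTALL | re.IGNORECASE)


def get_title_safe(html, encoding="utf-8"):
    """Return the stripped <title> text, or None when the page has no title."""
    if isinstance(html, bytes):
        # Decode raw response bytes; undecodable bytes are replaced, not fatal.
        html = html.decode(encoding, errors="replace")
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


print(get_title_safe(b"<html><title> Example page </title></html>"))  # Example page
```

It accepts either bytes or str, so it can be used on the result of read() directly.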

 

Here is the title of the page downloaded without being logged in:

>>> import urllib2
>>> url = 'https://bitbucket.org/'
>>> public_html = urllib2.urlopen(url).read()
>>> get_title(public_html)
'Git and Mercurial code management for teams'

 

Next, use browsercookie to load the cookies from Firefox, where you have already logged in to Bitbucket, and download the page again:

>>> import browsercookie
>>> cj = browsercookie.firefox()
>>> opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
>>> login_html = opener.open(url).read()
>>> get_title(login_html)
'richardpenman / home &mdash; Bitbucket'

 

The above is Python 2 code. Now try the same thing with Python 3:

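>>> import urllib.request
>>> public_html = urllib.request.urlopen(url).read().decode('utf-8')
>>> opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
>>> login_html = opener.open(url).read().decode('utf-8')
>>> get_title(login_html)
'richardpenman / home &mdash; Bitbucket'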

 

You can see that your user name appears in the title, which shows that the browsercookie module successfully loaded the cookies from Firefox.

Here is an example that uses requests. This time we load the cookies from Chrome; of course, you need to log in to Bitbucket with Chrome beforehand:

>>> import requests
>>> cj = browsercookie.chrome()
>>> r = requests.get(url, cookies=cj)
>>> get_title(r.text)
'richardpenman / home &mdash; Bitbucket'

 

If you do not know or do not care which browser holds the cookies you need, you can load the cookies from all supported browsers at once:

>>> cj = browsercookie.load()
>>> r = requests.get(url, cookies=cj)
>>> get_title(r.text)
'richardpenman / home &mdash; Bitbucket'
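A jar that combines several browsers can hold cookies for every site you have visited. If you would rather pass along only the cookies for the site you are crawling, a small filter is enough. This is a minimal sketch, assuming the object returned by browsercookie can be iterated like a standard http.cookiejar.CookieJar; the helper name cookies_for_domain is hypothetical.

```python
import http.cookiejar


def cookies_for_domain(cookiejar, domain):
    """Copy the cookies set for `domain` (or one of its subdomains) into a new CookieJar."""
    selected = http.cookiejar.CookieJar()
    for cookie in cookiejar:
        # Cookie domains may carry a leading dot, e.g. ".bitbucket.org".
        host = cookie.domain.lstrip(".")
        if host == domain or host.endswith("." + domain):
            selected.set_cookie(cookie)
    return selected


# Possible usage (assumes browsercookie is installed and a browser is logged in):
# cj = cookies_for_domain(browsercookie.load(), "bitbucket.org")
```

The filtered jar can be passed to requests or to HTTPCookieProcessor in the same way as the original one.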

 

Supported platforms

Currently, the module supports the following platforms:

Chrome: Linux, OSX, Windows
Firefox: Linux, OSX, Windows

Origin www.cnblogs.com/qingdeng123/p/11655207.html