Foreword
Python has many packages for writing web crawlers, but requests bills itself as "HTTP for Humans" ... a bold claim, but it lives up to it.
This is my first blog post on crawlers, and it implements a very simple function: getting the read counts listed on my own blog's home page.
Of course, crawlers can rarely avoid regular expressions (regex), so Python's re package is also very commonly used.
Analysis
Open the cnblogs site and log in, then click "My essays" on the left:
Press F12 to view the page source:
You will find that each essay shows its read count as "阅读:" ("read:") followed by a number. Note that the colon is the English (ASCII) colon, and that the number of digits varies.
In a regular expression, a single digit is written '\d'.
Something that appears 0 to n times is followed by '*', 1 to n times by '+', and 0 to 1 times by '?'.
Here, "阅读:" is always followed by at least one digit, so either '*' or '+' works. '+' is the stricter choice, because it never matches a label that has no digits after it.
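The difference between the three quantifiers is easiest to see on a small example. The sketch below runs on a made-up sample string rather than the real page, and assumes the label on the page is the Chinese "阅读:" ("read:") with an ASCII colon.

```python
import re

# Hypothetical sample text, not taken from the real page.
# The last label deliberately has no digits after it.
sample = "阅读:128 阅读:7 阅读:"

star = re.findall(r"阅读:\d*", sample)   # '*': zero or more digits
plus = re.findall(r"阅读:\d+", sample)   # '+': one or more digits
maybe = re.findall(r"阅读:\d?", sample)  # '?': zero or one digit

print(star)   # ['阅读:128', '阅读:7', '阅读:']
print(plus)   # ['阅读:128', '阅读:7']
print(maybe)  # ['阅读:1', '阅读:7', '阅读:']
```

With '*', the bare label still matches, and calling int() on its empty digit part would raise ValueError. With '+', that bare label is simply skipped.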
Code
import re
import requests

url = 'https://cnblogs.com/maoerbao/p/'  # page that lists all of my essays
f = requests.get(url).text  # fetch the HTML page and get its content as text
# Each match looks like "阅读:123"; "阅读:" is the page's "read:" label
a = re.findall(r'阅读:\d+', f)
zydl = 0  # total read count
L = []  # read count of each essay
for i in a:
    ydl = int(i[3:])  # drop the 3-character label, keep the digits
    zydl = zydl + ydl
    L.append(ydl)
print('maoerbao blog:\n')
print('Total number of articles: %d' % len(L))
print('Total number of reads: %d' % zydl)
print('Most reads for a single article: %d' % max(L))
print('Fewest reads for a single article: %d' % min(L))
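As a possible variation, the parsing can be separated from the download so that it can be tried without a network connection. This is only a sketch: the function name summarize_reads, its label parameter, and the sample HTML fragment are all made up for illustration, and the label is assumed to be "阅读:". A capture group keeps only the digits, so the code does not need to slice off the label by length.

```python
import re


def summarize_reads(html, label="阅读:"):
    """Return article count, total, max and min of the read counts in html."""
    # The group (\d+) captures only the digits after the label.
    pattern = re.escape(label) + r"(\d+)"
    counts = [int(n) for n in re.findall(pattern, html)]
    if not counts:
        # max() and min() raise ValueError on an empty list, so return early.
        return {"articles": 0, "total": 0, "max": None, "min": None}
    return {
        "articles": len(counts),
        "total": sum(counts),
        "max": max(counts),
        "min": min(counts),
    }


# Hypothetical fragment standing in for the real page source.
sample_html = "<span>阅读:12</span><span>阅读:305</span><span>阅读:40</span>"
stats = summarize_reads(sample_html)
print(stats)
```

To run it against the live page, pass requests.get(url).text in place of sample_html.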
Result