First, the requirements
1. Write a program that crawls the latest daily epidemic statistics
2. Store the crawled results in a database
3. Visualize the results together with the statistics, displaying the data for the current date in real time
Second, the approach
1. Use Python, disguised as a browser, to crawl the data (crawl target: the DXY "Dingxiang Doctor · Dingxiang Garden" epidemic page)
2. Split the returned block of code to extract the desired data values
3. Write the data into the database (inserting the data row by row)
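Step 2 above — cutting the JSON payload out of the returned `<script>` block — can be sketched offline. The sample string below is a hypothetical stand-in for what `str()` of the `getAreaStat` script tag looks like; the exact wrapper (and hence any fixed slice offsets) is an assumption and may change whenever the page does, so cutting on landmarks is more robust than hard-coded indices.

```python
import json

# Hypothetical sample of the getAreaStat <script> block after str() conversion:
# a JS assignment wrapping a JSON array (the data values are made up).
account = ('[<script id="getAreaStat">try { window.getAreaStat = '
           '[{"provinceShortName": "Hubei", "confirmedCount": 67800, '
           '"curedCount": 63000, "deadCount": 3133, "locationId": 420000, '
           '"cities": []}]}catch(e){}</script>]')

# Instead of fixed offsets, cut between the end of the assignment
# and the '}catch' that closes the try block.
start = account.index('window.getAreaStat = ') + len('window.getAreaStat = ')
end = account.index('}catch')
messages_json = json.loads(account[start:end])  # now ordinary Python data

print(messages_json[0]['provinceShortName'])  # → Hubei
print(messages_json[0]['confirmedCount'])     # → 67800
```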
Third, problems encountered
1. At first I tried to see whether it could be done in Java, but after searching for a long time I could not find a suitable demo
2. Connecting to a database from Python
3. How to split the data block to get the data I wanted
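For the database problem, the pattern is the standard Python DB-API: connect, get a cursor, execute parameterized SQL, then commit or roll back. The sketch below uses the stdlib sqlite3 module as a stand-in for pymysql so it runs without a MySQL server (pymysql uses %s placeholders where sqlite3 uses ?); the table name and columns are assumptions mirroring the script in the next section.

```python
import sqlite3

# sqlite3 stands in for pymysql here: both follow the Python DB-API, so the
# connect/cursor/execute/commit pattern is identical. Table and column names
# are illustrative assumptions.
db = sqlite3.connect(':memory:')
cursor = db.cursor()
cursor.execute('CREATE TABLE info2 ('
               'id INTEGER, updateTime TEXT, provinceName TEXT, cityName TEXT, '
               'confirmedCount INTEGER, suspectedCount INTEGER, '
               'curedCount INTEGER, deadCount INTEGER, locationId INTEGER)')

rows = [
    (1, '2020-03-10 19:00:00', 'Hubei', None, 67800, 0, 63000, 3133, 420000),
    (2, '2020-03-10 19:00:00', 'Guangdong', None, 1356, 0, 1307, 8, 440000),
]
try:
    # executemany inserts all rows with one parameterized statement
    cursor.executemany('INSERT INTO info2 VALUES (?,?,?,?,?,?,?,?,?)', rows)
    db.commit()
except sqlite3.Error:
    db.rollback()  # undo everything since the last commit on failure

cursor.execute('SELECT COUNT(*) FROM info2')
row_count = cursor.fetchone()[0]
print(row_count)  # → 2
```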
Fourth, the source code
from os import path
import requests
from bs4 import BeautifulSoup
import json
import pymysql
import numpy as np
import time

# Request address
url = 'https://ncov.dxy.cn/ncovh5/view/pneumonia?from=timeline&isappinstalled=0'
# Disguise the request as a browser to avoid anti-crawling measures:
# build the header information
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'
}
response = requests.get(url, headers=headers)  # network request
# print(response.content.decode('utf-8'))      # print the page source byte stream
content = response.content.decode('utf-8')
soup = BeautifulSoup(content, 'html.parser')  # use the "html.parser" parser
'''
* find() returns the first matching result
* find_all() returns a list of all matching results
'''
listA = soup.find_all(name='script', attrs={"id": "getAreaStat"})                   # domestic data by province
listB = soup.find_all(name='script', attrs={"id": "getListByCountryTypeService2"})  # worldwide data

account = str(listA)        # convert to a string
print(account)
messages = account[52:-21]  # slice from index 52 to 21 from the end
print(messages)
messages_json = json.loads(messages)  # json.loads() decodes JSON into Python data types

valuesList = []
cityList = []
con = len(messages_json)  # len() returns the number of items in an object (string, list, tuple, ...)
k = 0
for i in range(len(messages_json)):
    k = k + 1
    # time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time())) formats the current time
    value = (k, time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time())),
             messages_json[i].get('provinceShortName'), None,
             messages_json[i].get('confirmedCount'), messages_json[i].get('suspectedCount'),
             messages_json[i].get('curedCount'), messages_json[i].get('deadCount'),
             messages_json[i].get('locationId'))
    valuesList.append(value)  # append the province row to the list
    cityValue = messages_json[i].get('cities')
    for j in range(len(cityValue)):
        con = con + 1
        cityValueList = (con, time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time())),
                         messages_json[i].get('provinceShortName'), cityValue[j].get('cityName'),
                         cityValue[j].get('confirmedCount'), cityValue[j].get('suspectedCount'),
                         cityValue[j].get('curedCount'), cityValue[j].get('deadCount'),
                         cityValue[j].get('locationId'))
        cityList.append(cityValueList)

# Write to the database
# Open the database connection
db = pymysql.connect("localhost", "root", "xp20010307..", "yiqing", charset='utf8')
cursor = db.cursor()  # use this method to get an operation cursor
array = np.asarray(valuesList[0])
sql_clean_province = "TRUNCATE TABLE info2"
sql = "INSERT INTO info2 values (%s,%s,%s,%s,%s,%s,%s,%s,%s)"
value_tuple = tuple(valuesList)
cityTuple = tuple(cityList)
try:
    cursor.execute(sql_clean_province)
    db.commit()
except:
    # rollback() undoes the changes made to the database since the last commit()
    print('Entry failed, rolling back 1')
    db.rollback()
try:
    cursor.executemany(sql, value_tuple)
    db.commit()
except:
    print('Entry failed, rolling back 3')
    db.rollback()
try:
    cursor.executemany(sql, cityTuple)
    db.commit()
except:
    print('Entry failed, rolling back 4')
    db.rollback()
db.close()
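The flattening loop in the script (one row per province with cityName set to None, then one row per city, with city ids continuing after the province count) can be checked offline against hypothetical sample data shaped like the getAreaStat payload:

```python
import time

# Made-up sample mirroring the getAreaStat structure: each province dict
# carries its own counts plus a 'cities' list.
sample = [{
    'provinceShortName': 'Hubei', 'confirmedCount': 67800, 'suspectedCount': 0,
    'curedCount': 63000, 'deadCount': 3133, 'locationId': 420000,
    'cities': [{'cityName': 'Wuhan', 'confirmedCount': 49986, 'suspectedCount': 0,
                'curedCount': 45418, 'deadCount': 2490, 'locationId': 420100}],
}]

now = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))
provinceRows, cityRows = [], []
for k, p in enumerate(sample, start=1):
    # province row: cityName column is None
    provinceRows.append((k, now, p['provinceShortName'], None,
                         p['confirmedCount'], p['suspectedCount'],
                         p['curedCount'], p['deadCount'], p['locationId']))
    for c in p['cities']:
        # city ids continue after the last province id
        cityRows.append((len(sample) + len(cityRows) + 1, now,
                         p['provinceShortName'], c['cityName'],
                         c['confirmedCount'], c['suspectedCount'],
                         c['curedCount'], c['deadCount'], c['locationId']))

print(provinceRows[0][2], cityRows[0][3])  # → Hubei Wuhan
```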
Date | Start time | End time | Interruption (min) | Net time (min) | Activity | Remark |
3.10 | 15:40 | 17:20 | 10 | 80 | Studied online how to do the crawling in Java | |
3.10 | 17:45 | 18:20 | 0 | 35 | Installed PyCharm, looked for a demo | |
3.10 | 18:30 | 18:55 | 0 | 25 | Learned some database basics | |
3.10 | 19:00 | 19:30 | 0 | 30 | Split the data and implemented storage |
Date | No. | Type | Stage introduced | Stage removed | Time to fix | Defect fixed |
3.10 | 1 | Wrong approach | Design | No direction | Changed approach | |
Description: at first I planned to implement it in Java, but I had no clear direction and no complete demo to learn from
3.10 | 2 | Lack of knowledge | Coding | Online learning | 20 min | Data splitting |
Description: I could store the data that was read, but did not know how to split out the parts I wanted
3.10 | 3 | Lack of knowledge | Coding | Online learning | 20 min | Database data entry |
Description: I had never connected Python to a database; learned a few connection statements