Crawling weather data with a Python crawler and displaying it graphically

  Foreword

  Using Python to crawl web data is very common nowadays, and scraping weather data is the classic beginner exercise; many people write their first crawler against a weather site. This post shows how to crawl data from China Weather Network: you enter a city name, the script looks the city up, fetches its weather for the coming week, saves the result as a CSV file, and displays the data graphically. The complete code is given at the end.

  1. Modules used

  Python 3 is used, mainly with the csv, sys, urllib.request and BeautifulSoup4 modules: csv handles CSV files, urllib.request builds HTTP requests, and BeautifulSoup4 parses the page content. Any of these that are not already installed can be installed with pip from cmd. You also need a file that maps city names to city codes, so that the code for the city the user enters can be looked up and the corresponding weather page fetched. Copy the tidied-up city-code listing from the cityinfo file, save it as cityinfo.py, and place it in the same directory so it can be imported.
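  For reference, a minimal sketch of what cityinfo.py contains: a dict mapping city names to China Weather Network city codes. The two entries below are illustrative (they are believed to be the codes for Beijing and Shanghai); the real file lists every city.

```python
# cityinfo.py -- maps city names to China Weather Network city codes.
# Only two illustrative entries shown here; the real file has thousands.
city = {
    "北京": "101010100",
    "上海": "101020100",
}

print(city["北京"])
```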

  2. Extract the city code for the entered city from the file

  cityname = input("Please enter the city whose weather you want to check: ")
  if cityname in cityinfo.city:
      citycode = cityinfo.city[cityname]
  else:
      sys.exit()

  3. Build the request and read the response, i.e. the page content

  url = 'http://www.weather.com.cn/weather/' + citycode + '.shtml'
  header = ('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36')  # header to send
  http_handler = urllib.request.HTTPHandler()
  opener = urllib.request.build_opener(http_handler)  # build an opener whose headers we can modify
  opener.addheaders = [header]
  request = urllib.request.Request(url)  # build the request
  response = opener.open(request)  # get the response
  html = response.read()  # read the response body
  html = html.decode('utf-8')  # decode it, otherwise the text is garbled

  The User-Agent header is set because some sites have anti-crawler checks. In Chrome, press F12, click the Network tab, load the page, pick any request in the list, and its request headers (including User-Agent) are shown there for you to copy.
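  As a side note, the same User-Agent can also be attached directly when building the Request, which avoids the opener entirely. A sketch (the city code in the URL is the illustrative one for Beijing, and the fetch itself is commented out so nothing is downloaded here):

```python
import urllib.request

url = 'http://www.weather.com.cn/weather/101010100.shtml'  # 101010100 = Beijing
req = urllib.request.Request(url, headers={
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
})
print(req.get_header('User-agent'))  # urllib normalizes the header name's case
# html = urllib.request.urlopen(req).read().decode('utf-8')
```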

  4. Filter the data out of the returned page

  final = []  # list that will hold the extracted data
  bs = BeautifulSoup(html, "html.parser")  # create the BeautifulSoup object
  body = bs.body  # get the body part of the page
  data = body.find('div', {'id': '7d'})
  ul = data.find('ul')
  li = ul.find_all('li')

  # All of this filtering is based on where the content sits in the page: the seven-day forecast we want is inside the div with id 7d, and the forecast list is a ul inside that div. Since it is the only ul there, the find() method is enough. Each day's weather is one li inside the ul, and because there are several li elements you must use find_all() to get them all; find() would return only the first.
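  The find/find_all distinction can be seen on a tiny HTML snippet shaped like the weather page (the dates and weather text below are made up):

```python
from bs4 import BeautifulSoup

snippet = """
<body><div id="7d"><ul>
  <li><h1>21日（今天）</h1><p>晴</p></li>
  <li><h1>22日（明天）</h1><p>多云</p></li>
</ul></div></body>
"""
bs = BeautifulSoup(snippet, 'html.parser')
data = bs.body.find('div', {'id': '7d'})   # the div is unique, so find() suffices
li = data.find('ul').find_all('li')        # several li elements, so find_all() is needed
print(len(li))                  # 2
print(li[0].find('h1').string)  # 21日（今天）
```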

  5. Crawl the data

  i = 0  # controls how many days are crawled
  lows = []  # stores the low temperatures
  highs = []  # stores the high temperatures
  for day in li:  # traverse each li
      if i < 7:
          temp = []
          date = day.find('h1').string  # get the date
          temp.append(date)
          inf = day.find_all('p')  # the li contains several p tags, so use find_all instead of find
          temp.append(inf[0].string)
          temlow = inf[1].find('i').string  # lowest temperature
          if inf[1].find('span') is None:  # the forecast sometimes has no high temperature, so check first
              temhigh = None
              temperate = temlow
          else:
              temhigh = inf[1].find('span').string  # highest temperature
              temhigh = temhigh.replace('℃', '')
              temperate = temhigh + '/' + temlow
          temp.append(temperate)
          final.append(temp)
          i = i + 1

  Here each day's weather is pulled out of its li, with the counter keeping it to seven days, and each value is extracted from the tags nested under the li. Pay attention to how many matching tags there are: if the tag you want occurs several times under the current element, use find_all() instead of find() and then index the result with [n] to pick out the right one.

  When extracting the temperature there is one pitfall: China Weather Network usually shows both a high and a low temperature, but sometimes only one temperature is shown with no high, so you must check for that or the script will crash. The temperatures are then joined into a single string and appended, together with the other fields, to the final list.
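  The missing-high-temperature check can be exercised on two hand-made snippets. The markup shape mirrors what the script expects (high in a span, low in an i tag); the numbers are invented:

```python
from bs4 import BeautifulSoup

def temperate_of(p):
    """Join high/low into one string, tolerating a missing high."""
    low = p.find('i').string
    span = p.find('span')
    if span is None:                      # no high temperature published
        return low
    return span.string.replace('℃', '') + '/' + low

with_high = BeautifulSoup('<p class="tem"><span>12</span>/<i>3℃</i></p>', 'html.parser').p
no_high = BeautifulSoup('<p class="tem"><i>3℃</i></p>', 'html.parser').p
print(temperate_of(with_high))  # 12/3℃
print(temperate_of(no_high))    # 3℃
```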

  6. Write the CSV file

  with open('weather.csv', 'a', errors='ignore', newline='') as f:
      f_csv = csv.writer(f)
      f_csv.writerow([cityname])  # writerow, not writerows: a bare string would be split into characters
      f_csv.writerows(final)

  Finally, the weather data is stored in weather.csv as shown below:
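  To see what these calls produce without running the crawl, the same writing pattern can be exercised in memory with made-up rows:

```python
import csv
import io

# Write one city-name row followed by sample data rows, in memory.
buf = io.StringIO()
w = csv.writer(buf)
w.writerow(['北京'])                          # city name as a one-cell row
w.writerows([['21日（今天）', '晴', '12/3℃'],
             ['22日（明天）', '多云', '10/2℃']])
print(buf.getvalue())
```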

  7. Draw the chart with pygal. Install the module first with pip install pygal, then import it with import pygal.

  bar = pygal.Line()  # create a line chart
  bar.add('lowest temperature', lows)  # add the two data series
  bar.add('highest temperature', highs)  # note: lows and highs are lists of int
  bar.x_labels = daytimes
  bar.x_labels_major = daytimes[::30]
  bar.x_label_rotation = 45
  bar.title = cityname + ' temperature trend for the next seven days'  # set the chart title
  bar.x_title = 'date'  # x-axis title
  bar.y_title = 'temperature (Celsius)'  # y-axis title
  bar.legend_at_bottom = True
  bar.show_x_guides = False
  bar.show_y_guides = True
  bar.render_to_file('temperate1.svg')  # save as an SVG file, which can be viewed in a browser

  The final chart generated from the weather data is shown below:

  8. Complete code

  import csv

  import sys

  import urllib.request

  from bs4 import BeautifulSoup # page parsing module

  import pygal

  import cityinfo

  cityname = input("Please enter the city whose weather you want to check: ")
  if cityname in cityinfo.city:
      citycode = cityinfo.city[cityname]
  else:
      sys.exit()

  url = 'http://www.weather.com.cn/weather/' + citycode + '.shtml'
  header = ('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36')  # header to send
  http_handler = urllib.request.HTTPHandler()
  opener = urllib.request.build_opener(http_handler)  # build an opener whose headers we can modify
  opener.addheaders = [header]
  request = urllib.request.Request(url)  # build the request
  response = opener.open(request)  # get the response
  html = response.read()  # read the response body
  html = html.decode('utf-8')  # decode it, otherwise the text is garbled

  # Initial filtering of the fetched page
  final = []  # list that will hold the extracted data
  bs = BeautifulSoup(html, "html.parser")  # create the BeautifulSoup object
  body = bs.body
  data = body.find('div', {'id': '7d'})
  print(type(data))
  ul = data.find('ul')
  li = ul.find_all('li')

  # Crawl the data we need
  i = 0  # controls how many days are crawled
  lows = []  # stores the low temperatures
  highs = []  # stores the high temperatures
  daytimes = []  # stores the dates
  weathers = []  # stores the weather descriptions
  for day in li:  # traverse each li
      if i < 7:
          temp = []  # temporary storage for one day's data
          date = day.find('h1').string  # get the date
          # print(date)
          temp.append(date)
          daytimes.append(date)
          inf = day.find_all('p')  # the li contains several p tags, so use find_all instead of find
          # print(inf[0].string)  # the first p tag holds the weather description
          temp.append(inf[0].string)
          weathers.append(inf[0].string)
          temlow = inf[1].find('i').string  # lowest temperature
          if inf[1].find('span') is None:  # the forecast sometimes has no high temperature
              temhigh = None
              temperate = temlow
          else:
              temhigh = inf[1].find('span').string  # highest temperature
              temhigh = temhigh.replace('℃', '')
              temperate = temhigh + '/' + temlow
          # temp.append(temhigh)
          # temp.append(temlow)
          lowStr = ""
          lowStr = lowStr.join(temlow.string)
          lows.append(int(lowStr[:-1]))  # these three lines turn the NavigableString low temperature into an int and store it
          if temhigh is None:
              highs.append(int(lowStr[:-1]))
          else:
              highStr = ""
              highStr = highStr.join(temhigh)
              highs.append(int(highStr))  # likewise turn the high temperature into an int and store it
          temp.append(temperate)
          final.append(temp)
          i = i + 1

  # Finally write the crawled weather to a csv file
  with open('weather.csv', 'a', errors='ignore', newline='') as f:
      f_csv = csv.writer(f)
      f_csv.writerow([cityname])  # writerow, not writerows: a bare string would be split into characters
      f_csv.writerows(final)

  # Draw the chart
  bar = pygal.Line()  # create a line chart
  bar.add('lowest temperature', lows)
  bar.add('highest temperature', highs)
  bar.x_labels = daytimes
  bar.x_labels_major = daytimes[::30]
  # bar.show_minor_x_labels = False  # do not show minor x-axis labels
  bar.x_label_rotation = 45
  bar.title = cityname + ' temperature trend for the next seven days'
  bar.x_title = 'date'
  bar.y_title = 'temperature (Celsius)'
  bar.legend_at_bottom = True
  bar.show_x_guides = False
  bar.show_y_guides = True
  bar.render_to_file('temperate.svg')


Origin www.cnblogs.com/djw12333/p/11627573.html