Using Python to work with downloaded CSV data

This chapter is based on Chapter 16 of "Python Programming from Introduction to Practice". We will download data from the Internet and visualize it, so that you can analyze it and look for patterns and relationships.

Learning objectives

We will access and visualize data stored in two common formats:

  1. CSV

    Use Python's csv module to process weather data stored in CSV format and find the highest and lowest temperatures in two different regions over the same period. Then use matplotlib to chart the downloaded data and show how the temperature changes in the two regions.

  2. JSON
    Use Python's json module to access closing-price data stored in JSON format, and use Pygal to draw charts that explore the periodicity of price changes.

1. CSV file format

The simplest way to store data in a text file is to write it as a series of comma-separated values (CSV). Such files are called CSV files.

For example:

2023-5-14,19,99,12,12,1,2,3,4,5,6,7,8,9,1.0,1.1,1.2,,,,,2.3

While CSV files are cumbersome for humans to read, programs can easily extract and manipulate the values, which can help speed up the data analysis process.
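
To illustrate how easily a program can pull such a line apart, here is a minimal sketch using the standard csv module. The sample line is made up for this example and is not taken from the weather file.

```python
import csv
import io

# Made-up sample line: a date followed by readings, one of them missing.
sample = "2014-7-1,64,56,50,,0.1\n"

# csv.reader accepts any iterable of lines, so an in-memory buffer works
# just as well as an open file.
row = next(csv.reader(io.StringIO(sample)))

print(row)       # every field comes back as a string
print(len(row))  # the empty field is kept, as an empty string
```

Note that the missing value is not dropped: it shows up as an empty string, which matters later when the values are converted to numbers.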

We will start with a small amount of weather data in CSV format.

The data is stored in sitka_weather_07-2014.csv, which needs to be copied into the same folder as the program.


If you don't have the practice files, they are available for download from my personal homepage.


With the preparation done, let's get started:

1.1 Analyze CSV file header

The csv module is part of the Python standard library and can be used to parse the rows of a CSV file.

Let's look at the first line of this file, which contains a series of descriptions about the data:

highs_lows.py:

import csv

filename = 'sitka_weather_07-2014.csv'

with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    print(header_row)

Code snippet description:

We first import the csv module.
Then we store the name of the file in a variable named filename. Next, we open the file and store the resulting file object in f, and then call csv.reader(), passing f as the argument, to create a reader object associated with the file.
We store this reader object in reader.

next() is a Python built-in function, not something defined by the csv module.
When we pass the reader to next(), it returns the next line of the file, already split into a list of values.

Running the program prints the first line of the file as a list of column headers.

Because we called next() only once, only one line was read.

The reader splits each line on its commas and stores each value as an element in a list.
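
The way next() advances through the file can be seen in a small sketch. It uses a hypothetical two-column file held in memory rather than the real weather file, so the column names and values are assumptions.

```python
import csv
import io

# Hypothetical two-column file held in memory instead of on disk.
data = io.StringIO("date,high\n2014-7-1,64\n2014-7-2,71\n")
reader = csv.reader(data)

header_row = next(reader)   # first call: the header line
first_row = next(reader)    # second call: the first data line
second_row = next(reader)   # third call: the second data line
end = next(reader, None)    # a default value avoids StopIteration at the end

print(header_row)
print(first_row)
print(second_row)
print(end)
```

Each call picks up where the previous one stopped, which is why a for loop over the reader after one next() call starts at the first data row.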

1.2 Print file header and its location

To make the file header data easier to understand, print each header and its position in the list. We use the enumerate() function to get the index and value of each element.

highs_lows.py:

import csv

filename = 'sitka_weather_07-2014.csv'

with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    #print(header_row)

    for index,column_header in enumerate(header_row):
        print(index,column_header)

Running the program prints one line per column: the index followed by the header name.

Here, the first print statement, which printed the whole header row, is commented out.
We then call enumerate() on the list to get the index and value of each element, and print each index together with its column header.
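
The same enumerate() call can also build a lookup table from header name to position, so that later code does not depend on memorized index numbers. This is only a sketch: the header names below are invented for illustration and may not match the real file.

```python
# Invented header names, standing in for the real header row.
header_row = ['date', 'max_temp_f', 'mean_temp_f', 'min_temp_f']

# Map each column name to its position in the row.
column_index = {name: index for index, name in enumerate(header_row)}

print(column_index['max_temp_f'])
print(column_index['min_temp_f'])
```

With such a mapping, row[column_index['max_temp_f']] reads more clearly than row[1].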

1.3 Extract and read data

Now that we can read the file, the next step is to extract the values we need from it.

import csv

filename = 'sitka_weather_07-2014.csv'

with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    print(header_row)

    # Collect the second column (the daily high) from every remaining row
    highs = []
    for row in reader:
        highs.append(row[1])
    print(highs)

Here we create an empty list called highs and then iterate over the remaining lines in the file. The reader object continues from where it left off, returning the line after its current position each time.
Since we have already read the header row, the loop starts at the second line. On each pass, the value at index 1, which is the second column, is appended to the end of highs.

Running this prints the highs as a list of strings.

Next we use int() to convert these strings to numbers so that matplotlib can plot them:

highs = []
for row in reader:
    # Convert the string to an integer before storing it
    high = int(row[1])
    highs.append(high)
print(highs)
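
Why the conversion matters can be shown with a short sketch. The values below are made up; the point is that strings compare character by character, so numeric questions such as "which is largest" give wrong answers until the values are integers.

```python
raw_highs = ['64', '71', '9', '59']   # made-up values, still strings

# Strings are compared character by character, so '9' sorts above '71'.
largest_as_text = max(raw_highs)

# Convert every entry to an integer before doing any numeric work.
highs = [int(value) for value in raw_highs]
largest_as_number = max(highs)

print(largest_as_text)
print(largest_as_number)
```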


Then visualize it:

1.4 Draw a temperature chart

Use matplotlib to plot a chart of the daily high temperatures:
highs_lows_mp.py:

import csv
from matplotlib import pyplot as plt

# Get the high temperatures from the file
filename = 'sitka_weather_07-2014.csv'

with open(filename) as f:
   reader = csv.reader(f)
   header_row = next(reader)
   print(header_row)

   highs = []
   for row in reader:
       high = int(row[1])
       highs.append(high)
   print(highs)


# Plot the data
fig = plt.figure(dpi=128,figsize=(10,6))
plt.plot(highs,c='red')
# Format the plot
plt.title("Daily high temperatures, July 2014",fontsize = 24)
plt.xlabel('',fontsize=16)
plt.ylabel("Temperature(F)",fontsize =16)
plt.tick_params(axis='both',which='major',labelsize =16)
plt.show()


1.5 The datetime module

The first column of our sitka_weather_07-2014.csv file holds the date of each reading.
The first line of the file is the header for all of the data, and the dated readings start on the second line.

Let's add dates to the chart to make it more useful.
When we read a date from the file we get a string, so we need a way to convert a string such as '2014-7-1' into an object representing that date. To create an object representing July 1, 2014, use the strptime() method from the datetime module.

Example usage:

from datetime import datetime as dt
first_time = dt.strptime('2014-7-1','%Y-%m-%d')
print(first_time)

This prints:

2014-07-01 00:00:00

Here we first import the datetime class from the datetime module (under the alias dt), then call its strptime() method with the string containing the date as the first argument. The second argument tells Python how the date is formatted.

'%Y-' means the part of the string before the first hyphen is treated as a four-digit year.
'%m-' means the part before the second hyphen is treated as a number representing the month.
'%d' means the part after the last hyphen is treated as the day of the month.

Format codes accepted by strptime():

Argument  Meaning
%A        Weekday name, such as Monday
%B        Month name, such as January
%m        Month as a number (01~12)
%d        Day of the month as a number (01~31)
%Y        Four-digit year, such as 2023
%y        Two-digit year, such as 23
%H        Hour in 24-hour format
%I        Hour in 12-hour format
%p        AM or PM
%M        Minutes (00~59)
%S        Seconds (00~60)
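
A few of these codes can be combined to parse dates written in other layouts. The sketch below uses made-up date strings, and it assumes the default English locale, because month and weekday names are locale-dependent.

```python
from datetime import datetime as dt

# The layout used in the weather file.
d1 = dt.strptime('2014-7-1', '%Y-%m-%d')

# The same day written with a month name (assumes an English locale).
d2 = dt.strptime('July 1, 2014', '%B %d, %Y')

# Day/month/two-digit year plus a 24-hour time.
d3 = dt.strptime('01/07/14 15:30', '%d/%m/%y %H:%M')

print(d1 == d2)   # both strings describe the same moment
print(d3)

# strftime() goes the other way and turns a datetime back into text.
print(d1.strftime('%A, %B %d, %Y'))
```

The format string has to mirror the input exactly, including separators such as hyphens, slashes, commas and spaces; otherwise strptime() raises a ValueError.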

1.6 Add dates to the chart

Now that you know how to handle dates in CSV files, the temperature chart can be improved by extracting the dates along with the maximum temperatures and passing both to plot().

highs_lows_mp.py:

import csv
from matplotlib import pyplot as plt
from datetime import datetime as dt

# Get the dates and high temperatures from the file
filename = 'sitka_weather_07-2014.csv'

with open(filename) as f:
   reader = csv.reader(f)
   header_row = next(reader)
   print(header_row)

   dates,highs = [],[]
   for row in reader:
       current_date = dt.strptime(row[0],"%Y-%m-%d")
       dates.append(current_date)

       high = int(row[1])
       highs.append(high)
   print(highs)


# Plot the data
fig = plt.figure(dpi=128,figsize=(10,6))
plt.plot(dates,highs,c='red')
# Format the plot
plt.title("Daily high temperatures, July 2014",fontsize = 24)
plt.xlabel('',fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature(F)",fontsize =16)
plt.tick_params(axis='both',which='major',labelsize =16)
plt.show()


This time we create two empty lists to store the dates and maximum temperatures extracted from the file. We then convert the value containing the date (row[0]) into a datetime object and append it to the end of the list dates.
Finally we pass the dates and the maximum temperatures to plot(), and call fig.autofmt_xdate() to draw the date labels at a slant so they don't overlap.
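
The reading step described above can be pulled into a small helper so it is easy to try without the real file or matplotlib. This is a sketch under assumptions: read_highs is a hypothetical name, and the in-memory sample stands in for the weather file, keeping only its first two columns.

```python
import csv
import io
from datetime import datetime as dt

def read_highs(lines):
    """Return parallel lists of dates and integer highs from CSV lines."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    dates, highs = [], []
    for row in reader:
        dates.append(dt.strptime(row[0], '%Y-%m-%d'))
        highs.append(int(row[1]))
    return dates, highs

# Made-up stand-in for the file; an open file object works the same way.
sample = io.StringIO("date,high\n2014-7-1,64\n2014-7-2,71\n")
dates, highs = read_highs(sample)

print(dates)
print(highs)
```

The two returned lists have the same length and order, which is what plot() needs when it is given x values and y values together.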

1.7 Covering longer periods of time

Now let's add more data to the chart so that it covers a whole year of weather.

First put the sitka_weather_2014.csv file into the folder where the program is located. This file contains a full year of weather data for one city.

The next step is to write the code that plots this data:

import csv
from matplotlib import pyplot as plt
from datetime import datetime as dt

# Get the dates and high temperatures from the file
filename = 'sitka_weather_2014.csv'

with open(filename) as f:
   reader = csv.reader(f)
   header_row = next(reader)
   print(header_row)

   dates,highs = [],[]
   for row in reader:
       current_date = dt.strptime(row[0],"%Y-%m-%d")
       dates.append(current_date)

       high = int(row[1])
       highs.append(high)
   print(highs)


# Plot the data
fig = plt.figure(dpi=128,figsize=(10,6))
plt.plot(dates,highs,c='red')
# Format the plot
plt.title("Daily high temperatures - 2014",fontsize = 24)
plt.xlabel('',fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature(F)",fontsize =16)
plt.tick_params(axis='both',which='major',labelsize =16)
plt.show()

This code is the same as the previous example except for two changes: the name of the file to read and the title of the chart.
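
Since only the file name and the title change, one option is to derive the title from the dates themselves and to check that the data really covers the expected period before plotting. The following is a rough sketch with stand-in dates, not values read from the real file.

```python
from datetime import datetime as dt

# Stand-in values; in the program these would come from the CSV file.
dates = [dt(2014, 1, 1), dt(2014, 6, 15), dt(2014, 12, 31)]

first, last = min(dates), max(dates)

# Number of calendar days from the first to the last reading, inclusive.
span_days = (last - first).days + 1

# Build the chart title from the data instead of typing the year by hand.
title = f"Daily high temperatures - {first.year}"

print(span_days)
print(title)
```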

1.8 Draw another data series

Now we need to extract the minimum air temperatures from the data file and add them to the graph.

import csv
from matplotlib import pyplot as plt
from datetime import datetime as dt

# Get the dates, high and low temperatures from the file
filename = 'sitka_weather_2014.csv'

with open(filename) as f:
   reader = csv.reader(f)
   header_row = next(reader)
   print(header_row)

   dates,highs,lows = [],[],[]
   for row in reader:
       current_date = dt.strptime(row[0],"%Y-%m-%d")
       dates.append(current_date)

       high = int(row[1])
       highs.append(high)

       low = int(row[3])
       lows.append(low)
   print(highs)


# Plot the data
fig = plt.figure(dpi=128,figsize=(10,6))
plt.plot(dates,highs,c='red')
plt.plot(dates,lows,c='blue')
# Format the plot
plt.title("Daily high and low temperatures - 2014",fontsize = 24)
plt.xlabel('',fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature(F)",fontsize =16)
plt.tick_params(axis='both',which='major',labelsize =16)
plt.show()
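
As more columns are extracted, index numbers such as row[1] and row[3] become harder to follow. One alternative, offered here only as a sketch, is csv.DictReader from the standard library, which uses the header row as keys. The column names in this sample are invented and would need to be replaced with the names in the real header.

```python
import csv
import io

# Made-up sample with invented column names.
sample = io.StringIO(
    "date,max_temp_f,mean_temp_f,min_temp_f\n"
    "2014-1-1,46,42,37\n"
    "2014-1-2,41,38,35\n"
)

highs, lows = [], []
# DictReader consumes the header row itself and yields one dict per data row.
for row in csv.DictReader(sample):
    highs.append(int(row['max_temp_f']))
    lows.append(int(row['min_temp_f']))

print(highs)
print(lows)
```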


1.9 Coloring chart areas

With two data series on the chart, you can see the temperature range for each day.
Let's put the finishing touches on the chart by shading the area between them to show the daily temperature range.

The fill_between() method takes a series of x values and two series of y values, and fills the space between the two series of y values:

plt.plot(dates,highs,c='red',alpha =0.5)
plt.plot(dates,lows,c='blue',alpha =0.5)
plt.fill_between(dates,highs,lows,facecolor ='blue',alpha =0.1)


alpha specifies the transparency of the color: 0 means fully transparent, and 1 means completely opaque.
Setting alpha to 0.5 makes both the red and the blue appear lighter.

About fill_between():
The x value series we pass is the list dates, and the two y value series are highs and lows. The facecolor argument specifies the color of the filled area.
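
The height of the shaded band on any given day is simply the high minus the low. A small sketch with made-up numbers shows how that daily range could be computed directly from the two lists:

```python
# Made-up readings for four days; the lists are parallel, as in the program.
highs = [64, 71, 64, 59]
lows = [50, 55, 57, 52]

# Height of the shaded band for each day.
daily_range = [high - low for high, low in zip(highs, lows)]

# Index of the day with the widest gap between high and low.
widest_day = max(range(len(daily_range)), key=daily_range.__getitem__)

print(daily_range)
print(widest_day)
```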

1.10 Error checking

This is the last section on working with the CSV format.

So far, the program uses the logic written in highs_lows.py to read the data in sitka_weather_2014.csv.

Suppose we now want to read the data in a different file instead.

The first thing to do is to change the file name defined in the program:
change sitka_weather_2014.csv to death_valley_2014.csv

Then run the program again. This time it stops with a ValueError.

The reason for this error is:

Python cannot process the maximum temperature for one of the days because it cannot convert an empty string ('') to an integer.

Looking at the contents of the file confirms this: the temperature readings for one of the days are missing.

To solve this problem, we can add exception handling to the program:

import csv
from matplotlib import pyplot as plt
from datetime import datetime as dt

# Get the dates, high and low temperatures from the file
filename = 'death_valley_2014.csv'

with open(filename) as f:
   reader = csv.reader(f)
   header_row = next(reader)
   #print(header_row)

   dates,highs,lows = [],[],[]

   for row in reader:
       try:
            current_date = dt.strptime(row[0],"%Y-%m-%d")
            high = int(row[1])
            low = int(row[3])
       except ValueError:
           print(current_date,'missing data')
       else:
           dates.append(current_date)
           highs.append(high)
           lows.append(low)
   #print(highs)


# Plot the data
fig = plt.figure(dpi=128,figsize=(10,6))
plt.plot(dates,highs,c='red',alpha =0.5)
plt.plot(dates,lows,c='blue',alpha =0.5)
plt.fill_between(dates,highs,lows,facecolor ='blue',alpha =0.1)
# Format the plot
plt.title("Daily high and low temperatures - 2014",fontsize = 24)
plt.xlabel('',fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature(F)",fontsize =16)
plt.tick_params(axis='both',which='major',labelsize =16)
plt.show()

After adding exception handling, our program runs normally again: the row with missing data is reported and skipped, and the chart is drawn from the remaining rows.
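
The same try/except idea can be tried on a few lines of made-up data without the real file. In this sketch, parse_row is a hypothetical helper, and the sample rows (including the one with empty fields) are invented. It reports the raw date string of a skipped row, so the message does not depend on a date parsed earlier.

```python
import csv
import io
from datetime import datetime as dt

def parse_row(row):
    """Return (date, high, low), or None if a value is missing or malformed."""
    try:
        return dt.strptime(row[0], '%Y-%m-%d'), int(row[1]), int(row[3])
    except ValueError:
        return None

# Made-up data; the second row has no temperature readings.
sample = io.StringIO(
    "date,max,mean,min\n"
    "2014-2-15,70,55,40\n"
    "2014-2-16,,,\n"
    "2014-2-17,72,57,42\n"
)

reader = csv.reader(sample)
next(reader)  # skip the header row

dates, highs, lows, skipped = [], [], [], []
for row in reader:
    parsed = parse_row(row)
    if parsed is None:
        skipped.append(row[0])  # remember which row was left out
        continue
    date, high, low = parsed
    dates.append(date)
    highs.append(high)
    lows.append(low)

print(highs, lows)
print('missing data:', skipped)
```

Because a row is appended to all three lists or to none of them, the lists stay the same length, which plot() and fill_between() require.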


Origin blog.csdn.net/tianlei_/article/details/130674019