The content of this chapter is taken from Chapter 16 of "Python Programming from Introduction to Practice". We will download data from the Internet and visualize it. Once the data is plotted, you can analyze it and look for patterns and relationships in it.
Learning objectives
We will access and visualize data stored in two common formats:
- CSV: Use the Python module csv to process weather data stored in CSV format and find the maximum and minimum temperatures in two different regions during the same time period. Then use matplotlib to create a graph from the downloaded data that shows the temperature changes in the two regions.
- JSON: Use the Python module json to access transaction closing-price data stored in JSON format, and use Pygal to draw graphs that explore the periodicity of price changes.
1. CSV file format
The easiest way to store data in a text file is to write the data as a series of comma-separated values (CSV). Such files are called CSV files.
For example:
2023-5-14,19,99,12,12,1,2,3,4,5,6,7,8,9,1.0,1.1,1.2,,,,,2.3
While CSV files are cumbersome for humans to read, programs can easily extract and manipulate the values, which can help speed up the data analysis process.
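As a rough illustration of how easily a program can pull the values apart, here is a minimal sketch that parses one CSV line from memory. The sample line is a shortened, hypothetical version of the example above, and io.StringIO stands in for a real file.

```python
import csv
import io

# Hypothetical sample: a shortened version of the line shown above.
sample = "2023-5-14,19,99,,2.3\n"

# csv.reader accepts any iterable of lines, so an in-memory buffer works too.
rows = list(csv.reader(io.StringIO(sample)))
first_row = rows[0]

print(first_row)       # ['2023-5-14', '19', '99', '', '2.3']
print(len(first_row))  # 5
```

Note that every value comes back as a string, and that two consecutive commas produce an empty string rather than being skipped.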
We first process a small amount of weather data in CSV format.
The data is stored in sitka_weather_07-2014.csv, which needs to be copied to the same folder as the program.
If you don't have the practice materials, don't worry: I have uploaded them to my personal resources, and you can download them from my personal homepage.
Everything I post is what I have learned, and it can be read without any conditions, whether or not it turns out to be useful to you.
All the files I share can likewise be downloaded without any conditions.
With the preparation complete, let's start learning:
1.1 Analyze CSV file header
The csv module is included in the Python standard library and can be used to parse rows of data in CSV files.
Let's look at the first line of this file, which contains a series of descriptions about the data:
highs_lows.py:
import csv

filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    print(header_row)
Code snippet description:
We first import the csv module.
Then we store the name of the file in a variable named filename. Next, we open the file and store the resulting file object in f, and then call csv.reader(), passing the file object as an argument to create a reader associated with the file.
We store this reader object in reader.
next() is a built-in Python function, not a method of the reader.
When the reader is passed to next() as an argument, next() returns the next line in the file.
Let's run the program and see the result:
It does print the first line of the file.
Because we called next() only once, only one line was read.
The reader splits each line on its commas and stores the values as elements in a list.
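To see how next() advances the reader one line at a time, here is a small sketch that runs without the weather file. The column names and values are made up for illustration, and io.StringIO plays the role of the open file.

```python
import csv
import io

# Hypothetical two-day sample; the real file has many more columns.
sample = "date,max_temp,min_temp\n2014-7-1,64,50\n2014-7-2,71,55\n"

reader = csv.reader(io.StringIO(sample))
header_row = next(reader)      # consumes only the first line
first_data_row = next(reader)  # the reader continues where it stopped
remaining_rows = list(reader)  # whatever has not been read yet

print(header_row)      # ['date', 'max_temp', 'min_temp']
print(first_data_row)  # ['2014-7-1', '64', '50']
print(remaining_rows)  # [['2014-7-2', '71', '55']]
```

Each call to next() moves the reader forward, which is why a later for loop over the same reader starts after the header.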
1.2 Print file header and its location
To make the file header data easier to understand, print each header and its position in the list.
Here we use the enumerate() function to get the index and value of each element.
highs_lows.py:
import csv

filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    #print(header_row)
    for index, column_header in enumerate(header_row):
        print(index, column_header)
After the program is executed, it looks like this:
Here, the first print statement, which printed the header row returned by next(reader), is commented out.
Then we call the enumerate() function on the list to get the index and value of each element, and print each index together with its column header.
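The same enumerate() pattern can also build a lookup table from column name to position, which may be easier to read than bare index numbers later on. This is only a sketch: the header names below are hypothetical and do not come from the weather file.

```python
# Hypothetical header row, used only for illustration.
header_row = ['date', 'max_temp', 'mean_temp', 'min_temp']

# enumerate() yields (index, value) pairs, starting at index 0.
for index, column_header in enumerate(header_row):
    print(index, column_header)

# Map each column name to its position in the row.
column_index = {name: index for index, name in enumerate(header_row)}
print(column_index['min_temp'])  # 3
```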
1.3 Extract and read data
Now that we can read the data, the next step is to extract the values we need from it.
import csv

filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    print(header_row)

    # Collect the value in the second column of every remaining row
    highs = []
    for row in reader:
        highs.append(row[1])
    print(highs)
Here we create an empty list called highs and then iterate over the remaining lines in the file. The reader object continues from where it left off and returns the next line after its current position each time.
Since we have already read the header line, the loop starts from the second line. Each time the loop runs, the data at index 1, which is the second column, is appended to the end of highs.
Execution effect:
Below we use int() to convert these strings to numbers so that matplotlib can read them:
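import csv

filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    highs = []
    for row in reader:
        high = int(row[1])  # convert the string to an integer
        highs.append(high)
    print(highs)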
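Plotting is not the only reason the conversion matters. As a small sketch with made-up readings, comparing the values while they are still strings gives a misleading maximum, because strings are compared character by character:

```python
# Hypothetical readings as they would come out of csv.reader (all strings).
raw_highs = ['9', '64', '100']

# String comparison looks at the first character first, so '9' wins.
print(max(raw_highs))  # 9

# After converting with int(), the comparison is numeric.
highs = [int(value) for value in raw_highs]
print(max(highs))      # 100
```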
Then visualize it:
1.4 Draw a temperature chart
Use matplotlib to plot a graph showing daily maximum temperatures:
highs_lows_mp.py:
import csv
from matplotlib import pyplot as plt

# Get the high temperatures from the file
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    print(header_row)

    highs = []
    for row in reader:
        high = int(row[1])
        highs.append(high)
    print(highs)

# Plot the data
fig = plt.figure(dpi=128, figsize=(10, 6))
plt.plot(highs, c='red')

# Format the plot
plt.title("Daily high temperatures, July 2014", fontsize=24)
plt.xlabel('', fontsize=16)
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)
plt.show()
1.5 module datetime
The content of our sitka_weather_07-2014.csv file is as follows:
In this file, the first line is the header for all the data.
The actual dated data starts from the second line.
Let's add dates to the chart to make it more useful.
When we read the date from the file, what we get is a string, so we need a way to convert a string such as '2014-7-1' into an object representing the corresponding date. To create an object representing July 1, 2014, use the method strptime() of the datetime class in the module datetime.
Example usage:
from datetime import datetime as dt
first_time = dt.strptime('2014-7-1','%Y-%m-%d')
print(first_time)
The effect is this:
Here we first import the datetime class from the module datetime, then call its strptime() method with a string containing the desired date as the first argument. The second argument tells Python how the date is formatted.
'%Y-' means to treat the part before the first hyphen in the string as a four-digit year.
'%m-' indicates that the part of the string preceding the second hyphen is treated as a number for the month.
'%d' tells Python to treat the part of the string after the last hyphen as the day of the month.
Format arguments accepted by strptime():
Argument | Meaning |
---|---|
%A | Weekday name, such as Monday |
%B | Month name, such as January |
%m | Month as a number (01~12) |
%d | Day of the month as a number (01~31) |
%Y | Four-digit year, such as 2023 |
%y | Two-digit year, such as 23 |
%H | Hour in 24-hour format |
%I | Hour in 12-hour format |
%p | am or pm |
%M | Minutes (00~59) |
%S | Seconds (00~60) |
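The arguments in the table can be combined to match other layouts. The sketch below parses two strings in different formats and then uses strftime(), the reverse operation, to turn a datetime back into text. The second input string is a made-up example, not a value from the weather file.

```python
from datetime import datetime

# The format used for the weather data: year-month-day.
first_date = datetime.strptime('2014-7-1', '%Y-%m-%d')
print(first_date)  # 2014-07-01 00:00:00

# Hypothetical string in day/month/two-digit-year order with a 24-hour time.
stamp = datetime.strptime('01/07/14 18:30', '%d/%m/%y %H:%M')
print(stamp.year, stamp.month, stamp.day, stamp.hour, stamp.minute)

# strftime() formats a datetime using the same arguments.
label = first_date.strftime('%Y-%m-%d')
print(label)  # 2014-07-01
```

If the string does not match the format, strptime() raises a ValueError.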
1.6 Add the date to the chart
Now that you know how to handle the dates in CSV files, the temperature graph can be improved:
extract the dates and maximum temperatures and pass them to plot().
highs_lows_mp.py:
import csv
from matplotlib import pyplot as plt
from datetime import datetime as dt

# Get the dates and high temperatures from the file
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    print(header_row)

    dates, highs = [], []
    for row in reader:
        current_date = dt.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        high = int(row[1])
        highs.append(high)
    print(highs)

# Plot the data
fig = plt.figure(dpi=128, figsize=(10, 6))
plt.plot(dates, highs, c='red')

# Format the plot
plt.title("Daily high temperatures, July 2014", fontsize=24)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)
plt.show()
This time we create two empty lists to store the dates and maximum temperatures extracted from the file. Then we convert the data containing the date information (row[0]) into a datetime object and append it to the end of the list dates.
Then we pass the dates and maximum temperatures to plot(), and call fig.autofmt_xdate() to draw the date labels diagonally so they don't overlap.
1.7 Covering longer periods of time
Now we are going to add more data to the chart so that it covers a longer period of weather for the city.
First put the sitka_weather_2014.csv file into the folder where the program is located. This file contains the city's weather data for a whole year.
With the file in place, we can think about how to plot the data with code:
import csv
from matplotlib import pyplot as plt
from datetime import datetime as dt

# Get the dates and high temperatures from the file
filename = 'sitka_weather_2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    print(header_row)

    dates, highs = [], []
    for row in reader:
        current_date = dt.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        high = int(row[1])
        highs.append(high)
    print(highs)

# Plot the data
fig = plt.figure(dpi=128, figsize=(10, 6))
plt.plot(dates, highs, c='red')

# Format the plot
plt.title("Daily high temperatures - 2014", fontsize=24)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)
plt.show()
This code is almost the same as the previous example. We only changed the file to be read and the title of the chart.
1.8 Draw another data series
Now we need to extract the minimum air temperatures from the data file and add them to the graph.
import csv
from matplotlib import pyplot as plt
from datetime import datetime as dt

# Get the dates, high temperatures and low temperatures from the file
filename = 'sitka_weather_2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    print(header_row)

    dates, highs, lows = [], [], []
    for row in reader:
        current_date = dt.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        high = int(row[1])
        highs.append(high)
        low = int(row[3])
        lows.append(low)
    print(highs)

# Plot the data: highs in red, lows in blue
fig = plt.figure(dpi=128, figsize=(10, 6))
plt.plot(dates, highs, c='red')
plt.plot(dates, lows, c='blue')

# Format the plot
plt.title("Daily high and low temperatures - 2014", fontsize=24)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)
plt.show()
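Once the dates, highs and lows are in parallel lists, the hottest and coldest days can also be found without a chart. The following is a minimal sketch with three made-up days, not values from the Sitka file:

```python
from datetime import datetime

# Hypothetical readings for three days, in the same shape as the lists above.
dates = [datetime(2014, 7, 1), datetime(2014, 7, 2), datetime(2014, 7, 3)]
highs = [64, 71, 66]
lows = [50, 55, 48]

# zip() pairs each temperature with its date; tuples compare by their
# first element, so max() and min() pick the extreme temperature.
hottest_high, hottest_date = max(zip(highs, dates))
coldest_low, coldest_date = min(zip(lows, dates))

print(hottest_high, hottest_date.strftime('%Y-%m-%d'))  # 71 2014-07-02
print(coldest_low, coldest_date.strftime('%Y-%m-%d'))   # 48 2014-07-03
```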
1.9 Coloring chart areas
After adding two data series, you can see the temperature range for each day.
Let's put the finishing touches on this chart by shading the area between the two series to show the daily temperature range.
To do this we use the method fill_between(), which takes a series of x values and two series of y values and fills the space between the two series of y values:
plt.plot(dates,highs,c='red',alpha =0.5)
plt.plot(dates,lows,c='blue',alpha =0.5)
plt.fill_between(dates,highs,lows,facecolor ='blue',alpha =0.1)
alpha specifies the transparency of the color: 0 means fully transparent, and 1 means completely opaque.
A setting of 0.5 makes both the red and blue lines appear lighter.
fill_between() explanation:
The x value series we pass here is the list dates, and the two y value series are highs and lows. The argument facecolor specifies the color of the filled area.
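The height of the shaded band on a given day is simply the high minus the low. As a numeric counterpart to the shaded area, this sketch computes that daily range for three made-up days:

```python
# Hypothetical readings; in the program these come from the CSV file.
highs = [64, 71, 66]
lows = [50, 55, 48]

# Pair each day's high with its low and take the difference.
daily_ranges = [high - low for high, low in zip(highs, lows)]
widest_range = max(daily_ranges)

print(daily_ranges)  # [14, 16, 18]
print(widest_range)  # 18
```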
1.10 Error checking
This is the last section on working with the CSV format.
At this point, the program uses the logic written in highs_lows.py to read the data in the file sitka_weather_2014.csv.
But suppose we want to read the data in a different file.
The first thing to do is to change the file name defined in the program:
change sitka_weather_2014.csv to death_valley_2014.csv
Then run the program and start reading data:
The program reports an error, and the following information is displayed:
The reason for this bug is:
Python cannot handle the maximum temperature for one of the days because it cannot convert an empty string ('') to an integer.
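The failure is easy to reproduce in isolation. In this short sketch the three input values are made up: a normal reading, an empty field, and a field containing only a space.

```python
converted, failed = [], []

for value in ['64', '', ' ']:
    try:
        converted.append(int(value))
    except ValueError:
        # int() raises ValueError when the string holds no digits.
        failed.append(value)

print(converted)  # [64]
print(failed)     # ['', ' ']
```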
Now we know why, but to confirm it, let's take a look at the content of the file:
To solve this problem, we can add exception handling to the program:
import csv
from matplotlib import pyplot as plt
from datetime import datetime as dt

# Get the dates, high temperatures and low temperatures from the file
filename = 'death_valley_2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    #print(header_row)

    dates, highs, lows = [], [], []
    for row in reader:
        try:
            current_date = dt.strptime(row[0], "%Y-%m-%d")
            high = int(row[1])
            low = int(row[3])
        except ValueError:
            # Report the day whose data is missing and skip it
            print(current_date, 'missing data')
        else:
            dates.append(current_date)
            highs.append(high)
            lows.append(low)
    #print(highs)

# Plot the data
fig = plt.figure(dpi=128, figsize=(10, 6))
plt.plot(dates, highs, c='red', alpha=0.5)
plt.plot(dates, lows, c='blue', alpha=0.5)
plt.fill_between(dates, highs, lows, facecolor='blue', alpha=0.1)

# Format the plot
plt.title("Daily high and low temperatures - 2014", fontsize=24)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)
plt.show()
After adding exception handling, the program runs normally again. Days with missing data are reported and skipped instead of stopping the program.
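The try-except-else pattern can be tried without the Death Valley file. The sketch below uses a made-up three-row sample held in memory, where the middle row has no temperatures, and collects the skipped dates in a list instead of printing them. The column names and the skipped list are assumptions for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample; the second data row is missing its temperatures.
sample = (
    "date,max_temp,mean_temp,min_temp\n"
    "2014-2-15,75,61,47\n"
    "2014-2-16,,,\n"
    "2014-2-17,78,64,50\n"
)

dates, highs, lows, skipped = [], [], [], []
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row

for row in reader:
    try:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        high = int(row[1])
        low = int(row[3])
    except ValueError:
        # Remember which day was incomplete and move on.
        skipped.append(row[0])
    else:
        # Only reached when all three conversions succeeded.
        dates.append(current_date)
        highs.append(high)
        lows.append(low)

print(highs)    # [75, 78]
print(lows)     # [47, 50]
print(skipped)  # ['2014-2-16']
```

Appending inside the else block keeps the three lists the same length, so each date still lines up with its high and low.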