Notes on 《Python编程从入门到实践》 (Python Crash Course): Processing CSV File Data with Python

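Contents

1. Parsing a CSV file (csv.reader() and next())

2. Printing the headers and their positions

3. Extracting, reading, and displaying data

4. Adding dates to the chart (the datetime module)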


The csv module is part of the Python standard library and can be used to parse the rows of data in a CSV file.

1. Parsing a CSV file (csv.reader() and next())

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import csv  # import the csv module
from datetime import datetime

from matplotlib import pyplot as plt

# Get dates, high, and low temperatures from file.
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    print(header_row)

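Code walkthrough:

  • csv.reader() creates a reader object associated with the open file.
  • next() is a built-in Python function (it is not part of the csv module). Given the reader object, it returns the next row of the file. The code above calls next() only once, so it gets the first row.
  • The row returned by next(reader) is stored in header_row. This is the header of the weather file, and it shows what data each row contains.

Output: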

['AKDT', 'Max TemperatureF', 'Mean TemperatureF', 'Min TemperatureF', 'Max Dew PointF', 
'MeanDew PointF', 'Min DewpointF', 'Max Humidity', ' Mean Humidity', ' Min Humidity', 
' Max Sea Level PressureIn', ' Mean Sea Level PressureIn', ' Min Sea Level PressureIn', 
' Max VisibilityMiles', ' Mean VisibilityMiles', ' Min VisibilityMiles', ' Max Wind SpeedMPH', 
' Mean Wind SpeedMPH', ' Max Gust SpeedMPH', 'PrecipitationIn', ' CloudCover', ' Events', ' WindDirDegrees']
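
The behaviour of csv.reader() and next() can also be tried without the weather file. The following is a minimal sketch that assumes a small in-memory sample; the `sample` text is made up for illustration and is wrapped in io.StringIO so that it behaves like an open file:

```python
import csv
import io

# Hypothetical sample data standing in for the real weather file.
sample = "AKDT,Max TemperatureF,Min TemperatureF\n2014-7-1,64,50\n2014-7-2,71,55\n"

f = io.StringIO(sample)          # behaves like an open text file
reader = csv.reader(f)           # reader object tied to that file
header_row = next(reader)        # the first call to next() returns the first row
first_data_row = next(reader)    # a second call returns the row after it

print(header_row)
print(first_data_row)
```

Every value comes back as a string, which is why the later examples convert the temperatures with int().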

2. Printing the headers and their positions

To make the header data easier to read, print each header in the list together with its position:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import csv
from datetime import datetime

from matplotlib import pyplot as plt

# Get dates, high, and low temperatures from file.
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    for index, column_header in enumerate(header_row):
        print(index, column_header)

Code walkthrough:

enumerate() is called on the list header_row to get the index and the value of each element.

Output:

0 AKDT
1 Max TemperatureF
2 Mean TemperatureF
3 Min TemperatureF
4 Max Dew PointF
5 MeanDew PointF
6 Min DewpointF
7 Max Humidity
8  Mean Humidity
9  Min Humidity
10  Max Sea Level PressureIn
11  Mean Sea Level PressureIn
12  Min Sea Level PressureIn
13  Max VisibilityMiles
14  Mean VisibilityMiles
15  Min VisibilityMiles
16  Max Wind SpeedMPH
17  Mean Wind SpeedMPH
18  Max Gust SpeedMPH
19 PrecipitationIn
20  CloudCover
21  Events
22  WindDirDegrees
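
Several headers in the listing above start with a space, which is easy to miss when looking up a column. One possible way to handle this, shown here only as a sketch, is to build a lookup from the cleaned-up header name to its index with enumerate(). The shortened `header_row` below is copied from the output above, and the name `column_index` is a hypothetical helper, not something from the book:

```python
# A shortened header list copied from the output above.
header_row = ['AKDT', 'Max TemperatureF', ' Mean Humidity', ' Events']

# Map each header name (with surrounding whitespace removed) to its index.
column_index = {name.strip(): index for index, name in enumerate(header_row)}

print(column_index['Max TemperatureF'])
print(column_index['Mean Humidity'])
```

With the full header row, the same dictionary would give the position of any column without counting by hand.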

3. Extracting, reading, and displaying data

Now that the position of each column is known, read the high temperature for each day:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import csv
from datetime import datetime

from matplotlib import pyplot as plt

# Get dates, high, and low temperatures from file.
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    # Read the data
    highs = []
    for row in reader:
        high = int(row[1])  # convert the string to a number
        highs.append(high)

    print(highs)

    # Plot the temperatures
    fig = plt.figure(dpi=128, figsize=(10, 6))
    plt.plot(highs, c='red')

    # Format the plot
    plt.title("Daily high temperatures, July 2014", fontsize=24)
    plt.xlabel("", fontsize=16)
    plt.ylabel("Temperature(F)", fontsize=16)
    plt.tick_params(axis='both', which='major', labelsize=16)

    plt.show()

Output:

[64, 71, 64, 59, 69, 62, 61, 55, 57, 61, 57, 59, 57, 61, 64, 61, 59, 63, 60, 57, 69,
 63, 62, 59, 57, 57, 61, 59, 61, 61, 66]
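
As an alternative to the hard-coded index 1, the standard library also provides csv.DictReader, which uses the header row as keys. The sketch below is not the approach used in the listing above; it assumes a two-column in-memory sample whose three values are the first three highs from the output:

```python
import csv
import io

# Hypothetical sample; the real file has many more columns and rows.
sample = "AKDT,Max TemperatureF\n2014-7-1,64\n2014-7-2,71\n2014-7-3,64\n"

with io.StringIO(sample) as f:
    reader = csv.DictReader(f)  # the first row supplies the dictionary keys
    highs = [int(row['Max TemperatureF']) for row in reader]

print(highs)
```

Looking columns up by name keeps the code readable, but the keys must match the header text exactly, including any leading spaces.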


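4. Adding dates to the chart (the datetime module)

The date is read from the file as a string, so a string such as '2014-7-1' has to be converted to an object that represents that date. This is done with the strptime() method of the datetime class in the datetime module.

strptime() takes two arguments: (1) the string to convert to a date; (2) the format in which the date is written.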
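
A minimal sketch of the two arguments in use, with the date string from the paragraph above (the variable name `first_date` is only for illustration):

```python
from datetime import datetime

# First argument: the string to convert; second: the format it is written in.
first_date = datetime.strptime('2014-7-1', '%Y-%m-%d')

print(first_date)  # 2014-07-01 00:00:00
```

The month and day do not need to be zero-padded in the input string for %m and %d to match.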

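strptime() accepts a variety of format arguments; the table below lists some of them:

Argument  Meaning
%A        Weekday name, such as Monday
%B        Month name, such as January
%m        Month as a number (01-12)
%d        Day of the month as a number (01-31)
%Y        Four-digit year, such as 2015
%y        Two-digit year, such as 15
%H        Hour in 24-hour format (00-23)
%I        Hour in 12-hour format (01-12)
%p        AM or PM
%M        Minutes (00-59)
%S        Seconds (00-59)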
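
To see several of these format arguments working together, here is a small sketch. The timestamp string is made up for illustration, and the example assumes an English locale, because %A, %B, and %p match locale-dependent names. strftime() is the companion method that formats a datetime object back into a string:

```python
from datetime import datetime

# Hypothetical timestamp that uses several of the format arguments above.
text = 'Monday, January 05 2015 03:07:09 PM'
moment = datetime.strptime(text, '%A, %B %d %Y %I:%M:%S %p')

print(moment)                  # 2015-01-05 15:07:09
print(moment.strftime('%y'))   # two-digit year
print(moment.strftime('%H'))   # hour on the 24-hour clock
```

If the format does not match the string, strptime() raises ValueError.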

The following code adds dates to the chart:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import csv
from datetime import datetime

from matplotlib import pyplot as plt

# Get dates, high, and low temperatures from file.
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

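    # Read the data: dates and high temperatures

    dates, highs = [], []  # two empty lists for the dates and the high temperatures
    for row in reader:
        # Read the date
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        # Read the high temperature
        high = int(row[1])  # convert the string to a number
        highs.append(high)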

    # Plot the temperatures
    fig = plt.figure(dpi=128, figsize=(10, 6))
    plt.plot(dates, highs, c='red')

    # Format the plot
    plt.title("Daily high temperatures, July 2014", fontsize=24)
    plt.xlabel("", fontsize=16)
    fig.autofmt_xdate()   # draw the date labels at a slant
    plt.ylabel("Temperature(F)", fontsize=16)
    plt.tick_params(axis='both', which='major', labelsize=16)

    plt.show()

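Output: a line chart of the daily high temperatures for July 2014, with slanted date labels on the x-axis.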


Next, plot the weather data for a whole year:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import csv
from datetime import datetime

from matplotlib import pyplot as plt

# Get dates, high, and low temperatures from file.
filename = 'sitka_weather_2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

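    # Read the data: dates and high temperatures

    dates, highs = [], []  # two empty lists for the dates and the high temperatures
    for row in reader:
        # Read the date
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        # Read the high temperature
        high = int(row[1])  # convert the string to a number
        highs.append(high)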

    # Plot the temperatures
    fig = plt.figure(dpi=128, figsize=(10, 6))
    plt.plot(dates, highs, c='red')

    # Format the plot (the title now covers the whole year, not just July)
    plt.title("Daily high temperatures - 2014", fontsize=24)
    plt.xlabel("", fontsize=16)
    fig.autofmt_xdate()   # draw the date labels at a slant
    plt.ylabel("Temperature(F)", fontsize=16)
    plt.tick_params(axis='both', which='major', labelsize=16)

    plt.show()

Output: a line chart of the daily high temperatures for the whole of 2014.


Next, plot both the high and the low temperatures and shade the area between them:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import csv
from datetime import datetime

from matplotlib import pyplot as plt

# Get dates, high, and low temperatures from file.
filename = 'sitka_weather_2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    dates, highs, lows = [], [], []  # three empty lists for the dates, highs, and lows
    for row in reader:
        try:
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            high = int(row[1])
            low = int(row[3])
        except ValueError:
            # Print the raw date string: current_date may be unset or stale
            # if the conversion of the date itself failed.
            print(row[0], 'missing data')
        else:
            dates.append(current_date)
            highs.append(high)
            lows.append(low)

# Plot data.
fig = plt.figure(dpi=128, figsize=(10, 6))
plt.plot(dates, highs, c='red', alpha=0.5)
plt.plot(dates, lows, c='blue', alpha=0.5)
# Shade the area between the two lines; facecolor is the fill color, alpha sets the transparency
plt.fill_between(dates, highs, lows, facecolor='blue', alpha=0.1)

# Format plot.
title = "Daily high and low temperatures - 2014\nSitka, AK"  # matches the Sitka data file read above
plt.title(title, fontsize=20)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)

plt.show()

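Output: a chart of the daily high and low temperatures for 2014, with the area between the two lines shaded.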
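
The try/except/else block in the last listing matters when a row has empty fields. The following sketch shows that path in isolation. It assumes a made-up in-memory sample in which the second day has no readings, and the `skipped` list is a hypothetical addition used here instead of print() so that the result can be inspected:

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in which the second day has no temperature readings.
sample = (
    "Date,Max TemperatureF,Mean TemperatureF,Min TemperatureF\n"
    "2014-2-15,75,61,47\n"
    "2014-2-16,,,\n"
    "2014-2-17,72,60,48\n"
)

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row

dates, highs, lows, skipped = [], [], [], []
for row in reader:
    try:
        current_date = datetime.strptime(row[0], '%Y-%m-%d')
        high = int(row[1])
        low = int(row[3])
    except ValueError:
        # int('') raises ValueError, so incomplete rows end up here.
        skipped.append(row[0])
    else:
        # Only rows that converted cleanly are kept.
        dates.append(current_date)
        highs.append(high)
        lows.append(low)

print(highs, lows, skipped)
```

Because the appends sit in the else branch, the three lists always stay the same length, which plt.fill_between() needs.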


Reposted from blog.csdn.net/Sophia_11/article/details/84888183