Data Visualization: Working with Downloaded Data (Part 1)

This post downloads data and visualizes it. It covers two data formats: 1. CSV, which is handled with Python's csv module, and 2. JSON, which is handled with the json module.

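1. The CSV file format

A CSV file stores data as a series of comma-separated values, which makes the data easy for a program to extract. As an example, let's analyze some weather data.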

import csv 

filename='sitka_weather_07-2014.csv'
with open(filename) as f:
    reader=csv.reader(f)
    header_row=next(reader)
    print(header_row)

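Calling csv.reader(f) creates a reader object associated with the file. Passing that reader object to next() returns the next line of the file. Because this is the first call, it returns the header row.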
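To see how csv.reader() and next() work together without needing the weather file, here is a minimal sketch that reads from an in-memory string. The sample rows and column names are made up for illustration and are not the contents of sitka_weather_07-2014.csv.

```python
import csv
import io

# Hypothetical sample data standing in for a real CSV file.
sample = "date,max_temp,min_temp\n2014-7-1,64,50\n2014-7-2,71,55\n"

reader = csv.reader(io.StringIO(sample))
header_row = next(reader)       # the first call returns the header line
first_data_row = next(reader)   # the next call returns the line after it

print(header_row)
print(first_data_row)
```

Each row comes back as a list of strings, one item per comma-separated field.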

Output (screenshot not preserved):

Print the column headers and their positions:

import csv 

filename='sitka_weather_07-2014.csv'
with open(filename) as f:
    reader=csv.reader(f)
    header_row=next(reader)
    for index,column_header in enumerate(header_row):
        print(index,column_header)
 

Output (screenshot not preserved):

Extract and read the data:

import csv 

filename='sitka_weather_07-2014.csv'
with open(filename) as f:
    reader=csv.reader(f)
    header_row=next(reader)
    
    highs=[]
    for row in reader:
        highs.append(row[1])
    print(highs)

Output (screenshot not preserved):

Convert the strings to numbers:

import csv 

filename='sitka_weather_07-2014.csv'
with open(filename) as f:
    reader=csv.reader(f)
    header_row=next(reader)
    
    highs=[]
    for row in reader:
        high=int(row[1])
        highs.append(high)
    print(highs)
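The conversion matters because csv.reader() returns every field as a string, and strings compare character by character rather than by numeric value. A small sketch with made-up values (not taken from the weather file) shows the difference:

```python
# Hypothetical values, as csv.reader would return them: every field is a string.
raw_highs = ['9', '64', '100']

# Convert each string to an integer so the values compare and plot as numbers.
highs = [int(value) for value in raw_highs]

print(max(raw_highs))  # string comparison picks '9'
print(max(highs))      # numeric comparison picks 100
```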

The data extraction is now complete. The next step is to process the data and visualize it.

Plot the temperature chart:

import csv 
from matplotlib import pyplot as plt

filename='sitka_weather_07-2014.csv'
with open(filename) as f:
    reader=csv.reader(f)
    header_row=next(reader)
    
    highs=[]
    for row in reader:
        high=int(row[1])
        highs.append(high)
    print(highs)
# Visualization
fig=plt.figure(dpi=128,figsize=(10,6))
plt.plot(highs,c='red')
plt.title("Daily high temperatures,July 2014",fontsize=24)
plt.xlabel('',fontsize=14)
plt.ylabel("Temperature(F)",fontsize=14)
plt.tick_params(axis="both",which="major",labelsize=16)
plt.show()

Output (screenshot not preserved):

The x-axis above does not show dates. Import the datetime module to add them:

# -*- coding: utf-8 -*-
import csv 
from matplotlib import pyplot as plt
from datetime import datetime

filename='sitka_weather_07-2014.csv'
with open(filename) as f:
    reader=csv.reader(f)
    header_row=next(reader)
    
    dates,highs=[],[]
    for row in reader:
        current_date=datetime.strptime(row[0],"%Y-%m-%d")
        dates.append(current_date)
        high=int(row[1])
        highs.append(high)
    print(highs)
# Visualization
fig=plt.figure(dpi=128,figsize=(10,6))
plt.plot(dates,highs,c='red')
plt.title("Daily high temperatures,July 2014",fontsize=24)
plt.xlabel('',fontsize=14)
fig.autofmt_xdate()
plt.ylabel("Temperature(F)",fontsize=14)
plt.tick_params(axis="both",which="major",labelsize=16)
plt.show()
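The key step in the listing above is datetime.strptime(), which turns a date string into a datetime object according to a format string. Here is a minimal sketch, assuming the dates in the file look like "2014-7-1":

```python
from datetime import datetime

# "%Y" is a four-digit year, "%m" the month number, "%d" the day of the month.
first_date = datetime.strptime("2014-7-1", "%Y-%m-%d")
print(first_date)

# strftime() goes the other way, from a datetime object back to a string.
label = first_date.strftime("%Y-%m-%d")
print(label)  # 2014-07-01
```

Passing a list of datetime objects as the x-values lets matplotlib label the axis with dates, and fig.autofmt_xdate() slants those labels so they do not overlap.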

Covering a longer time span:

# -*- coding: utf-8 -*-
import csv 
from matplotlib import pyplot as plt
from datetime import datetime

filename='sitka_weather_2014.csv'
with open(filename) as f:
    reader=csv.reader(f)
    header_row=next(reader)
    
    dates,highs=[],[]
    for row in reader:
        current_date=datetime.strptime(row[0],"%Y-%m-%d")
        dates.append(current_date)
        high=int(row[1])
        highs.append(high)
    print(highs)
# Visualization
fig=plt.figure(dpi=128,figsize=(10,6))
plt.plot(dates,highs,c='red')
plt.title("Daily high temperatures 2014",fontsize=24)
plt.xlabel('',fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature(F)",fontsize=16)
plt.tick_params(axis="both",which="major",labelsize=16)
plt.show()

Plot a second data series, the daily low temperatures:

# -*- coding: utf-8 -*-
import csv 
from matplotlib import pyplot as plt
from datetime import datetime

filename='sitka_weather_2014.csv'
with open(filename) as f:
    reader=csv.reader(f)
    header_row=next(reader)
    
    dates,highs,lows=[],[],[]
    for row in reader:
        current_date=datetime.strptime(row[0],"%Y-%m-%d")
        dates.append(current_date)
        low=int(row[3])
        lows.append(low)
        high=int(row[1])
        highs.append(high)
    print(highs)
# Visualization
fig=plt.figure(dpi=128,figsize=(10,6))
plt.plot(dates,highs,c='red')
plt.plot(dates,lows,c='blue')
plt.title("Daily high and low temperatures 2014",fontsize=24)
plt.xlabel('',fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature(F)",fontsize=16)
plt.tick_params(axis="both",which="major",labelsize=16)
plt.show()

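Shading an area of the chart: use the fill_between() method, which takes one series of x-values and two series of y-values and fills the space between the two y-series: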

# -*- coding: utf-8 -*-
import csv 
from matplotlib import pyplot as plt
from datetime import datetime

filename='sitka_weather_2014.csv'
with open(filename) as f:
    reader=csv.reader(f)
    header_row=next(reader)
    
    dates,highs,lows=[],[],[]
    for row in reader:
        current_date=datetime.strptime(row[0],"%Y-%m-%d")
        dates.append(current_date)
        low=int(row[3])
        lows.append(low)
        high=int(row[1])
        highs.append(high)
    
# Visualization
fig=plt.figure(dpi=128,figsize=(10,6))
plt.plot(dates,highs,c='red',alpha=0.5)
plt.plot(dates,lows,c='blue',alpha=0.5)
plt.fill_between(dates,highs,lows,facecolor='blue',alpha=0.1)
plt.title("Daily high and low temperatures 2014",fontsize=24)
plt.xlabel('',fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature(F)",fontsize=16)
plt.tick_params(axis="both",which="major",labelsize=16)
plt.show()

The alpha argument sets the transparency of a color. It ranges from 0 to 1, where 0 is fully transparent and 1 is fully opaque.
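One way to build intuition for alpha is the usual blending formula, in which the drawn color is mixed with the background in proportion to alpha. The sketch below is a simplified model for illustration only, not matplotlib's actual rendering code. The blend() helper is hypothetical.

```python
def blend(foreground, background, alpha):
    """Mix two RGB colors (components in 0-1) using the given alpha."""
    return tuple(alpha * f + (1 - alpha) * b
                 for f, b in zip(foreground, background))

blue = (0.0, 0.0, 1.0)
white = (1.0, 1.0, 1.0)

print(blend(blue, white, 0.0))  # fully transparent: only the background shows
print(blend(blue, white, 1.0))  # fully opaque: only the foreground shows
print(blend(blue, white, 0.1))  # a faint tint, like the fill in the chart
```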

 

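Error handling: int() cannot convert an empty string and raises a ValueError. This kind of error can be handled with the try-except construct covered earlier.

A few values in death_valley_2014.csv are blank: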

# -*- coding: utf-8 -*-
import csv 
from matplotlib import pyplot as plt
from datetime import datetime

filename='death_valley_2014.csv'
with open(filename) as f:
    reader=csv.reader(f)
    header_row=next(reader)

    dates,highs,lows=[],[],[]
    for row in reader:
        try:
            current_date=datetime.strptime(row[0],"%Y-%m-%d")
            low=int(row[3])
            high=int(row[1])
        except ValueError:
            print(current_date,'missing data')
        else:
            dates.append(current_date)
            lows.append(low)
            highs.append(high)
    
# Visualization
fig=plt.figure(dpi=128,figsize=(10,6))
plt.plot(dates,highs,c='red',alpha=0.5)
plt.plot(dates,lows,c='blue',alpha=0.5)
plt.fill_between(dates,highs,lows,facecolor='blue',alpha=0.1)
plt.title("Daily high and low temperatures 2014",fontsize=24)
plt.xlabel('',fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature(F)",fontsize=16)
plt.tick_params(axis="both",which="major",labelsize=16)
plt.show()
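The try-except-else pattern used above can be tried without the data file. The following is a minimal sketch. The parse_temperatures() helper and the sample rows are hypothetical, and the column layout (date, high, mean, low) is an assumption chosen to match the indexes used in this post.

```python
def parse_temperatures(rows):
    """Return (highs, lows) as ints, skipping rows with missing values."""
    highs, lows = [], []
    for row in rows:
        try:
            high = int(row[1])
            low = int(row[3])
        except ValueError:
            # int('') raises ValueError, so rows with blank fields end up here.
            print(row[0], 'missing data')
        else:
            # Runs only when both conversions succeeded.
            highs.append(high)
            lows.append(low)
    return highs, lows

# Hypothetical rows in the assumed layout: date, high, mean, low.
rows = [
    ['2014-2-15', '70', '60', '50'],
    ['2014-2-16', '', '', ''],
    ['2014-2-17', '72', '61', '51'],
]
highs, lows = parse_temperatures(rows)
print(highs, lows)
```

Printing row[0] in the except branch reports the raw date string of the failing row, so the message does not depend on a value parsed during an earlier iteration.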


Reposted from blog.csdn.net/shinhwa96/article/details/84784510