Happy Learning Starts with Translation_Multivariate Time Series Forecasting with LSTMs in Keras_2_Basic Data Preparation

2. Basic Data Preparation

The data is not ready to use. We must prepare it first.

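Translation: The data is not ready to use; we must prepare it first. (Translator's note: "process" would be easier to understand here than "prepare".)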

Below are the first few rows of the raw dataset.

The first step is to consolidate the date-time information into a single date-time so that we can use it as an index in Pandas.

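Translation: The first step is to merge the date and time information into a single date-time column so that we can use it as the index in Pandas.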
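As a minimal sketch of this consolidation step, the snippet below assembles separate `year`, `month`, `day` and `hour` columns into one date-time index with `pandas.to_datetime`. The three-row frame is a made-up stand-in for the raw file (an assumption for illustration, not the real data). This is one possible approach that avoids the `date_parser` argument, which newer pandas releases deprecate.

```python
import pandas as pd

# Hypothetical three-row sample; the real raw file has more columns and rows.
raw = pd.DataFrame({
    "year": [2010, 2010, 2010],
    "month": [1, 1, 1],
    "day": [1, 1, 1],
    "hour": [0, 1, 2],
    "pm2.5": [float("nan"), float("nan"), 129.0],
})

# to_datetime builds one timestamp per row from columns named
# year / month / day / hour.
raw.index = pd.to_datetime(raw[["year", "month", "day", "hour"]])
raw.index.name = "date"

# The four source columns are no longer needed once the index exists.
combined = raw.drop(columns=["year", "month", "day", "hour"])
print(combined)
```

With a real CSV, the same two steps could follow a plain `read_csv` call, provided the file has columns with these names.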

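A quick check reveals NA values for pm2.5 for the first 24 hours. We will, therefore, need to remove the first 24 rows of data. There are also a few scattered “NA” values later in the dataset; we can mark them with 0 values for now.

Translation: A quick check shows that the pm2.5 values for the first 24 hours are NA, so we need to remove the first 24 rows of data. There are also some scattered NA values later in the dataset, which we can mark with 0 for now.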
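The "quick check" and the two clean-up steps might look like the following sketch. It runs on a synthetic hourly series (an assumption: 48 invented readings with the first day missing and one later gap), so the names `pollution` and `cleaned` are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series standing in for the pm2.5 column.
index = pd.date_range("2010-01-01", periods=48, freq="h")
pollution = pd.Series(np.arange(48, dtype=float), index=index, name="pollution")
pollution.iloc[:24] = np.nan  # mimic the missing first day
pollution.iloc[30] = np.nan   # mimic one scattered gap later on

# Quick check: how many readings are missing, and is the first day all NA?
print(pollution.isna().sum())
print(pollution.iloc[:24].isna().all())

# Drop the first 24 hours, then mark the remaining gaps with 0.
cleaned = pollution.iloc[24:].fillna(0)
print(cleaned.head(8))
```

Filling with 0 is only a placeholder here, as the text says; the gaps stay visible as zero readings in the series.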

The script below loads the raw dataset and parses the date-time information as the Pandas DataFrame index. The “No” column is dropped and then clearer names are specified for each column. Finally, the NA values are replaced with “0” values and the first 24 hours are removed.

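Translation: The script below loads the raw data and parses the date-time information as the index of the Pandas DataFrame. The “No” column is dropped, and clearer names are then specified for each column. Finally, the NA values are replaced with 0 and the first 24 hours of data are removed.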


from pandas import read_csv
from datetime import datetime


# load data
def parse(x):
    return datetime.strptime(x, '%Y %m %d %H')


dataset = read_csv('raw.csv', parse_dates=[['year', 'month', 'day', 'hour']], index_col=0, date_parser=parse)
dataset.drop('No', axis=1, inplace=True)
# manually specify column names
dataset.columns = ['pollution', 'dew', 'temp', 'press', 'wnd_dir', 'wnd_spd', 'snow', 'rain']
dataset.index.name = 'date'
# mark all NA values with 0
dataset['pollution'] = dataset['pollution'].fillna(0)
# drop the first 24 hours
dataset = dataset[24:]
# summarize first 5 rows
print(dataset.head(5))
# save to file
dataset.to_csv('pollution.csv')

Running the example prints the first 5 rows of the transformed dataset and saves the dataset to “pollution.csv”.

Now that we have the data in an easy-to-use form, we can create a quick plot of each series and see what we have.

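Translation: Now that we have the data in an easy-to-use format, we can create a quick plot of each series to see what we have.

The code below loads the new “pollution.csv” file and plots each series as a separate subplot, except wind direction, which is categorical.

Translation: The code below loads the new “pollution.csv” file and plots each series as a separate subplot, except wind direction, which is categorical.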
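A hedged sketch of such a plotting step is given here. The helper `plot_series` is a hypothetical name, and the demo frame is random synthetic data with the column names used earlier, so the resulting lines do not resemble the real pollution series. The `Agg` backend is also an assumption, chosen so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to get a window
import numpy as np
import pandas as pd
from matplotlib import pyplot


def plot_series(dataset, skip=("wnd_dir",)):
    """Draw one line subplot per column, skipping categorical columns."""
    columns = [c for c in dataset.columns if c not in skip]
    fig = pyplot.figure()
    for i, column in enumerate(columns, start=1):
        # One row per series, stacked vertically in a single column.
        ax = fig.add_subplot(len(columns), 1, i)
        ax.plot(dataset[column].values)
        ax.set_title(column, y=0.5, loc="right")
    return fig


# Synthetic stand-in for the contents of pollution.csv.
names = ["pollution", "dew", "temp", "press", "wnd_dir", "wnd_spd", "snow", "rain"]
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((48, len(names))), columns=names)
demo["wnd_dir"] = "NW"  # categorical column, excluded from the plot

fig = plot_series(demo)
```

To use the saved file instead, pass `pd.read_csv('pollution.csv', header=0, index_col=0)` to `plot_series`, then call `pyplot.show()` or `fig.savefig(...)` to view the result.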

Running the example creates a plot with 7 subplots showing the 5 years of data for each variable.

Translation: Running the example creates a figure with 7 subplots showing the 5 years of data for each variable.

Line Plots of Air Pollution Time Series


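For the meaning and usage of the matplotlib functions involved, see the following example:

This code introduces something new: an object named ax. Matplotlib has two concepts that come up constantly when plotting. The window that pops up when you draw a plot is called a figure, which acts as a large canvas. Each figure can hold several subplots, and such a subplot is called an axes. As the name suggests, once you have a horizontal and a vertical axis, you have a simple chart. In the code above, the figure is first laid out as a canvas with one row and two columns, and two new subplots are then added with fig.add_subplot(). The argument format of add_subplot is interesting: the first two digits give the number of rows and columns, and the last digit gives the position of the newly added subplot. If you prefer to be more explicit, you can separate the three numbers with commas. The code above produces the following image: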
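A small sketch of the figure and axes idea described above, assuming a one-row, two-column layout: it shows both the three-digit form and the comma-separated form of the `add_subplot` argument. The plotted numbers are arbitrary, and the `Agg` backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to get a window
from matplotlib import pyplot as plt

# The figure is the canvas; each add_subplot call returns one axes on it.
fig = plt.figure(figsize=(8, 3))
ax1 = fig.add_subplot(121)      # 1 row, 2 columns, first position
ax2 = fig.add_subplot(1, 2, 2)  # same layout written with commas, second position

ax1.plot([0, 1, 2], [0, 1, 4])
ax1.set_title("left axes")
ax2.plot([0, 1, 2], [4, 1, 0])
ax2.set_title("right axes")
```

Both calls describe the same grid, so the two axes sit side by side on the one figure.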




Original post: https://mp.csdn.net/postedit/80699358

Reposted from blog.csdn.net/dreamscape9999/article/details/80699358