Python Learning Road 15 - Download Data

This series collects notes on the introductory book "Python Programming: From Getting Started to Practice" and covers beginner-level material. The numbering follows the chapters of the book.
This is the second article on data processing in Python. It visualizes data downloaded from the Internet.

1. Introduction

This article accesses and visualizes data stored in two common formats, CSV and JSON:

  • Use Python's csv module to process weather data stored in CSV (comma-separated values) format and find the maximum and minimum temperatures over a period of time in two different regions;
  • Use the json module to access trading closing-price data stored in JSON format.

The data used in this article can be downloaded from the book's official website ( http://www.ituring.com.cn/book/1861 ).

2. CSV file format

Create a new project, copy death_valley_2014.csv into the project root directory, and create a new file highs_lows.py. The program reads the 2014 temperature data for Death Valley, California, extracts each day's maximum and minimum temperature, and draws a line chart:

import csv
from datetime import datetime
from matplotlib import pyplot as plt

filename = "death_valley_2014.csv"
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    dates, highs, lows = [], [], []
    for row in reader:
        try:
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            high = int(row[1])
            low = int(row[3])
        except ValueError:
            print(current_date, "missing data")
        else:
            dates.append(current_date)
            highs.append(high)
            lows.append(low)

fig = plt.figure(dpi=141, figsize=(10, 6))
# Plot the daily highs
plt.plot(dates, highs, c="red")
# Plot the daily lows
plt.plot(dates, lows, c="blue")
# Fill the area between the two lines; alpha is the opacity (0 = fully transparent, 1 = opaque)
plt.fill_between(dates, highs, lows, facecolor="blue", alpha=0.1)
plt.title("Daily high and low temperatures - 2014\nDeath Valley, CA", fontsize=20)
plt.xlabel("", fontsize=16)
# Auto-format the x-axis dates to avoid overlapping labels
fig.autofmt_xdate()
plt.ylabel("Temperature(F)", fontsize=16)
plt.tick_params(axis="both", which="major", labelsize=16)

plt.show()

The code opens the file and passes it to csv.reader() to create a reader object. The next() function returns the next line of the file, already split into a list, and is used here to skip the header row; a for loop then reads the remaining rows. Error handling is added inside the loop so that the program does not terminate when a row has missing data. Finally, fill_between() shades the area between the two lines. The resulting image is as follows:
[Figure: line chart of 2014 daily highs and lows for Death Valley, with the area between them shaded]

The program also prints the following message:

2014-02-16 00:00:00 missing data

That is, the data for that day is missing.
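A minimal, self-contained sketch of the csv.reader()/next() pattern described above, using an inline string in place of the weather file (the values here are invented for illustration):

```python
import csv
import io

# A small CSV sample standing in for death_valley_2014.csv (values are made up)
data = "date,max,min\n2014-01-01,67,40\n2014-01-02,,\n2014-01-03,71,44\n"

reader = csv.reader(io.StringIO(data))
header_row = next(reader)  # next() returns the header row as a list of strings
print(header_row)          # ['date', 'max', 'min']

highs = []
for row in reader:
    try:
        highs.append(int(row[1]))  # an empty field raises ValueError
    except ValueError:
        print(row[0], "missing data")
print(highs)  # [67, 71]
```

The try/except mirrors the error handling above: the bad row is reported and skipped rather than crashing the loop.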

3. Make a trading closing price chart: JSON format

Now copy btc_close_2017.json into the project root directory. This section draws five charts: a line chart of the closing prices, a logarithmic transformation of the closing prices, the monthly average of the closing prices, the weekly average of the closing prices, and the average closing price for each day of the week. All of them are drawn with Pygal.

3.1 Draw the closing price line chart

import json
import pygal

# Load the data into a list whose elements are dictionaries
filename = "btc_close_2017.json"
with open(filename) as f:
    btc_data = json.load(f)

dates, months, weeks, weekdays, close = [], [], [], [], []
for btc_dict in btc_data:
    dates.append(btc_dict["date"])
    months.append(int(btc_dict["month"]))
    weeks.append(int(btc_dict["week"]))
    weekdays.append(btc_dict["weekday"])
    close.append(int(float(btc_dict["close"])))

# Rotate the x-axis tick labels 20 degrees clockwise
line_chart = pygal.Line(x_label_rotation=20, show_minor_x_labels=False)
line_chart.title = "收盘价(¥)"
line_chart.x_labels = dates
N = 20  # show a major x-axis label every 20 days
line_chart.x_labels_major = dates[::N]
line_chart.add("收盘价", close)
line_chart.render_to_file("收盘价折线图(¥).svg")

The resulting image is as follows:
[Figure: line chart of the 2017 closing prices]

3.2 Logarithmic transformation of closing price

As the chart above shows, the closing price grows roughly exponentially, with some similar-looking fluctuations along the way (March, June, September). Although these fluctuations are masked by the growth trend, they may be periodic. To test the periodicity hypothesis, the nonlinear trend must first be removed; a logarithmic transformation is one common way to do this. We use the math module from the Python standard library:

-- snip --
import math

line_chart = pygal.Line(x_label_rotation=20, show_minor_x_labels=False)
line_chart.title = "收盘价对数变换(¥)"
line_chart.x_labels = dates
N = 20  # show a major x-axis label every 20 days
line_chart.x_labels_major = dates[::N]
# Logarithmic transformation
close_log = [math.log10(_) for _ in close]
line_chart.add("log收盘价", close_log)
line_chart.render_to_file("收盘价对数变换折线图(¥).svg")

The following chart is obtained:
[Figure: log10 transformation of the 2017 closing prices]

Sharp fluctuations are indeed visible in March, June, and September. Next, let's look at the average daily closing price by month and by week.
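Why the log transform straightens an exponential trend can be seen on synthetic data (the series below is invented for illustration):

```python
import math

# A perfectly exponential series: the value doubles at each step
series = [2 ** n for n in range(6)]  # [1, 2, 4, 8, 16, 32]
logged = [math.log10(v) for v in series]

# After log10, consecutive differences are constant, i.e. the curve
# becomes a straight line with slope log10(2)
diffs = [round(b - a, 6) for a, b in zip(logged, logged[1:])]
print(diffs)
```

Constant differences between consecutive points are exactly what a straight line means, so any remaining wiggles in the transformed chart are genuine fluctuations, not trend.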

3.3 Average closing price

3.3.1 Monthly Daily Average

Before writing the new code, some background is needed. The zip() function combines multiple lists element-by-element into a sequence of tuples, one tuple per position:

# Code
a = [1, 2, 3]
b = [4, 5, 6]
c = [7, 8, 9, 10]
zipped_1 = zip(a,b)
zipped_2 = zip(a, b, c)
print(zipped_1)
print(list(zipped_1))
print(list(zipped_2))

# Result
<zip object at 0x0000021D732DCDC8>
[(1, 4), (2, 5), (3, 6)]
[(1, 4, 7), (2, 5, 8), (3, 6, 9)]

In Python 2, zip() returns a list directly; in Python 3 it returns an iterable zip object, which we convert to a list here. Note that a zip object is an iterator and can only be traversed once. A zip object can also be "unpacked" by prefixing it with an asterisk:

# Code (zipped_1 was consumed by list() above, so it must be re-created first):
zipped_1 = zip(a, b)
print(*zipped_1)

# Result:
(1, 4) (2, 5) (3, 6)

The asterisk can unpack not only zip objects but also lists and other iterables.
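A quick sketch of asterisk unpacking applied to a plain list:

```python
nums = [1, 2, 3]
print(*nums)           # equivalent to print(1, 2, 3)
print(*nums, sep="-")  # the unpacked elements become separate arguments
```

The elements arrive as separate positional arguments, so print's sep parameter applies between them.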

We will also use the groupby() function, which requires the list to be sorted first; we sort it with sorted(). By default, sorted() compares elements item by item: for a list of tuples, it compares the first value of each tuple first, then the second, and so on:

# Code:
test = [(1, 5), (1, 4), (1, 3), (1, 2), (2, 3)]
print(sorted(test))

# Result:
[(1, 2), (1, 3), (1, 4), (1, 5), (2, 3)]

The data is then grouped with groupby(); the keyword argument key=itemgetter(0) groups by the first value of each list element (here, each tuple). itemgetter(0) could equally be replaced with the lambda expression lambda x: x[0]. In Python 3, groupby() returns an iterable groupby object; if you convert it to a list, the second value of each element is itself an iterable:

# Code:
from itertools import groupby
from operator import itemgetter

test = [(1, 5), (1, 4), (1, 3), (1, 2), (2, 4), (2, 3), (3, 5)]
temp = groupby(sorted(test), key=itemgetter(0))
print(temp)
print(list(temp))
# groupby objects are one-shot iterators, so temp must be re-created before the loop
temp = groupby(sorted(test), key=itemgetter(0))
for a, b in temp:
    print(list(b))

# Result:
<itertools.groupby object at 0x0000013CD9A4D458>
[(1, <itertools._grouper object at 0x0000013CE8AAE160>), 
 (2, <itertools._grouper object at 0x0000013CE8AAE128>), 
 (3, <itertools._grouper object at 0x0000013CE8AAE198>)]
[(1, 2), (1, 3), (1, 4), (1, 5)]
[(2, 3), (2, 4)]
[(3, 5)]

As the for loop above shows, the object returned by groupby() can be viewed like a dictionary: the keys are the key values, and each value is an iterator over the elements of the original (sorted) list that share that key.
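One way to make that dictionary view concrete (a sketch, not part of the book's code):

```python
from itertools import groupby
from operator import itemgetter

test = [(1, 5), (1, 4), (1, 3), (1, 2), (2, 4), (2, 3), (3, 5)]
# Materialize each group into a list while iterating, since the
# per-group sub-iterators are only valid during the loop
grouped = {k: list(g) for k, g in groupby(sorted(test), key=itemgetter(0))}
print(grouped)
# {1: [(1, 2), (1, 3), (1, 4), (1, 5)], 2: [(2, 3), (2, 4)], 3: [(3, 5)]}
```

Calling list(g) inside the comprehension is essential: once groupby() advances to the next key, the previous group's iterator is exhausted.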

Now, back to the main task.

We will plot the average daily closing price for the first 11 months of 2017, for the first 49 weeks, and for each day of the week (Monday through Sunday). First, the shared drawing code needs to be encapsulated:

from itertools import groupby
from operator import itemgetter

def draw_line(x_data, y_data, title, y_legend):
    xy_map = []
    # This loop is explained below
    for x, y in groupby(sorted(zip(x_data, y_data)), key=itemgetter(0)):
        y_list = [v for _, v in y]
        xy_map.append([x, sum(y_list) / len(y_list)])
    x_unique, y_mean = [*zip(*xy_map)]
    line_chart = pygal.Line()
    line_chart.title = title
    line_chart.x_labels = x_unique
    line_chart.add(y_legend, y_mean)
    line_chart.render_to_file(title + ".svg")
    return line_chart

This code takes some unpacking. From the earlier introduction, the loop variable y is an iterator whose elements are tuples. The first value of each tuple is the x_data value, which is the same as x and is not needed again, so the comprehension y_list = [v for _, v in y] collects only the second values. xy_map is a list whose elements are [x, mean] lists, i.e. a two-dimensional array. The key step is x_unique, y_mean = [*zip(*xy_map)]: the inner asterisk unpacks xy_map into separate [x, mean] arguments, zip() repackages them so that all x values form one tuple and all means form another, the outer [*...] converts that zip object into a list of two tuples, and parallel assignment finally splits them into x_unique and y_mean. To show this operation concretely, here is the same step on some simple data:

# Code:
temp = [[1, 2], [3, 4], [5, 6]]
x, y = [*zip(*temp)]
print(x)
print(y)

# Result:
(1, 3, 5)
(2, 4, 6)
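Putting the pieces together, the grouped-average core of draw_line() can be exercised on toy data (the values below are invented for illustration):

```python
from itertools import groupby
from operator import itemgetter

x_data = [1, 2, 1, 2, 1]       # e.g. month numbers
y_data = [10, 20, 30, 40, 50]  # e.g. closing prices

xy_map = []
for x, y in groupby(sorted(zip(x_data, y_data)), key=itemgetter(0)):
    y_list = [v for _, v in y]                    # values sharing this x
    xy_map.append([x, sum(y_list) / len(y_list)]) # [x, mean]
x_unique, y_mean = [*zip(*xy_map)]
print(x_unique)  # (1, 2)
print(y_mean)    # (30.0, 30.0)
```

Group 1 averages [10, 30, 50] and group 2 averages [20, 40], so both means come out as 30.0.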

Finally, it's time to draw:

-- the file-reading code is the same as before --
idx_month = dates.index("2017-12-01")
line_chart_month = draw_line(months[:idx_month], close[:idx_month],
                             "收盘价月日均值(¥)", "月日均值")

The result obtained is as follows:
[Figure: average daily closing price by month]

3.3.2 Weekly Average

The first week of 2017 starts on January 2, 2017, and the Sunday of the 49th week is December 10, 2017.

-- the file-reading code is the same as before --
idx_week = dates.index("2017-12-11")
line_chart_week = draw_line(weeks[1:idx_week], close[1:idx_week], "收盘价周日均值(¥)", "周日均值")

The result is as follows:
[Figure: average daily closing price by week]

3.3.3 Average value for each day of the week

If the weekdays list were used directly to generate the chart, the weekday order would come out wrong, because the list stores strings and strings sort by their ASCII codes; so the names are first converted into numbers.

idx_week = dates.index("2017-12-11")
wd = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday",
      "Sunday"]
weekdays_int = [wd.index(w) + 1 for w in weekdays[1:idx_week]]
line_chart_weekday = draw_line(weekdays_int, close[1:idx_week], "收盘价星期均值(¥)", "星期均值")
line_chart_weekday.x_labels = ["周一", "周二", "周三", "周四", "周五", "周六", "周日"]
line_chart_weekday.render_to_file("收盘价星期均值(¥).svg")

The final result is as follows:
[Figure: average closing price for each day of the week]
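The problem the number conversion avoids is easy to demonstrate: sorting the weekday names as strings yields alphabetical order, not calendar order.

```python
wd = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday",
      "Sunday"]
print(sorted(wd))
# ['Friday', 'Monday', 'Saturday', 'Sunday', 'Thursday', 'Tuesday', 'Wednesday']

# Mapping each name to its index restores calendar order after sorting
nums = sorted(wd.index(w) + 1 for w in ["Sunday", "Monday", "Friday"])
print(nums)  # [1, 5, 7]
```

Since Pygal sorts the x values when grouping, feeding it the integers 1 through 7 keeps Monday through Sunday in the right order.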

3.4 Closing Price Data Dashboard

Finally, we combine the five charts into one HTML file to make a dashboard:

with open('收盘价Dashboard.html', 'w', encoding='utf8') as html_file:
    title = '<html><head><title>收盘价Dashboard</title><meta charset="utf-8"></head><body>\n'
    html_file.write(title)
    for svg in [
        '收盘价折线图(¥).svg', '收盘价对数变换折线图(¥).svg', '收盘价月日均值(¥).svg',
        '收盘价周日均值(¥).svg', '收盘价星期均值(¥).svg'
    ]:
        html_file.write(
            '    <object type="image/svg+xml" data="{0}" height=500></object>\n'.format(svg))
    html_file.write('</body></html>')

The effect is as follows:
[Figure: dashboard page showing the five SVG charts]

This screenshot was taken with the browser zoomed in; at the default 100% zoom, the five charts all sit on one line and are quite small.

4. Summary

The main contents of this article are:

  • How to use datasets downloaded from the web;
  • How to process CSV and JSON files and extract the data of interest;
  • How to use matplotlib to process historical weather data, including how to use the datetime module and how to plot multiple data series in the same chart;
  • How to use the json module to access closing-price data stored in JSON format, how to use Pygal to plot charts that explore the periodicity of price changes, and how to combine Pygal charts into a data dashboard.

The next article will collect data from the web and visualize it.
