Python data analysis in practice: generating a rainfall statistical analysis report

The text and pictures in this article are from the Internet and are for learning and communication purposes only. They do not have any commercial use. If you have any questions, please contact us for processing.

The following article is from Cai J Learn Python, by Xiao Xiaoming.

I recently ran into a requirement that is a bit of a brain-burner, although it is not actually that hard. The main problem is that there are too many conditions to check. For people with a poor memory and limited mental capacity, it is easy to overflow and crash. It may also be that I simply haven't found a way to reduce the load on my brain's memory.

First look at the requirements:
[Figure: the source data table (left) and the Word statistical report to be generated from it (right)]

The goal is to automatically generate the Word statistical report on the right from the table on the left. The real cases are far more varied than the ones shown in the figure.

Okay, let's just start coding!

1 Reading the data

import pandas as pd

df = pd.read_csv("11月份数据.csv", encoding='gbk')
# Month currently being reported
month = 11
df = df.query('月份==@month')
df.head(10)
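
For readers who do not have the CSV file at hand, the filtering step can be tried on a small in-memory table. The sketch below uses made-up rows (the values are invented for illustration and are not taken from the real data) and shows how `query` picks up a local Python variable through the `@` prefix.

```python
import pandas as pd

# Made-up sample rows for illustration only; the real file has more columns.
sample = pd.DataFrame({
    "区域": ["a1", "a2", "a3", "a4"],
    "月份": [10, 11, 11, 12],
    "降雨量(mm)": [3.0, 5.5, 0.0, 7.2],
})

month = 11
# "@month" refers to the local Python variable named month.
filtered = sample.query("月份 == @month")
print(filtered)
```

Only the two rows whose 月份 equals 11 remain, and they keep their original index labels (1 and 2).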

Preview data:

[Figure: preview of the first 10 rows of the filtered data]

2 Filtering abnormal data

View the number of missing values:

pd.isnull(df).sum()

result:

区域          0
月份          0
降雨量(mm)     0
降雨距平(mm)    1
观测站         0
dtype: int64

There is only one row with a missing value, so it can simply be dropped:

df.dropna(inplace=True)
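
The same check-then-drop pattern can be sketched on a tiny table. The rows below are made up, with one anomaly value left missing on purpose; the variable names `missing_per_column` and `cleaned` are only illustrative.

```python
import numpy as np
import pandas as pd

# Made-up rows; one anomaly value is missing on purpose.
sample = pd.DataFrame({
    "区域": ["a1", "a2", "a3"],
    "降雨距平(mm)": [1.5, np.nan, -0.4],
})

# Count missing values per column, then drop every row that has one.
missing_per_column = sample.isnull().sum()
cleaned = sample.dropna()  # returns a new frame; sample itself is untouched
print(missing_per_column)
print(len(sample), "->", len(cleaned))
```

Assigning the result back (`df = df.dropna()`) is an equivalent alternative to `inplace=True`.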

3 Rainfall changes at the observation stations relative to previous years

Count how many records show rainfall higher than, equal to, and lower than previous years:

rainfall_high = df.eval('`降雨距平(mm)` > 0').value_counts().get(True, 0)
rainfall_equal = df.eval('`降雨距平(mm)` == 0').value_counts().get(True, 0)
rainfall_low = df.eval('`降雨距平(mm)` < 0').value_counts().get(True, 0)
print(rainfall_high, rainfall_equal, rainfall_low)

13 1 18

In the above results, rainfall_high represents the number of times the rainfall is higher than the average level of previous years, rainfall_equal represents the number of times the rainfall is equal to the average level of previous years, and rainfall_low represents the number of times the rainfall is lower than the average level of previous years.
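
As a side note, the same three counts can also be obtained by summing boolean masks, since `True` counts as 1. This is a minimal sketch on made-up anomaly values, not a change to the approach above.

```python
import pandas as pd

# Made-up anomaly values for illustration.
anomaly = pd.Series([2.0, -1.0, 0.0, -3.5, 4.1], name="降雨距平(mm)")

# True counts as 1 when summed, so each line counts the matching rows.
high = int((anomaly > 0).sum())
equal = int((anomaly == 0).sum())
low = int((anomaly < 0).sum())
print(high, equal, low)  # 2 1 2
```

Summing a mask returns 0 when nothing matches, so no `.get(True, 0)` fallback is needed.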

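The first paragraph of the report is then generated from these counts:

p1 = f"{month}月份"  # "In month {month}, "
if rainfall_low == 0 or rainfall_high == 0:
    if rainfall_equal != 0:
        # "except for N stations whose rainfall is unchanged from previous years, "
        p1 += f"除{rainfall_equal}个观测站降雨量较往年无变化外,"
    if rainfall_high == 0:
        # "rainfall at every station is lower than in previous years."
        p1 += "各气象观测站降雨量较往年均偏低。"
    elif rainfall_low == 0:
        # "rainfall at every station is higher than in previous years."
        p1 += "各气象观测站降雨量较往年均偏高。"
else:
    # High and low counts within 10% of each other are treated as level
    if rainfall_high > rainfall_low*1.1:
        # "rainfall at most stations is higher than in previous years."
        p1 += "大部分气象观测站降雨量较往年偏高。"
    elif rainfall_low > rainfall_high*1.1:
        # "rainfall at most stations is lower than in previous years."
        p1 += "大部分气象观测站降雨量较往年偏低。"
    else:
        # "rainfall at the stations is level with previous years overall."
        p1 += "各气象观测站降雨量较往年整体持平。"
p1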

result:

'11月份大部分气象观测站降雨量较往年偏低。'

(In English: in November, rainfall at most meteorological observation stations was lower than in previous years.)
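
Since the paragraph depends only on the three counts, the branching can be wrapped in a function and checked against a few inputs. The function name `trend_sentence` and the `tolerance` parameter are hypothetical, introduced here for illustration; the strings and conditions mirror the code above.

```python
def trend_sentence(month, high, equal, low, tolerance=1.1):
    """Build the first report paragraph from the three counts.

    Hypothetical helper: it mirrors the branching shown above, with the
    10% tolerance exposed as a parameter.
    """
    text = f"{month}月份"
    if low == 0 or high == 0:
        if equal != 0:
            text += f"除{equal}个观测站降雨量较往年无变化外,"
        if high == 0:
            text += "各气象观测站降雨量较往年均偏低。"
        else:  # here low == 0
            text += "各气象观测站降雨量较往年均偏高。"
    elif high > low * tolerance:
        text += "大部分气象观测站降雨量较往年偏高。"
    elif low > high * tolerance:
        text += "大部分气象观测站降雨量较往年偏低。"
    else:
        text += "各气象观测站降雨量较往年整体持平。"
    return text


# Counts from the article: 13 higher, 1 unchanged, 18 lower.
print(trend_sentence(11, 13, 1, 18))
```

With the counts from this article the call prints the "mostly lower" sentence, because 18 is greater than 13 × 1.1. Having the logic in a function makes it easy to try edge cases such as all stations being lower.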

4 Rainfall extremes across regions

Then generate the second paragraph of the report:

t = df['降雨量(mm)']
# idxmax returns the index label of the largest value, which is what .loc expects
p2 = (
    f"The rainfall in each area is between {t.min()}~{t.max()}mm, "
    f"of which the {df.loc[t.idxmax(), '区域']} area has the largest rainfall, {t.max()}mm."
)
p2

result:

'The rainfall in each area is between 0.0~16.0mm, of which the 51a45 area has the largest rainfall, 16.0mm.'
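
When looking up the region with the largest rainfall, it matters whether the lookup uses a position or an index label. In current pandas versions `Series.argmax` returns a position and `Series.idxmax` returns a label, while `.loc` expects a label. The two agree only when the index happens to be 0, 1, 2, and so on, which is usually no longer the case after filtering and dropping rows. A small sketch on made-up rows:

```python
import pandas as pd

# Made-up rows; the index is deliberately non-sequential, as it would be
# after filtering and dropping rows.
sample = pd.DataFrame(
    {"区域": ["a1", "a2", "a3"], "降雨量(mm)": [4.0, 16.0, 0.0]},
    index=[7, 12, 30],
)
t = sample["降雨量(mm)"]

print(t.argmax())  # position of the maximum: 1
print(t.idxmax())  # index label of the maximum: 12

# .loc expects a label, so idxmax is the matching lookup.
wettest = sample.loc[t.idxmax(), "区域"]
print(wettest)  # a2
```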

5 Statistics by observation station

The code from here on is the part that made my head hurt, and there are even more complicated requirements that I won't go into.

For each observation station, work out which regions are higher than, unchanged from, and lower than previous years:

p3s = []
for station, tmp in df.groupby('观测站'):
    t = tmp['降雨量(mm)']
    p3 = f"各区域降雨量在{t.min()}~{t.max()}mm之间,"
    rainfall_high_mask = tmp.eval('`降雨距平(mm)` > 0')
    rainfall_equal_mask = tmp.eval('`降雨距平(mm)` == 0')
    rainfall_low_mask = tmp.eval('`降雨距平(mm)` < 0')

    rainfall_high = rainfall_high_mask.value_counts().get(True, 0)
    rainfall_equal = rainfall_equal_mask.value_counts().get(True, 0)
    rainfall_low = rainfall_low_mask.value_counts().get(True, 0)
    # print(rainfall_high, rainfall_equal, rainfall_low)

    if rainfall_low == 0 or rainfall_high == 0:
        if rainfall_equal != 0:
            p3 += '除'
            p3 += '、'.join(tmp.loc[rainfall_equal_mask, '区域']+'区域')
            p3 += "降雨量较往年无变化外,"
        if rainfall_high == 0:
            p3 += f"各区域降雨量均较往年偏低"
        elif rainfall_low == 0:
            p3 += f"各区域降雨量均较往年偏高"
        t = tmp['降雨距平(mm)'].abs()
        p3 += f"{t.min()}~{t.max()}mm;"
    else:
        if rainfall_equal != 0:
            p3 += '除'
            p3 += '、'.join(tmp.loc[rainfall_equal_mask, '区域']+'区域')
            p3 += "降雨量较往年无变化,"
        # High and low counts within 10% of each other are treated as level
        if rainfall_high > rainfall_low*1.1:
            if rainfall_equal == 0:
                p3 += '除'
            p3 += '、'.join(tmp.loc[rainfall_low_mask, '区域']+'区域')
            p3 += "降雨量较往年偏低"
            t = tmp.loc[rainfall_low_mask, '降雨距平(mm)'].abs()
            if t.shape[0] > 1:
                p3 += f"{t.min()}~{t.max()}mm"
            else:
                p3 += f"{t.min()}mm"
            p3 += "外,"
            t = tmp.loc[rainfall_high_mask, '降雨距平(mm)'].abs()
            p3 += f"其余各区域降雨量较往年偏高{t.min()}~{t.max()}mm;"
        elif rainfall_low > rainfall_high*1.1:
            if rainfall_equal == 0:
                p3 += '除'
            p3 += '、'.join(tmp.loc[rainfall_high_mask, '区域']+'区域')
            p3 += "降雨量较往年偏高"
            t = tmp.loc[rainfall_high_mask, '降雨距平(mm)'].abs()
            if t.shape[0] > 1:
                p3 += f"{t.min()}~{t.max()}mm"
            else:
                p3 += f"{t.min()}mm"
            p3 += "外,"
            t = tmp.loc[rainfall_low_mask, '降雨距平(mm)'].abs()
            p3 += f"其余各区域降雨量较往年偏低{t.min()}~{t.max()}mm;"
        else:
            if rainfall_equal != 0:
                p3 = p3[:-1]+'外,'
            p3 += f"各区域降雨量较往年偏高和偏低的数量持平,其中"
            p3 += '、'.join(tmp.loc[rainfall_low_mask, '区域']+'区域')
            p3 += "降雨量较往年偏低"
            t = tmp.loc[rainfall_low_mask, '降雨距平(mm)'].abs()
            if t.shape[0] > 1:
                p3 += f"{t.min()}~{t.max()}mm,"
            else:
                p3 += f"{t.min()}mm,"
            p3 += '、'.join(tmp.loc[rainfall_high_mask, '区域']+'区域')
            p3 += "降雨量较往年偏高"
            t = tmp.loc[rainfall_high_mask, '降雨距平(mm)'].abs()
            if t.shape[0] > 1:
                p3 += f"{t.min()}~{t.max()}mm;"
            else:
                p3 += f"{t.min()}mm;"
    p3s.append([station, p3])
p3s[-1][-1] = p3s[-1][-1][:-1]+"。"
p3s

It may be that I haven't come up with a better way to encapsulate this, which is why the code became so complicated. If anyone can solve this problem elegantly, I hope you will join the J Learn Python exchange group to discuss it.
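
One possible direction, offered only as a sketch: much of the repetition above comes from two fragments, "join the region names" and "print one value or a min~max range". Pulling those into small helpers shortens each branch. The names `fmt_range` and `describe` are hypothetical and not part of the original code, and the helpers do not cover the "其余各区域" sentences, which always print a range.

```python
def fmt_range(values, unit="mm"):
    """Format a single value as '0.4mm' and several values as 'min~maxmm'."""
    values = list(values)
    lo, hi = min(values), max(values)
    if len(values) > 1:
        return f"{lo}~{hi}{unit}"
    return f"{lo}{unit}"


def describe(regions, values, label):
    """Join region names and append the trend label with its value range.

    label is the trend word, e.g. "偏低" (lower) or "偏高" (higher).
    """
    names = "、".join(f"{region}区域" for region in regions)
    return f"{names}降雨量较往年{label}{fmt_range(values)}"


# Made-up regions and absolute anomaly values for illustration.
print(describe(["a1", "a2"], [1.0, 2.5], "偏低"))
print(describe(["a3"], [0.4], "偏高"))
```

In the loop above, the values would be the absolute anomalies selected by a mask, for example `tmp.loc[rainfall_low_mask, '降雨距平(mm)'].abs()`, and the regions would be `tmp.loc[rainfall_low_mask, '区域']`.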

6 Writing the assembled text into Word

The content of the Word template file docxtemplate.docx:

1. Actual rainfall at each meteorological observation station in month {{month}}
(1) Precipitation
{{p1}}
{{p2}}
{%p for station,p3 in p3s %}
{{station}}:{{p3}}
{%p endfor %}

[Figure: the template as it appears in Word]

Python rendering code:

from docxtpl import DocxTemplate

tpl = DocxTemplate("docxtemplate.docx")
context = {
    'month': month,
    'p1': p1,
    'p2': p2,
    'p3s': p3s,
}
tpl.render(context)
tpl.save("11月降雨量报告.docx")

Running the script produces the Word statistical analysis report:
[Figure: the generated Word report]


Origin blog.csdn.net/chinaherolts2008/article/details/112912561