How to use Python to automatically generate a rainfall statistical analysis report from data

First, the requirements:

[Figure: source data table (left) and generated Word report (right)]

The goal is to automatically generate the Word statistical report on the right from the table on the left. The real cases are far more complicated than the one shown in the figure.
Okay, let's get straight to the code!
1. Data reading

import pandas as pd

df = pd.read_csv("11月份数据.csv", encoding='gbk')
# Month currently being reported
month = 11
df = df.query('月份==@month')
df.head(10)

Preview data:

 

 

2. Abnormal data filtering
Check the number of missing values in each column:

pd.isnull(df).sum()

result:

区域          0
月份          0
降雨量(mm)     0
降雨距平(mm)    1
观测站         0
dtype: int64

Only one row has a missing value, so it can simply be dropped:

df.dropna(inplace=True)
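
As a small illustration of this step, the sketch below runs the same missing-value check on a made-up three-row table and then drops the incomplete row. The values are invented for the example; only the column names come from the data above.

```python
import numpy as np
import pandas as pd

# Made-up sample rows; only the column names mirror the real CSV.
sample = pd.DataFrame({
    "区域": ["a01", "a02", "a03"],
    "降雨量(mm)": [3.0, 0.0, 5.5],
    "降雨距平(mm)": [1.2, np.nan, -0.4],
})

# Number of missing values per column.
missing = sample.isnull().sum()
print(missing["降雨距平(mm)"])  # 1

# dropna() returns a new frame; the surviving rows keep their index labels.
cleaned = sample.dropna()
print(list(cleaned.index))  # [0, 2]
```

Note that the remaining rows keep their original labels (0 and 2), so after dropping rows a row's position and its index label no longer agree.
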

3. Calculate the change in rainfall at the observation stations relative to previous years
Count how many records have rainfall above, equal to, and below the level of previous years:

rainfall_high = df.eval('`降雨距平(mm)` > 0').value_counts().get(True, 0)
rainfall_equal = df.eval('`降雨距平(mm)` == 0').value_counts().get(True, 0)
rainfall_low = df.eval('`降雨距平(mm)` < 0').value_counts().get(True, 0)
print(rainfall_high, rainfall_equal, rainfall_low)

13 1 18

In the results above, rainfall_high is the number of records whose rainfall is above the average of previous years, rainfall_equal is the number equal to that average, and rainfall_low is the number below it.
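
A boolean Series can also be summed directly, since True counts as 1. The following is a minimal sketch on invented anomaly values (not the real data) that produces the same three kinds of counts:

```python
import pandas as pd

# Invented anomaly values, for illustration only.
anomaly = pd.Series([1.5, -0.3, 0.0, -2.0, 0.7], name="降雨距平(mm)")

n_high = int((anomaly > 0).sum())    # above the level of previous years
n_equal = int((anomaly == 0).sum())  # unchanged
n_low = int((anomaly < 0).sum())     # below the level of previous years
print(n_high, n_equal, n_low)  # 2 1 2
```
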
The first paragraph of the report is then generated from these three counts:

p1 = f"{month}月份"
if rainfall_low == 0 or rainfall_high == 0:
    if rainfall_equal != 0:
        p1 += f"除{rainfall_equal}个观测站降雨量较往年无变化外,"
    if rainfall_high == 0:
        p1 += f"各气象观测站降雨量较往年均偏低。"
    elif rainfall_low == 0:
        p1 += f"各气象观测站降雨量较往年均偏高。"
else:
    # Differences within 10% are treated as level
    if rainfall_high > rainfall_low*1.1:
        p1 += f"大部分气象观测站降雨量较往年偏高。"
    elif rainfall_low > rainfall_high*1.1:
        p1 += f"大部分气象观测站降雨量较往年偏低。"
    else:
        p1 += f"各气象观测站降雨量较往年整体持平。"
p1

result:

'11月份大部分气象观测站降雨量较往年偏低。'
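
To try each branch without touching the data, the logic above can be wrapped in a function. This is only a sketch: the name `overview_sentence` is hypothetical, and the sentences are copied from the code above.

```python
def overview_sentence(month, high, equal, low):
    """Build the first report paragraph from the three counts."""
    text = f"{month}月份"
    if low == 0 or high == 0:
        # Every record (apart from unchanged ones) moved in one direction.
        if equal != 0:
            text += f"除{equal}个观测站降雨量较往年无变化外,"
        if high == 0:
            text += "各气象观测站降雨量较往年均偏低。"
        else:
            text += "各气象观测站降雨量较往年均偏高。"
    elif high > low * 1.1:
        text += "大部分气象观测站降雨量较往年偏高。"
    elif low > high * 1.1:
        text += "大部分气象观测站降雨量较往年偏低。"
    else:
        # Counts within 10% of each other are treated as level.
        text += "各气象观测站降雨量较往年整体持平。"
    return text


print(overview_sentence(11, 13, 1, 18))
print(overview_sentence(11, 10, 0, 10))
```

With the counts from this article (13, 1, 18) the function returns the same sentence as the result shown above; the second call exercises the "level" branch.
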

4. Calculate the extreme values of rainfall across the regions
Generate the second paragraph of the report:

p2 = ""
t = df['降雨量(mm)']
# idxmax() returns the index label that .loc expects; argmax() returns a position
p2 += f"各区域降雨量在{t.min()}~{t.max()}mm之间,其中{df.loc[t.idxmax(), '区域']}区域的降雨量最大,为{t.max()}mm。"
p2

result:

'各区域降雨量在0.0~16.0mm之间,其中51a45区域的降雨量最大,为16.0mm。'
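
One detail worth checking in this step is how the row holding the maximum is looked up once rows have been filtered out. The sketch below uses invented values (not the real data) to show that a position and an index label are not interchangeable after a row is dropped, and that `idxmax()` returns a label that `.loc` accepts:

```python
import pandas as pd

# Invented rows; the row with label 1 is removed to leave a gap in the index.
sample = pd.DataFrame({
    "区域": ["a01", "a02", "a03", "a04"],
    "降雨量(mm)": [2.0, 9.0, 16.0, 4.0],
}).drop(index=1)

t = sample["降雨量(mm)"]
pos = t.argmax()    # position of the maximum within the Series
label = t.idxmax()  # index label of the maximum

print(pos, label)  # 1 2
print(sample.loc[label, "区域"])  # a03
print(sample.iloc[pos]["区域"])   # a03
```

Here `sample.loc[pos, '区域']` would raise a `KeyError`, because label 1 no longer exists in the index.
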

5. Statistics by observation station
This is the part where the code starts to give me a headache, and there are even more complicated requirements later that I won't go into here.
For each observation station, work out which regions are above, level with, and below previous years:

p3s = []
for station, tmp in df.groupby('观测站'):
    t = tmp['降雨量(mm)']
    p3 = f"各区域降雨量在{t.min()}~{t.max()}mm之间,"
    rainfall_high_mask = tmp.eval('`降雨距平(mm)` > 0')
    rainfall_equal_mask = tmp.eval('`降雨距平(mm)` == 0')
    rainfall_low_mask = tmp.eval('`降雨距平(mm)` < 0')

    rainfall_high = rainfall_high_mask.value_counts().get(True, 0)
    rainfall_equal = rainfall_equal_mask.value_counts().get(True, 0)
    rainfall_low = rainfall_low_mask.value_counts().get(True, 0)
#     print(rainfall_high, rainfall_equal, rainfall_low)

    if rainfall_low == 0 or rainfall_high == 0:
        if rainfall_equal != 0:
            p3 += '除'
            p3 += '、'.join(tmp.loc[rainfall_equal_mask, '区域']+'区域')
            p3 += "降雨量较往年无变化外,"
        if rainfall_high == 0:
            p3 += f"各区域降雨量均较往年偏低"
        elif rainfall_low == 0:
            p3 += f"各区域降雨量均较往年偏高"
        t = tmp['降雨距平(mm)'].abs()
        p3 += f"{t.min()}~{t.max()}mm;"
    else:
        if rainfall_equal != 0:
            p3 += '除'
            p3 += '、'.join(tmp.loc[rainfall_equal_mask, '区域']+'区域')
            p3 += "降雨量较往年无变化,"
        # Differences within 10% are treated as level
        if rainfall_high > rainfall_low*1.1:
            if rainfall_equal == 0:
                p3 += '除'
            p3 += '、'.join(tmp.loc[rainfall_low_mask, '区域']+'区域')
            p3 += "降雨量较往年偏低"
            t = tmp.loc[rainfall_low_mask, '降雨距平(mm)'].abs()
            if t.shape[0] > 1:
                p3 += f"{t.min()}~{t.max()}mm"
            else:
                p3 += f"{t.min()}mm"
            p3 += "外,"
            t = tmp.loc[rainfall_high_mask, '降雨距平(mm)'].abs()
            p3 += f"其余各区域降雨量较往年偏高{t.min()}~{t.max()}mm;"
        elif rainfall_low > rainfall_high*1.1:
            if rainfall_equal == 0:
                p3 += '除'
            p3 += '、'.join(tmp.loc[rainfall_high_mask, '区域']+'区域')
            p3 += "降雨量较往年偏高"
            t = tmp.loc[rainfall_high_mask, '降雨距平(mm)'].abs()
            if t.shape[0] > 1:
                p3 += f"{t.min()}~{t.max()}mm"
            else:
                p3 += f"{t.min()}mm"
            p3 += "外,"
            t = tmp.loc[rainfall_low_mask, '降雨距平(mm)'].abs()
            p3 += f"其余各区域降雨量较往年偏低{t.min()}~{t.max()}mm;"
        else:
            if rainfall_equal != 0:
                p3 = p3[:-1]+'外,'
            p3 += f"各区域降雨量较往年偏高和偏低的数量持平,其中"
            p3 += '、'.join(tmp.loc[rainfall_low_mask, '区域']+'区域')
            p3 += "降雨量较往年偏低"
            t = tmp.loc[rainfall_low_mask, '降雨距平(mm)'].abs()
            if t.shape[0] > 1:
                p3 += f"{t.min()}~{t.max()}mm,"
            else:
                p3 += f"{t.min()}mm,"
            p3 += '、'.join(tmp.loc[rainfall_high_mask, '区域']+'区域')
            p3 += "降雨量较往年偏高"
            t = tmp.loc[rainfall_high_mask, '降雨距平(mm)'].abs()
            if t.shape[0] > 1:
                p3 += f"{t.min()}~{t.max()}mm;"
            else:
                p3 += f"{t.min()}mm;"
    p3s.append([station, p3])
p3s[-1][-1] = p3s[-1][-1][:-1]+"。"
p3s
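
The range formatting is repeated many times in the loop above. One way to cut the repetition, shown here only as a sketch (the helper name `format_range` is hypothetical and not part of the original code), is to move the single-value versus range decision into a small function:

```python
def format_range(values, unit="mm"):
    """Return 'min~max<unit>', or a single value when only one is given."""
    values = list(values)
    if not values:
        raise ValueError("format_range needs at least one value")
    low, high = min(values), max(values)
    if len(values) > 1:
        return f"{low}~{high}{unit}"
    return f"{low}{unit}"


print(format_range([0.5, 2.0, 1.2]))  # 0.5~2.0mm
print(format_range([3.4]))            # 3.4mm
```

The helper accepts any iterable of numbers, so a Series such as `tmp.loc[rainfall_low_mask, '降雨距平(mm)'].abs()` could be passed in directly.
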

6. Write the organized text into Word
The content of the Word template file docxtemplate.docx:

一、{{ month }}月各气象观测站降雨量实况
(一)降水
{{ p1 }}
{{ p2 }}
{%p for station,p3 in p3s %}
{{ station }}:{{ p3 }}
{%p endfor %}

In Word, the template looks like this:

 

 

Python rendering code:

from docxtpl import DocxTemplate

tpl = DocxTemplate("docxtemplate.docx")
context = {
    'month': month,
    'p1': p1,
    'p2': p2,
    'p3s': p3s,
}
tpl.render(context)
tpl.save("11月降雨量报告.docx")

After the script finishes, the Word statistical analysis report is produced:

 


Origin blog.csdn.net/pythonxuexi123/article/details/112796299