Tomb-Sweeping Day: counting deaths and drawing charts with Python. Are there any programmers among them?

"A drizzling rain falls during Qingming Festival; travelers on the road are heartbroken."

Qingming Festival (Ching Ming, or Tomb-Sweeping Day), also known as the Outing Festival, Xingqing Festival, March Festival, and Ancestor Worship Festival, falls at the turn of mid-spring and late spring. It originated from the ancestor beliefs of early peoples and the rites and customs of spring sacrifice, and it is the most solemn and grand ancestor-worship festival of the Chinese nation. The festival has both a natural and a cultural side: it is one of the solar terms as well as a traditional festival. Tomb sweeping with ancestor worship, and spring outings, are its two main ritual themes. Both have been passed down in China since ancient times and continue to this day.



1. Data overview


The dataset contains structured information on the lives, work, and deaths of more than 1 million deceased individuals.

Dataset: AgeDatasetV1.csv, with a total of 1,223,009 records.

From the death records of 1.22 million people around the world,
we can explore how long most people lived,
which birth years and death years account for the most records,
how the age at death varies by gender,
and how many men and women died in each occupation.

In this case, we use pandas, pyplot, and seaborn to draw pie charts,
bar charts, stacked bar charts, and line charts.

Column name          Description
Id                   serial number
Name                 name
Short description    brief description
Gender               gender
Country              country
Occupation           occupation
Birth year           year of birth
Death year           year of death
Manner of death      manner of death
Age of death         age at death



2. Data preprocessing

Some birth years turn out to be negative.
These are valid values, not errors:
a negative year denotes a year BC.
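To make the sign convention concrete, here is a minimal sketch of a helper that turns a signed year into a readable label. The function name `format_year` and the sample years are hypothetical and not part of the original code. It assumes, as stated above, that negative values mean BC, and it does no special handling of a year 0.

```python
def format_year(year) -> str:
    """Render a signed year as a label such as '427 BC' or '1879 AD'."""
    year = int(year)  # tolerate float years such as -427.0
    # Assumption: a negative value denotes a year BC.
    if year < 0:
        return f"{-year} BC"
    return f"{year} AD"


# Hypothetical sample years, for illustration only
labels = [format_year(y) for y in (-427, 1452, 1879)]
print(labels)
```

Such a helper could be used for axis tick labels if BC years ever show up in a chart.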

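import pandas as pd

# Forward slashes avoid backslash-escape problems in the path string
df = pd.read_csv('./data/AgeDatasetV1.csv')
df.info()

# Export summary statistics, per-column null counts, and duplicated rows
# (the .\result directory must already exist)
df.describe().to_excel(r'.\result\describe.xlsx')
df.isnull().sum().to_excel(r'.\result\nullsum.xlsx')
df[df.duplicated()].to_excel(r'.\result\duplicated.xlsx')

# Replace spaces and hyphens in column names with underscores
df.rename(columns=lambda x: x.replace(' ', '_').replace('-', '_'), inplace=True)
print(df.columns)
# Export rows with a negative birth year (BC); to_excel returns None, so there is nothing to print
df[df['Birth_year'] < 0].to_excel(r'.\result\biryear0.xlsx')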
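The same checks can be tried on a tiny in-memory frame before running them on the full file. The sketch below uses a hypothetical four-row sample (the names and values are invented for illustration) and keeps the results in variables instead of writing Excel files.

```python
import pandas as pd

# Hypothetical sample that mimics three of the real columns
sample = pd.DataFrame({
    "Name": ["Person A", "Person B", "Person B", "Person C"],
    "Birth year": [-427, 1879, 1879, 1900],
    "Age of death": [80.0, 76.0, 76.0, None],
})

null_counts = sample.isnull().sum()           # missing values per column
duplicate_rows = sample[sample.duplicated()]  # rows identical to an earlier row

# Same renaming rule as above: spaces and hyphens become underscores
sample = sample.rename(columns=lambda c: c.replace(" ", "_").replace("-", "_"))
bc_rows = sample[sample["Birth_year"] < 0]    # rows with a BC birth year

print(null_counts)
print(list(sample.columns))
print(len(duplicate_rows), len(bc_rows))
```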


3. Data visualization

0. Import packages and data

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams['font.sans-serif'] = ['SimHei'] 
plt.rcParams['axes.unicode_minus'] = False

df1 = pd.read_csv('./data/AgeDatasetV1.csv')
df1.rename(columns=lambda x: x.replace(' ', '_').replace('-', '_'), inplace=True)
print(df1.columns)

1. Percentage of deaths by age range

plt.figure(figsize=(12, 10))
age_col = df1['Age_of_death']
over_100 = (age_col > 100).sum()
from_90_to_100 = ((age_col > 90) & (age_col <= 100)).sum()
from_70_to_90 = ((age_col > 70) & (age_col <= 90)).sum()
from_50_to_70 = ((age_col > 50) & (age_col <= 70)).sum()
# Everything else: age <= 50 (rows with a missing age also land here)
up_to_50 = df1.shape[0] - (over_100 + from_90_to_100 + from_70_to_90 + from_50_to_70)
count = [over_100, from_90_to_100, from_70_to_90, from_50_to_70, up_to_50]

age = ['> 100', '> 90 & <= 100', '> 70 & <= 90', '> 50 & <= 70', '<= 50']
explode = [0.1, 0, 0.02, 0, 0]  # offset selected slices so they stand out

palette_color = sns.color_palette('pastel')

plt.rc('font', family='SimHei', size=16)
plt.pie(count, labels=age, colors=palette_color,
        explode=explode, autopct='%.4f%%')
plt.title("Percentage of deaths by age range")
plt.savefig(r'.\result\不同年龄范围的死亡率百分比.png')
plt.show()
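As an alternative to counting each range by hand, the buckets can be built with `pd.cut`. This is only a sketch on a hypothetical list of ages, not the author's code. One difference to keep in mind: `pd.cut` leaves missing ages unassigned, whereas the subtraction above counts them in the lowest bucket.

```python
import pandas as pd

# Hypothetical ages chosen to hit every bucket boundary
ages = pd.Series([35, 50, 51, 70, 71, 90, 95, 100, 101], name="Age_of_death")

bins = [-float("inf"), 50, 70, 90, 100, float("inf")]
labels = ["<= 50", "> 50 & <= 70", "> 70 & <= 90", "> 90 & <= 100", "> 100"]

# With the default right=True every interval is (low, high], so 50 falls in "<= 50"
buckets = pd.cut(ages, bins=bins, labels=labels)
counts = buckets.value_counts(sort=False)

print(counts)
```

The resulting `counts` could be passed to `plt.pie` with `labels=counts.index` in place of the hand-built lists.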


2. Top 20 occupations by number of deaths

# Count each occupation once and keep the 20 most common
top_occupations = df1['Occupation'].value_counts()[:20]
Occupation = list(top_occupations.keys())
Occupation_count = list(top_occupations.values)
plt.rc('font', family='SimHei', size=16)
plt.figure(figsize=(14, 8))
# sns.set_theme(style="darkgrid")
p = sns.barplot(x=Occupation_count, y=Occupation)  # horizontal bar chart
p.set_xlabel("Number of people", fontsize=20)
p.set_ylabel("Occupation", fontsize=20)
plt.title("Top 20 occupations", fontsize=20)
plt.subplots_adjust(left=0.18)
plt.savefig(r'.\result\死亡人数前20的职业.png')
plt.show()


3. The top 10 causes of death

top_causes = df1.groupby('Manner_of_death').size().reset_index(name='count')
top_causes = top_causes.sort_values(by='count', ascending=False).iloc[:10]
fig = plt.figure(figsize=(10, 6))
plt.barh(top_causes['Manner_of_death'], top_causes['count'], edgecolor='black')  # horizontal bar chart
plt.title('Top 10 causes of death')
plt.xlabel('Number of people')
plt.ylabel('Cause of death')
plt.tight_layout()  # adjust the layout so the labels fit inside the figure
plt.savefig(r'.\result\死亡人数前10的死因.png')
plt.show()
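The group-then-sort pattern used here gives the same ranking as the `value_counts` call used for the occupations. The sketch below checks this on a hypothetical handful of values (invented for illustration). Both approaches skip missing entries by default.

```python
import pandas as pd

# Hypothetical sample, including one missing value
frame = pd.DataFrame({
    "Manner_of_death": ["natural causes", "suicide", "natural causes", None,
                        "accident", "natural causes", "suicide"],
})

# Approach 1: group, count, sort, slice
by_group = (frame.groupby("Manner_of_death").size()
                 .sort_values(ascending=False)
                 .head(2))

# Approach 2: value_counts already sorts in descending order
by_value_counts = frame["Manner_of_death"].value_counts().head(2)

print(by_group)
print(by_value_counts)
```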


4. Top 20 birth years by number of deaths

birth_year = df1.groupby('Birth_year').size().reset_index(name='count')
birth_year = birth_year.sort_values(by='count', ascending=False).iloc[:20]
fig = plt.figure(figsize=(10, 6))
plt.barh(birth_year['Birth_year'], birth_year['count'])
plt.title('Top 20 birth years by number of deaths')
plt.xlabel('Number of people')
plt.ylabel('Birth year')
plt.tight_layout()
plt.show()


5. Top 20 death years by number of deaths

death_year = df1.groupby('Death_year').size().reset_index(name='count')
print(death_year)
death_year = death_year.sort_values(by='count', ascending=False).iloc[:20]
fig = plt.figure(figsize=(10, 10))
plt.barh(death_year['Death_year'], death_year['count'])
plt.title('Top 20 death years by number of deaths')
plt.xlabel('Number of people')
plt.ylabel('Death year')
plt.tight_layout()
plt.savefig(r'.\result\死亡人数前20的去世年份.png')
plt.show()


6. Trends in age of death by sex

# Count deaths for every (gender, age) pair
data = df1.groupby(['Gender', 'Age_of_death']).size().reset_index(name='count')

fig = plt.figure(figsize=(10, 10))
sns.lineplot(data=data, x='Age_of_death', y='count', hue='Gender', linewidth=2)  # line chart
plt.legend(fontsize=8)
plt.title('Age at death by gender')
plt.xlabel('Age at death')
plt.ylabel('Number of people')
plt.tight_layout()
# Use a file name of its own so the death-year chart is not overwritten
plt.savefig(r'.\result\按性别分列的死亡年龄趋势.png')
plt.show()
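Raw counts make genders with few records hard to compare with genders that have many. One possible refinement, shown here as a sketch on hypothetical data and not taken from the original post, is to plot each age's share within its gender instead of the raw count.

```python
import pandas as pd

# Hypothetical sample: four male and two female records
sample = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Male", "Female", "Female"],
    "Age_of_death": [70, 70, 70, 80, 70, 80],
})

counts = (sample.groupby(["Gender", "Age_of_death"]).size()
                .reset_index(name="count"))

# Divide each count by the total for its gender, so shares sum to 1 per gender
counts["share"] = counts["count"] / counts.groupby("Gender")["count"].transform("sum")

print(counts)
```

Passing `y='share'` instead of `y='count'` to `sns.lineplot` would then put all genders on the same scale.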


7. The number of men and women in the top 10 occupations

# The 10 most common occupations
top_index = df1['Occupation'].value_counts().head(10).index
age_data = df1[df1['Occupation'].isin(top_index)]
# Keep only rows whose gender is Male or Female
age_data = age_data[age_data['Gender'].isin(['Male', 'Female'])]
sns.catplot(data=age_data, x='Occupation', kind='count', hue='Gender', height=10)
plt.xticks(rotation=20)
plt.xlabel('Occupation')
plt.ylabel('Number of people')
plt.tight_layout()
plt.savefig(r'.\result\前10的职业中男女性人数.png')
plt.show()
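The numbers behind this chart can also be viewed as a table. The sketch below uses `pd.crosstab` on a hypothetical six-row sample (invented for illustration) to count men and women per occupation after keeping only the most common occupations.

```python
import pandas as pd

# Hypothetical sample records
sample = pd.DataFrame({
    "Occupation": ["Artist", "Artist", "Politician", "Politician", "Politician", "Athlete"],
    "Gender": ["Male", "Female", "Male", "Male", "Female", "Male"],
})

# Keep the 2 most common occupations (the post uses 10 on the real data)
top = sample["Occupation"].value_counts().head(2).index
subset = sample[sample["Occupation"].isin(top)]

# Rows are occupations, columns are genders, cells are record counts
table = pd.crosstab(subset["Occupation"], subset["Gender"])
print(table)
```

Calling `table.plot(kind="bar", stacked=True)` on such a table is one way to get a stacked bar chart, provided matplotlib is installed.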



Origin blog.csdn.net/m0_74872863/article/details/129964092