A drizzling rain falls during Qingming Festival, and the travelers on the road are heartbroken
Qingming Festival (also romanized as Ching Ming), also known as Tomb-Sweeping Day, the Taqing (spring outing) Festival, Xingqing Festival, March Festival, and Ancestor Worship Festival, falls at the turn of mid-spring and late spring. It originated from the ancestor beliefs of early peoples and the rites and customs of spring sacrifices, and it is the most solemn and grand ancestor worship festival of the Chinese nation. Qingming has both a natural and a cultural meaning: it is a solar term as well as a traditional festival. Tomb sweeping with ancestor worship, and spring outings, are its two major ritual themes. Both have been passed down in China since ancient times and continue to this day.
Qingming Festival special topic
- 1. Data overview
- 2. Data preprocessing
- 3. Data visualization
  - 0. Import packages and data
  - 1. Mortality percentage by age range
  - 2. Top 20 occupations by number of deaths
  - 3. The top 10 causes of death
  - 4. Top 20 birth years by number of deaths
  - 5. Top 20 death years by number of deaths
  - 6. Trends in age of death by sex
  - 7. The number of men and women in the top 10 occupations
1. Data overview
The dataset contains structured information on the lives, work, and deaths of more than 1 million deceased individuals.
Dataset: AgeDatasetV1.csv, with a total of 1,223,009 records.
From the death data of these 1.22 million people around the world, we can see how long most people lived, which birth years and which death years account for the most deaths, how the age at death trends by gender, and how many men and women died in each occupation.
In this case study, we use pandas, pyplot, and seaborn to draw pie charts, bar charts, grouped bar charts, and line charts.
To obtain the dataset and source code, add vx (WeChat): python10010
| Column name | Description |
|---|---|
| Id | Serial number |
| Name | Name |
| Short description | Brief description |
| Gender | Gender |
| Country | Country |
| Occupation | Occupation |
| Birth year | Year of birth |
| Death year | Year of death |
| Manner of death | Manner of death |
| Age of death | Age at death |
2. Data preprocessing
Inspecting the data shows that some birth years are negative.
These are valid values, not errors:
a negative year denotes a year BC.
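import pandas as pd

# Load the dataset; forward slashes avoid backslash escapes in the path string
df = pd.read_csv('./data/AgeDatasetV1.csv')
df.info()
# Export summary statistics, per-column null counts, and duplicated rows for inspection
df.describe().to_excel(r'.\result\describe.xlsx')
df.isnull().sum().to_excel(r'.\result\nullsum.xlsx')
df[df.duplicated()].to_excel(r'.\result\duplicated.xlsx')
# Replace spaces and hyphens in column names with underscores
df.rename(columns=lambda x: x.replace(' ', '_').replace('-', '_'), inplace=True)
print(df.columns)
# Export the rows with a negative (BC) birth year; to_excel returns None, so there is nothing to print
df[df['Birth_year'] < 0].to_excel(r'.\result\biryear0.xlsx')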
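To make the negative birth years concrete, here is a minimal sketch on a small made-up table. The rows and the helper `format_year` are hypothetical and not part of the original analysis, and the sketch assumes that a value of `-N` simply means the year N BC.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from AgeDatasetV1.csv
toy = pd.DataFrame({
    'Name': ['Person A', 'Person B', 'Person C'],
    'Birth year': [-427, 1564, -63],
    'Death year': [-347, 1616, 14],
})
# Same column clean-up as in the preprocessing step
toy.rename(columns=lambda x: x.replace(' ', '_').replace('-', '_'), inplace=True)


def format_year(year):
    """Render a signed year as text, assuming -N means N BC (an assumption)."""
    year = int(year)
    return f'{-year} BC' if year < 0 else f'{year} AD'


# Rows with a BC birth year are selected for inspection, not dropped as errors
bc_births = toy[toy['Birth_year'] < 0]
labels = [format_year(y) for y in bc_births['Birth_year']]
print(labels)
```

Keeping the signed integer in the data and formatting it only for display means sorting and arithmetic on the year columns keep working.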
3. Data visualization
0. Import packages and data
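import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Use the SimHei font so that the Chinese titles and labels render correctly
plt.rcParams['font.sans-serif'] = ['SimHei']
# Render the minus sign correctly while a CJK font is active
plt.rcParams['axes.unicode_minus'] = False
df1 = pd.read_csv('./data/AgeDatasetV1.csv')
# Replace spaces and hyphens in column names with underscores
df1.rename(columns=lambda x: x.replace(' ', '_').replace('-', '_'), inplace=True)
print(df1.columns)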
1. Mortality percentage by age range
plt.figure(figsize=(12, 10))
age_of_death = df1['Age_of_death']
# Number of deaths in each age range (rows with a missing age are not counted)
count = [(age_of_death > 100).sum(),
         ((age_of_death > 90) & (age_of_death <= 100)).sum(),
         ((age_of_death > 70) & (age_of_death <= 90)).sum(),
         ((age_of_death > 50) & (age_of_death <= 70)).sum(),
         (age_of_death <= 50).sum()]
age = ['> 100', '> 90 & <= 100', '> 70 & <= 90', '> 50 & <= 70', '<= 50']
explode = [0.1, 0, 0.02, 0, 0]  # offset of each wedge from the center
palette_color = sns.color_palette('pastel')
plt.rc('font', family='SimHei', size=16)
plt.pie(count, labels=age, colors=palette_color,
        explode=explode, autopct='%.4f%%')
plt.title("按不同年龄范围的死亡率百分比")  # "Mortality percentage by age range"
plt.savefig(r'.\result\不同年龄范围的死亡率百分比.png')
plt.show()
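The same age ranges can also be expressed with `pd.cut`, which assigns every age to exactly one bin. The sketch below uses made-up ages rather than the real dataset, and the names `ages`, `bins`, and `labels` are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up ages for illustration; the real values live in df1['Age_of_death']
ages = pd.Series([30, 50, 51, 70, 85, 90, 95, 100, 101, 120])

# Bin edges; by default each interval is open on the left and closed on the right
bins = [-np.inf, 50, 70, 90, 100, np.inf]
labels = ['<= 50', '> 50 & <= 70', '> 70 & <= 90', '> 90 & <= 100', '> 100']

binned = pd.cut(ages, bins=bins, labels=labels)
# Count per bin, listed in the order of `labels`
counts = binned.value_counts().reindex(labels)
# Share of each bin in percent
percent = counts / counts.sum() * 100
print(percent)
```

Because the bins are defined once as a list of edges, they cannot overlap or leave gaps, which is easy to get wrong when each range is filtered by hand.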
2. Top 20 occupations by number of deaths
# Count deaths per occupation once and keep the 20 most frequent
top_occupations = df1['Occupation'].value_counts()[:20]
Occupation = list(top_occupations.index)
Occupation_count = list(top_occupations.values)
plt.rc('font', family='SimHei', size=16)
plt.figure(figsize=(14, 8))
# sns.set_theme(style="darkgrid")
p = sns.barplot(x=Occupation_count, y=Occupation)  # horizontal bar chart
p.set_xlabel("人数", fontsize=20)  # "Number of people"
p.set_ylabel("职业", fontsize=20)  # "Occupation"
plt.title("前20的职业", fontsize=20)  # "Top 20 occupations"
plt.subplots_adjust(left=0.18)  # leave room for the long occupation labels
plt.savefig(r'.\result\死亡人数前20的职业.png')
plt.show()
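The top-N selection relies on `value_counts`, which returns counts sorted in descending order, so taking the first entries yields the most frequent categories. A minimal sketch on made-up occupations (not the real data):

```python
import pandas as pd

# Made-up occupations for illustration only; None marks a missing value
occupations = pd.Series(['Artist', 'Politician', 'Artist', 'Athlete',
                         'Artist', 'Politician', None])

# value_counts sorts by frequency (descending) and skips missing values by default
top2 = occupations.value_counts().head(2)
names = list(top2.index)
totals = list(top2.values)
print(names, totals)
```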
3. The top 10 causes of death
# Count deaths per manner of death and keep the 10 most frequent
top_causes = df1.groupby('Manner_of_death').size().reset_index(name='count')
top_causes = top_causes.sort_values(by='count', ascending=False).iloc[:10]
fig = plt.figure(figsize=(10, 6))
plt.barh(top_causes['Manner_of_death'], top_causes['count'], edgecolor='black')  # horizontal bar chart
plt.title('死亡人数前10的死因')  # "Top 10 causes of death by number of deaths"
plt.xlabel('人数')  # "Number of people"
plt.ylabel('死因')  # "Cause of death"
plt.tight_layout()  # adjust subplot parameters automatically so the plot fills the figure
plt.savefig(r'.\result\死亡人数前10的死因.png')
plt.show()
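One detail worth knowing when reading this chart: `groupby` drops rows whose key is missing by default, so any rows with no recorded manner of death are not counted. The sketch below, on made-up data, shows the default behaviour next to the `dropna=False` option (available in pandas 1.1 and later).

```python
import pandas as pd

# Made-up values for illustration; None marks a missing manner of death
toy = pd.DataFrame({'Manner_of_death': ['natural causes', None, 'accident',
                                        'natural causes', None, None]})

# Default: rows with a missing key are excluded from the groups
known = toy.groupby('Manner_of_death').size()
# dropna=False keeps the missing key as its own group
with_missing = toy.groupby('Manner_of_death', dropna=False).size()

print(known.sum(), with_missing.sum())
```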
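4. Top 20 birth years by number of deaths
# Count deaths per birth year and keep the 20 most frequent years
birth_year = df1.groupby('Birth_year').size().reset_index(name='count')
birth_year = birth_year.sort_values(by='count', ascending=False).iloc[:20]
fig = plt.figure(figsize=(10, 6))
plt.barh(birth_year['Birth_year'], birth_year['count'])
plt.title('死亡人数前20的出生年份')  # "Top 20 birth years by number of deaths"
plt.xlabel('人数')  # "Number of people"
plt.ylabel('出生年份')  # "Year of birth"
plt.tight_layout()
plt.show()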
5. Top 20 death years by number of deaths
# Count deaths per death year
death_year = df1.groupby('Death_year').size().reset_index(name='count')
print(death_year)
# Keep the 20 years with the most deaths
death_year = death_year.sort_values(by='count', ascending=False).iloc[:20]
fig = plt.figure(figsize=(10, 10))
plt.barh(death_year['Death_year'], death_year['count'])
plt.title('死亡人数前20的去世年份')  # "Top 20 death years by number of deaths"
plt.xlabel('人数')  # "Number of people"
plt.ylabel('去世年份')  # "Year of death"
plt.tight_layout()
plt.savefig(r'.\result\死亡人数前20的去世年份.png')
plt.show()
6. Trends in age of death by sex
# Count deaths for each combination of gender and age at death
data = df1.groupby(['Gender', 'Age_of_death']).size().reset_index(name='count').sort_values(by='count', ascending=False)
fig = plt.figure(figsize=(10, 10))
sns.lineplot(data=data, x='Age_of_death', y='count', hue='Gender', linewidth=2)  # line chart
plt.legend(fontsize=8)
plt.title('按性别分列的死亡年龄趋势')  # "Trends in age of death by gender"
plt.xlabel('死亡年龄')  # "Age of death"
plt.ylabel('人数')  # "Number of people"
plt.tight_layout()
# Save under a file name that matches the plot title
plt.savefig(r'.\result\按性别分列的死亡年龄趋势.png')
plt.show()
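Alongside the line chart, a single summary number per gender can help when comparing the curves. A minimal sketch on made-up rows (not the real data) using `groupby` and `median`:

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    'Gender': ['Male', 'Male', 'Male', 'Female', 'Female'],
    'Age_of_death': [60, 70, 80, 75, 85],
})

# Median age at death within each gender group
median_age = toy.groupby('Gender')['Age_of_death'].median()
print(median_age)
```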
7. The number of men and women in the top 10 occupations
# Find the 10 occupations with the most deaths
top_index = list(df1['Occupation'].value_counts().head(10).index)
# Keep only those occupations, and only the rows whose gender is Male or Female
age_data = df1[df1['Occupation'].isin(top_index)]
age_data = age_data[age_data['Gender'].isin(['Male', 'Female'])]
sns.catplot(data=age_data, x='Occupation', kind='count', hue='Gender', height=10)  # grouped count plot
plt.xticks(rotation=20)
plt.xlabel('职业')  # "Occupation"
plt.ylabel('人数')  # "Number of people"
plt.tight_layout()
plt.savefig(r'.\result\前10的职业中男女性人数.png')
plt.show()
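The counts behind this chart can also be read as a table. A minimal sketch with `pd.crosstab` on made-up rows (the data below is illustrative only):

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    'Occupation': ['Artist', 'Artist', 'Politician', 'Artist', 'Politician'],
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Male'],
})

# Rows are occupations, columns are genders, cells are the number of people
table = pd.crosstab(toy['Occupation'], toy['Gender'])
print(table)
```

Combinations that never occur, such as female politicians in this toy table, show up as 0 rather than being left out.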