The R language ggplot2 draws a pyramid diagram to show the population structure

The main content of today's tweet comes from the link https://ikashnitsky.github.io/2017/who-is-old/. I planned to repeat this tutorial completely, but the data in the tutorial is no longer available for download, so I found the population data of Sweden according to the data source in the tutorial. You can download this dataset by yourself. The download link is https://www .mortality.org/ , this requires a simple registration, of course, you can also leave a message in the background of the official account 20210826to get the sample data and code of today's tweet (the message needs to match exactly, and there can be no spaces at the beginning and end, I don't know why the previous messages often appear spaces appear).

Some sample data is as follows

image.png
  • The first column is the year (the year is 1751 to 2020)
  • The second column is age
  • The third column is the number of girls
  • The fourth column is the number of boys
  • The fifth column is the total

First, we select the data in 1751 to make a pyramid diagram

Load the drawing package

library(tidyverse)
library(ggplot2)
library(see)

Read the data; pick the data for 1751; and calculate the proportion of the total population by age and gender and convert the wide format data to long format data

df <- read_tsv("20210826/20210826-1.txt")
#write_tsv(df,file = "20210826/20210826-1.txt")
head(df)

df %>% filter(Year == 1751) %>% 
  mutate(FP = Female/sum(Total),
         MP = - Male/sum(Total)) %>% 
  select(Age,FP,MP) %>% 
  reshape2::melt(id.vars="Age",
                 variable.name="Sex",
                 value.name = "prop") -> df1

drawing

df1 %>% 
  ggplot()+
  geom_col(aes(x=Age,y=prop,fill=Sex),
           width = 1)+
  scale_y_continuous(breaks = c(-.01, 0, .01), 
                     labels = c(.01, 0, .01),
                     limits = c(-.02, .02), 
                     expand = c(0,0))+
  theme_minimal()+
  theme(legend.position = "none",
        panel.grid = element_blank(),
        plot.title = element_text(hjust = 0.5, size = 20))+
  coord_flip()+
  labs(x=NULL,y=NULL,title = "1751")+
  annotate(geom="text",
           y=-0.01,x=100,
           label="Male")+
  annotate(geom="text",
           y=0.01,x=100,
           label="Female")+
  scale_fill_material_d()
image.png

We see that this population structure is still in line with the structure of the pyramid.

Let's take a look at the data for 2020

df %>% filter(Year == 2020) %>% 
  mutate(FP = Female/sum(Total),
         MP = - Male/sum(Total)) %>% 
  select(Age,FP,MP) %>% 
  reshape2::melt(id.vars="Age",
                 variable.name="Sex",
                 value.name = "prop") %>% 
  ggplot()+
  geom_col(aes(x=Age,y=prop,fill=Sex),
           width = 1)+
  scale_y_continuous(breaks = c(-.01, 0, .01), 
                     labels = c(.01, 0, .01),
                     limits = c(-.02, .02), 
                     expand = c(0,0))+
  theme_minimal()+
  theme(legend.position = "none",
        panel.grid = element_blank(),
        plot.title = element_text(hjust = 0.5, size = 20))+
  coord_flip()+
  labs(x=NULL,y=NULL,title = "2020")+
  annotate(geom="text",
           y=-0.01,x=100,
           label="Male")+
  annotate(geom="text",
           y=0.01,x=100,
           label="Female")+
  scale_fill_material_d()
image.png

It can be seen from the above picture that it is still a tower, but it has changed from an Egyptian pyramid to an ancient Buddhist pagoda in my country!

Next, we make only one picture for each year, and then make a gif

mapping code

years<-unique(df$Year)

figures <- list()

for (i in seq(1,length(years),10)){
  df %>% filter(Year == years[i]) %>% 
    mutate(FP = Female/sum(Total),
           MP = - Male/sum(Total)) %>% 
    select(Age,FP,MP) %>% 
    reshape2::melt(id.vars="Age",
                   variable.name="Sex",
                   value.name = "prop") %>% 
    ggplot()+
    geom_col(aes(x=Age,y=prop,fill=Sex),
             width = 1)+
    scale_y_continuous(breaks = c(-.01, 0, .01), 
                       labels = c(.01, 0, .01),
                       limits = c(-.02, .02), 
                       expand = c(0,0))+
    theme_minimal()+
    theme(legend.position = "none",
          panel.grid = element_blank(),
          plot.title = element_text(hjust = 0.5, size = 20))+
    coord_flip()+
    labs(x=NULL,y=NULL,title = years[i])+
    annotate(geom="text",
             y=-0.01,x=100,
             label="Male")+
    annotate(geom="text",
             y=0.01,x=100,
             label="Female")+
    scale_fill_material_d() -> gg
  figures[[i]] <- gg
}


for (i in seq(1,length(years),10)){
  ggsave(paste0('20210826/swe-'
                years[i], '.png'), 
         figures[[i]], 
         width = 8, 
         height = 5.6)
}

Only make gifs and use them in tools like https://gifmaker.me/

Webp.net-gifmaker.gif

That's all for today's content

Welcome everyone to pay attention to my public number

Xiao Ming's data analysis notebook

Message to discuss related content

The sample data and code of today's tweet can be 20210826obtained by leaving a message in the background of the official account

小明的数据分析笔记本 公众号 主要分享:1、R语言和python做数据分析和数据可视化的简单小例子;2、园艺植物相关转录组学、基因组学、群体遗传学文献阅读笔记;3、生物信息学入门学习资料及自己的学习笔记!


本文分享自微信公众号 - 小明的数据分析笔记本(gh_0c8895f349d3)。
如有侵权,请联系 [email protected] 删除。
本文参与“OSC源创计划”,欢迎正在阅读的你也加入,一起分享。

{{o.name}}
{{m.name}}

Guess you like

Origin my.oschina.net/u/4579431/blog/5211975