R language ggplot2 draws a beautiful dumbbell graph

I found this picture by chance when I was looking for information. This picture comes from the paper 

Russian periphery is dying in movement: a cohort assessment of internal youth migration in Central Russia

The link to the paper is 

https://link.springer.com/article/10.1007%2Fs10708-018-9953-5,

Links to data and code storage

https://gist.github.com/ikashnitsky/2f3e2b2af6f50911bb775bbce6eb0fb8

https://ikashnitsky.github.io/2019/dotplot/


I feel that this picture is very beautiful, and the data code is still public, so let's repeat it

The abscissa of this graph is the rate of change, the ordinate is the region, and each ordinate corresponds to two 1980-84 1988-92 variables, each of which corresponds to a solid point and a hollow point, census and stat record

I still can't think of how to apply this graph to my own data. It can be used to represent a certain value, such as processing and comparison.

The previous code for sorting data will not be introduced here. If you are interested, you can run it yourself and study the function of each line of code.
library(tidyverse)
df<-read.csv("20210822_raw.csv")
head(df)
# relevel regions ascending 
df_plot <- df %>% 
  select(cohort, region, change_cens) %>% 
  spread(cohort, change_cens) %>% 
  arrange(`Cohort 1988-1992`) %>% 
  mutate(
    region = region %>% 
      as_factor %>% 
      fct_relevel("CFD TOTAL", after = 0)
  ) %>% 
  arrange(region) %>%
  gather("cohort""value", 2:3) %>% 
  left_join(df, by = c("region""cohort")) 

df_plot %>% 
  # calculate y positioning values
  mutate(region = region %>% as_factor,
         y = region %>% as.numeric,
         adjust = ifelse(cohort=="Cohort 1988-1992", .15, -.15),
         ypos = y - adjust) %>% 
  write.csv(file="20210822.csv",quote = F,row.names = F)

The final drawing data used

df_plot_1<-read.csv("20210822.csv")
head(df_plot_1)
image.png

drawing code

library(ggplot2)
library(tidyverse)
library(extrafont)

df_plot_1 %>% pull(region) %>% unique() -> labels
breaks<-1:length(labels)
breaks
pal <- c("#8C510A""#003C30")

df_plot_1 %>% 
  ggplot(aes(color = cohort, y = ypos))+
  geom_vline(xintercept = 0, size = 2, 
             alpha = .5, color = "grey50")+
  geom_segment(aes(x = change_cens, 
                   xend = change_rolling, 
                   yend = ypos))+
  geom_point(aes(x = change_cens), 
             shape = 16, size = 2)+
  geom_point(aes(x = change_rolling), 
             shape = 21, size = 2, 
             fill = "white")+
  scale_color_manual(values = pal)+
  scale_y_continuous(breaks = breaks, 
                     labels = labels, 
                     expand = c(.01, .01))+
  theme_minimal(base_family = "Times New Roman"
                base_size = 12)+
  theme(legend.position = "none"
        panel.grid.minor.y = element_blank(),
        panel.grid.major.y = element_line(size = 4, color = "grey95"),
        axis.text.y = element_text(vjust = .3, size = 12))+
  labs(x = "Change in cohort size, 2003-2010, %", y = NULL)
image.png

Compared with the original code, I have modified the font here, because hrbrthemesthis theme package is used in the original code, and I have never understood the font involved.

Next is the legend

The method he uses here is to use the annotate()function to manually add

font_rc <- "Times New Roman"

p1+
  annotate("rect", xmin = 29, xmax = 63,
           ymin = 2.5, ymax = 9.5,
           color = "grey50", fill = "white")+
  annotate("text", x = 45, y = 8.5, 
           label = "LEGEND"
           size = 5, hjust = .5, 
           family = font_rc, color = "grey20")+
  annotate("text", x = 45, y = 7, 
           label = "Change in cohort size by"
           size = 4.5, hjust = .5, 
           family = font_rc, color = "grey20")+
  annotate("point", x = c(32.5, 47.5), y = 6, 
           pch = c(16, 21), size = 2, color = 1)+
  annotate("text", x = c(35, 50), y = 6, 
           label = c("census""stat record"), 
           size = 4.5, hjust = 0, 
           family = font_rc, color = "grey20")+
  annotate("text", x = 45, y = 4.5, 
           label = "Cohorts born in"
           size = 4.5, hjust = .5, 
           family = font_rc, color = "grey20")+
  annotate("segment", x = c(32, 47), xend = c(34, 49), 
           y = 3.5, yend = 3.5, 
           pch = c(16, 21), size = 2, color = pal)+
  annotate("text", x = c(35, 50), y = 3.5, 
           label = c("1980-84""1988-92"), 
           size = 4.5, hjust = 0, 
           family = font_rc, color = "grey20")

Final result



image.png

The download link of sample data and code can be obtained in the comment area of ​​today's next tweet, the next tweet is an advertisement

Welcome everyone to pay attention to my public number

Xiao Ming's data analysis notebook

Xiaoming’s data analysis notebook public account mainly shares: 1. Simple examples of R language and python for data analysis and data visualization; 2. Reading notes on horticultural plants related transcriptomics, genomics, and population genetics literature; 3. Bioinformatics Learn introductory study materials and your own study notes!


本文分享自微信公众号 - 小明的数据分析笔记本(gh_0c8895f349d3)。
如有侵权,请联系 [email protected] 删除。
本文参与“OSC源创计划”,欢迎正在阅读的你也加入,一起分享。

{{o.name}}
{{m.name}}

Guess you like

Origin my.oschina.net/u/4579431/blog/5198260