Drawing elegant Sankey diagrams in R with the ggalluvial package

Sankey diagrams are often used to show how quantities such as energy, materials, or capital flow from one place to another. The left side shows the outflow (source) nodes, the right side shows the inflow (target) nodes, and the width of each band in between represents the size of the flow. A Sankey diagram therefore makes the flow of data between two sets of nodes easy to see. This post introduces the ggalluvial package for drawing Sankey diagrams; it is simple to operate and easy to use.

First, load the R packages and import the data.

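library(foreign)     # provides read.spss() for SPSS .sav files
library(ggplot2)
library(ggalluvial)
library(networkD3)   # not used in this post; needed for the follow-up post
# Read the SPSS file as a data frame, keeping raw codes instead of value labels
bc <- read.spss("E:/r/test/tree_car.sav",
                use.value.labels = FALSE, to.data.frame = TRUE)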

Let's look at the data: car is the price of the car, age is age, gender is gender, inccat is income (divided into 4 levels), ed is education level, and marital is marital status. (The data can be obtained by replying "car sales" to the official account.)
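For readers who do not have the .sav file, the rest of the post can still be followed with a stand-in data frame. The sketch below simulates one with the same column names; the values are random and are not the real data, and the coding of each column (gender as "m"/"f", inccat as 1-4, ed as 1-5, marital as 0/1) is an assumption inferred from the recoding step in this post. The name bc_demo is hypothetical.

```r
# Simulated stand-in for tree_car.sav: random values, NOT the real data.
set.seed(123)  # make the random draws reproducible
n <- 200
bc_demo <- data.frame(
  car     = round(runif(n, min = 10, max = 80), 1),  # car price, arbitrary units
  age     = sample(18:65, n, replace = TRUE),
  gender  = sample(c("m", "f"), n, replace = TRUE),  # assumed coding: "m" / "f"
  inccat  = sample(1:4, n, replace = TRUE),          # assumed coding: 1-4
  ed      = sample(1:5, n, replace = TRUE),          # assumed coding: 1-5
  marital = sample(0:1, n, replace = TRUE),          # assumed coding: 0 / 1
  stringsAsFactors = FALSE
)
str(bc_demo)  # inspect column names and types
```

Assigning bc <- bc_demo would let the later steps run unchanged, although the resulting plots will of course differ from the ones in this post.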
Next we process the data: convert the categorical variables to factors and attach labels to their levels.

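# Education: codes 1-5 -> primary school, junior high, senior high, university, doctorate
bc$ed <- factor(bc$ed, levels = 1:5, labels = c("小学", "初中", "高中", "大学", "博士"))
# Income: codes 1-4 -> low, lower-middle, middle, affluent
bc$inccat <- factor(bc$inccat, levels = 1:4, labels = c("低收入", "中低收入", "中等收入", "富裕"))
# Gender: "m" -> 1 (男性, male), otherwise 0 (女性, female)
bc$gender <- ifelse(bc$gender == "m", 1, 0)
bc$gender <- factor(bc$gender, levels = c(0, 1), labels = c("女性", "男性"))
# Marital status: 0 -> 未婚 (unmarried), 1 -> 已婚 (married)
bc$marital <- factor(bc$marital, levels = c(0, 1), labels = c("未婚", "已婚"))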
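How factor() maps codes to labels is easy to check on a small vector. The sketch below uses a made-up vector of income codes and English labels chosen only for illustration; it also shows that a code not listed in levels becomes NA, which is worth checking for after recoding real data.

```r
# Toy income codes (made up for illustration)
codes <- c(1, 3, 2, 4, 1)
income_labels <- c("low", "lower-middle", "middle", "high")

# levels gives the codes in order; labels gives the text for each code
income <- factor(codes, levels = 1:4, labels = income_labels)
print(income)
print(table(income))  # counts per label

# A code that is not in `levels` silently becomes NA
with_unknown <- factor(c(1, 9), levels = 1:4, labels = income_labels)
print(sum(is.na(with_unknown)))
```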

The dataset is fairly large, so we take only the first 100 rows for plotting.

bc<-bc[1:100,]

Drawing a Sankey diagram with the ggalluvial package is fairly simple. We first set the outflow nodes with axis1 and the inflow nodes with axis2.
Suppose we want to see how car purchase spending flows from income groups to education levels, with the bands colored by gender (this is only a demonstration and has no practical significance):

ggplot(bc, aes(y = car, axis1 = inccat, axis2 = ed)) +
  geom_alluvium(aes(fill = gender))
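The axis1/axis2 aesthetics expect the data in wide form, with one column per axis and one row per flow. ggalluvial provides is_alluvia_form() to check this before plotting. The sketch below runs the check on a small made-up data frame (the toy values and English labels are assumptions for illustration only).

```r
library(ggalluvial)

# Toy data in wide form: one column per axis, plus a fill variable and a weight
toy <- data.frame(
  inccat = c("low", "low", "high", "high"),
  ed     = c("school", "college", "school", "college"),
  gender = c("female", "male", "male", "female"),
  car    = c(10, 20, 30, 40)
)

# axes = 1:2 selects the first two columns as the axis variables
ok <- is_alluvia_form(toy, axes = 1:2, silent = TRUE)
print(ok)
```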

That gives a simple Sankey diagram. Next we add a box (stratum) for each node, set its width, and label it. In geom_stratum(), fill sets the color of the boxes:

ggplot(bc, aes(y = car, axis1 = inccat, axis2 = ed)) +
  geom_alluvium(aes(fill = gender)) +
  geom_stratum(width = 1/6, fill = "black", color = "grey") +
  geom_label(stat = "stratum", aes(label = after_stat(stratum)))

You can also set the x-axis labels, the color palette, and a title:

ggplot(bc, aes(y = car, axis1 = inccat, axis2 = ed)) +
  geom_alluvium(aes(fill = gender)) +
  geom_stratum(width = 1/6, fill = "black", color = "grey") +
  geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
  # Name the two axes, in the order axis1, axis2
  scale_x_discrete(limits = c("Income level", "Education level"), expand = c(.05, .05)) +
  scale_fill_brewer(type = "qual", palette = "Set1") +
  ggtitle("Income and car purchases")
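The same layers can be tried without the SPSS file. The sketch below is a self-contained version on a four-row toy data frame (made-up values, English labels chosen for illustration); it stores the plot in p so that it can be inspected or saved. The limits vector passed to scale_x_discrete() supplies one name per axis, in axis order.

```r
library(ggplot2)
library(ggalluvial)

# Made-up data: two axes (inccat, ed), a fill variable, and a weight
toy <- data.frame(
  inccat = factor(c("low", "low", "high", "high"), levels = c("low", "high")),
  ed     = factor(c("school", "college", "school", "college"),
                  levels = c("school", "college")),
  gender = c("female", "male", "male", "female"),
  car    = c(10, 20, 30, 40)
)

p <- ggplot(toy, aes(y = car, axis1 = inccat, axis2 = ed)) +
  geom_alluvium(aes(fill = gender)) +                               # the flows
  geom_stratum(width = 1/6) +                                       # the node boxes
  geom_label(stat = "stratum", aes(label = after_stat(stratum))) +  # node labels
  scale_x_discrete(limits = c("Income", "Education"), expand = c(.05, .05)) +
  ggtitle("Toy example")
print(p)
```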

Changing the palette argument of scale_fill_brewer() switches to a different color scheme, here Set3:

ggplot(bc, aes(y = car, axis1 = inccat, axis2 = ed)) +
  geom_alluvium(aes(fill = gender)) +
  geom_stratum(width = 1/6, fill = "black", color = "grey") +
  geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Income level", "Education level"), expand = c(.05, .05)) +
  scale_fill_brewer(type = "qual", palette = "Set3") +
  ggtitle("Income and car purchases")
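scale_fill_brewer() draws its colors from the ColorBrewer palettes. To see which qualitative palettes are available besides Set1 and Set3, the RColorBrewer package can be queried directly; this package is not loaded in the original post, so its use here is an addition for illustration.

```r
library(RColorBrewer)

# All qualitative palettes, with the maximum number of colors each offers
qual <- brewer.pal.info[brewer.pal.info$category == "qual", ]
print(qual)

# The first three colors of the two palettes used in this post
set1 <- brewer.pal(3, "Set1")
set3 <- brewer.pal(3, "Set3")
print(set1)
print(set3)
```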

We can also add a third axis, which turns education into an intermediate node between income and marital status:

ggplot(bc, aes(y = car, axis1 = inccat, axis2 = ed, axis3 = marital)) +
  geom_alluvium(aes(fill = gender)) +
  geom_stratum(width = 1/6, fill = "black", color = "grey") +
  geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
  # Three axes need three names, in the order axis1, axis2, axis3
  scale_x_discrete(limits = c("Income level", "Education level", "Marital status"),
                   expand = c(.05, .05)) +
  scale_fill_brewer(type = "qual", palette = "Set3") +
  ggtitle("Income and car purchases")
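With three axes, each row of the data is drawn as one band that passes through three node boxes. One way to see this is to reshape the wide data into long form with ggalluvial's to_lodes_form(), which produces one row per band per axis. The sketch below uses a two-row made-up data frame; the default output column names (x, stratum, alluvium) are those of the function, while the toy values are assumptions.

```r
library(ggalluvial)

# Made-up wide data: three axis columns and a weight
toy <- data.frame(
  inccat  = c("low", "high"),
  ed      = c("school", "college"),
  marital = c("unmarried", "married"),
  car     = c(10, 30)
)

# One output row per input row per axis: 2 rows x 3 axes = 6 rows
lodes <- to_lodes_form(toy, axes = 1:3)
print(lodes)
```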

We can rotate the diagram with the coord_flip() function, so that the axes are laid out vertically instead of horizontally:

ggplot(bc, aes(y = car,
               axis1 = inccat, axis2 = ed, axis3 = marital)) +
  geom_alluvium(aes(fill = gender),
                width = 1/8, knot.pos = 0, reverse = FALSE) +
  # Names must match the gender factor labels: 男性 = male, 女性 = female
  scale_fill_manual(values = c("男性" = "blue", "女性" = "red")) +
  guides(fill = "none") +
  geom_stratum(alpha = .4, width = 1/8, reverse = FALSE) +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)),
            reverse = FALSE) +
  scale_x_continuous(breaks = 1:3, labels = c("inccat", "ed", "marital")) +
  coord_flip() +
  ggtitle("Income and car purchases")

The color scheme can be changed after flipping as well:

ggplot(bc, aes(y = car,
               axis1 = inccat, axis2 = ed, axis3 = marital)) +
  geom_alluvium(aes(fill = gender),
                width = 1/8, knot.pos = 0, reverse = FALSE) +
  guides(fill = "none") +
  geom_stratum(alpha = .4, width = 1/8, reverse = FALSE) +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)),
            reverse = FALSE) +
  # Only one fill scale can be active, so the Brewer palette is used
  # here in place of scale_fill_manual()
  scale_fill_brewer(type = "qual", palette = "Set3") +
  scale_x_continuous(breaks = 1:3, labels = c("inccat", "ed", "marital")) +
  coord_flip() +
  ggtitle("Income and car purchases")
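A note on fill scales: a ggplot keeps only one scale per aesthetic, so when scale_fill_manual() and scale_fill_brewer() are both added to the same plot, the one added last wins and ggplot2 prints a message saying the earlier scale was replaced. The sketch below shows this on a tiny made-up bar chart (plain ggplot2, no ggalluvial needed); the data and column names are hypothetical.

```r
library(ggplot2)

# Made-up data with a two-level grouping variable
df <- data.frame(g = c("a", "b"), v = c(1, 2))

p <- ggplot(df, aes(x = g, y = v, fill = g)) +
  geom_col() +
  scale_fill_manual(values = c(a = "blue", b = "red")) +
  scale_fill_brewer(type = "qual", palette = "Set3")  # replaces the manual scale

# The colors actually used come from the Brewer palette, not from blue/red
fills <- unique(layer_data(p, 1)$fill)
print(fills)
```

When manual colors are wanted instead, the scale_fill_brewer() line is the one to drop.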

The next post will introduce drawing Sankey diagrams with the networkD3 package.

Origin blog.csdn.net/dege857/article/details/131844051