Sankey diagrams are often used to represent the flow of quantities such as energy, materials, or capital from one place to another. The left side represents the outflow (source) nodes, the right side represents the inflow (target) nodes, and the width of each band in between represents the size of the flow. A Sankey diagram therefore makes the flow between two sets of nodes easy to see. Today I will introduce the ggalluvial package for drawing Sankey diagrams. It is simple to operate and easy to use.
First, load the R packages and import the data.
library(foreign)
library(ggplot2)
library(ggalluvial)
library(networkD3)
bc <- read.spss("E:/r/test/tree_car.sav",
use.value.labels = FALSE, to.data.frame = TRUE)
Let's look at the data: car is the price of the car, age is the buyer's age, gender is the gender, inccat is the income level (divided into 4 levels here), and ed is the education level. (The data can be obtained by replying "car sales" to the official account.)
Next we process the data, converting the categorical variables into factors and adding labels.
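# ed: education level (primary school, junior high, high school, university, doctorate)
bc$ed<-factor(bc$ed,levels=c(1:5),labels=c("小学","初中","高中","大学","博士"))
# inccat: income level (low, lower-middle, middle, affluent)
bc$inccat<-factor(bc$inccat,levels=c(1:4),labels=c("低收入","中低收入","中等收入","富裕"))
# gender: recode "m" to 1 and everything else to 0, then label (female, male)
bc$gender<-ifelse(bc$gender=="m",1,0)
bc$gender<-factor(bc$gender,levels = c(0,1),labels=c("女性","男性"))
# marital: marital status (unmarried, married)
bc$marital<-factor(bc$marital,levels = c(0,1),labels=c("未婚","已婚"))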
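To see what the factor() calls above do, here is a minimal sketch on small made-up vectors. The values and the English labels are hypothetical and are not taken from tree_car.sav. The levels argument lists the raw codes, and labels supplies the display names in the same order.

```r
# Hypothetical raw codes, standing in for a column such as bc$inccat
codes <- c(1, 3, 2, 4, 1)

# levels gives the raw values; labels gives the display names,
# matched to the levels by position
income <- factor(codes, levels = 1:4,
                 labels = c("low", "lower-middle", "middle", "high"))

# Character codes can be recoded to 0/1 first, as is done for gender
sex_raw <- c("m", "f", "f", "m")
sex <- factor(ifelse(sex_raw == "m", 1, 0), levels = c(0, 1),
              labels = c("female", "male"))

print(income)
print(table(sex))
```

Any value that is not listed in levels becomes NA, so it is worth checking the result with table(..., useNA = "ifany") after converting.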
The dataset is fairly large, so we use only the first 100 rows for plotting.
bc<-bc[1:100,]
Drawing a Sankey diagram with the ggalluvial package is fairly simple. We first set the outflow node as axis1 and the inflow node as axis2, and map the quantity that flows to y.
Suppose we want to see how car purchase spending flows from income groups to education levels, with the bands colored by gender (this is only a demonstration and has no practical significance):
ggplot(bc,aes(y = car, axis1 = inccat, axis2 =ed)) +
geom_alluvium(aes(fill = gender))
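If you do not have the tree_car.sav file, the same kind of plot can be tried on simulated data. The sketch below is only an illustration: the data frame toy, its values, and its English labels are made up and merely mimic the column names used in this post.

```r
library(ggplot2)
library(ggalluvial)

# Simulated stand-in for bc (hypothetical values, not the real data)
set.seed(1)
n <- 30
toy <- data.frame(
  car    = round(runif(n, min = 10, max = 60), 1),
  inccat = factor(sample(c("low", "middle", "high"), n, replace = TRUE),
                  levels = c("low", "middle", "high")),
  ed     = factor(sample(c("school", "college"), n, replace = TRUE)),
  gender = factor(sample(c("female", "male"), n, replace = TRUE))
)

# axis1 is the outflow side, axis2 the inflow side, y the size of the flow
p <- ggplot(toy, aes(y = car, axis1 = inccat, axis2 = ed)) +
  geom_alluvium(aes(fill = gender)) +
  geom_stratum(width = 1/6) +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)))

print(p)
```

Each row of the data frame contributes one band whose width is its car value, and the fill color shows the gender of that row.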
That gives a simple Sankey diagram. Next we add a box (stratum) for each node and label it. In geom_stratum, width sets the width of the boxes, fill sets their fill color, and color sets their border color.
ggplot(bc,aes(y = car, axis1 = inccat, axis2 =ed)) +
geom_alluvium(aes(fill = gender)) +
geom_stratum(width = 1/6, fill = "black", color = "grey") +
geom_label(stat = "stratum", aes(label = after_stat(stratum)))
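The height of each box drawn by geom_stratum corresponds to the total of y over the rows in that category. As a rough check, the totals can be computed by hand with base R. The data frame below is a made-up example with hypothetical values, not the car sales data.

```r
# Hypothetical toy data with the same column names as in this post
toy <- data.frame(
  car    = c(20, 30, 15, 40, 25, 50),
  inccat = c("low", "low", "mid", "mid", "high", "high"),
  ed     = c("school", "college", "school", "college", "college", "college")
)

# Total of y within each category of the first axis
income_totals <- aggregate(car ~ inccat, data = toy, FUN = sum)

# Total of y within each category of the second axis
ed_totals <- aggregate(car ~ ed, data = toy, FUN = sum)

print(income_totals)
print(ed_totals)

# Both axes stack up to the same overall total
print(sum(income_totals$car) == sum(ed_totals$car))
```

Because every row passes through exactly one box on each axis, the boxes on every axis add up to the same overall height.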
You can also label the x axis, choose a color palette, and add a title:
ggplot(bc,aes(y = car, axis1 = inccat, axis2 =ed)) +
geom_alluvium(aes(fill = gender)) +
geom_stratum(width = 1/6, fill = "black", color = "grey") +
geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
scale_x_discrete(limits = c("收入层次", "教育程度"), expand = c(.05, .05)) +
scale_fill_brewer(type = "qual", palette = "Set1") +
ggtitle("收入和购买汽车关系")
Changing the palette argument of scale_fill_brewer changes the color scheme, for example from Set1 to Set3:
ggplot(bc,aes(y = car, axis1 = inccat, axis2 =ed)) +
geom_alluvium(aes(fill = gender)) +
geom_stratum(width = 1/6, fill = "black", color = "grey") +
geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
scale_x_discrete(limits = c("收入层次", "教育程度"), expand = c(.05, .05)) +
scale_fill_brewer(type = "qual", palette = "Set3") +
ggtitle("收入和购买汽车关系")
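To compare palettes before plotting, the colors behind scale_fill_brewer can be inspected with the RColorBrewer package, which is installed as a dependency of ggplot2. This is a small sketch; the variable names set1 and set3 are only illustrative.

```r
library(RColorBrewer)

# brewer.pal() returns hex color codes; 3 is the smallest size it accepts
set1 <- brewer.pal(3, "Set1")
set3 <- brewer.pal(3, "Set3")

# With two fill levels (as for gender), the first two colors are the relevant ones
print(set1[1:2])
print(set3[1:2])
```

Calling display.brewer.all(type = "qual") draws all qualitative palettes side by side, which makes it easier to pick one.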
We can also add a third axis, so that education level becomes an intermediate node between income level and marital status:
ggplot(bc,aes(y = car, axis1 = inccat, axis2 =ed,axis3 = marital)) +
geom_alluvium(aes(fill = gender)) +
geom_stratum(width = 1/6, fill = "black", color = "grey") +
geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
scale_x_discrete(limits = c("收入层次", "教育程度", "婚姻状况"), expand = c(.05, .05)) +
scale_fill_brewer(type = "qual", palette = "Set3") +
ggtitle("收入和购买汽车关系")
We can swap the x and y axes with the coord_flip() function, so that the axes are stacked vertically and the flows run between them from bottom to top:
ggplot(bc,aes(y = car,
axis1 = inccat, axis2 = ed, axis3 = marital)) +
geom_alluvium(aes(fill = gender),
width = 1/8, knot.pos = 0, reverse = FALSE) +
scale_fill_manual(values = c("男性"= "blue", "女性" = "red")) +
guides(fill = "none") +
geom_stratum(alpha = .4, width = 1/8, reverse = FALSE) +
geom_text(stat = "stratum", aes(label = after_stat(stratum)),
reverse = FALSE) +
scale_x_continuous(breaks = 1:3, labels = c("inccat", "ed", "marital")) +
coord_flip() +
ggtitle("收入和购买汽车关系")
You can also change the color scheme after flipping. Here scale_fill_brewer sets the fill colors:
ggplot(bc,aes(y = car,
axis1 = inccat, axis2 = ed, axis3 = marital)) +
geom_alluvium(aes(fill = gender),
width = 1/8, knot.pos = 0, reverse = FALSE) +
guides(fill = "none") +
geom_stratum(alpha = .4, width = 1/8, reverse = FALSE) +
geom_text(stat = "stratum", aes(label = after_stat(stratum)),
reverse = FALSE) +
scale_fill_brewer(type = "qual", palette = "Set3")+
scale_x_continuous(breaks = 1:3, labels = c("inccat", "ed", "marital")) +
coord_flip() +
ggtitle("收入和购买汽车关系")
The next post will introduce drawing Sankey diagrams with the networkD3 package.