A better ggplot learning blog post
author: Li Pidong
email: [email protected]
date: March 7, 2016
http://blog.csdn.net/tanzuozhev/article/details/50822204
This article is based on http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2) and adds my own notes.
ggplot2 expects its input data to be a data.frame.
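If the values start out in some other structure, such as a named numeric vector, they can be converted first. A minimal sketch; the names bills and bills_df are illustrative and not part of the original post:

```r
# A named numeric vector (not a data.frame)
bills <- c(Lunch = 14.89, Dinner = 17.23)

# Convert it to a data.frame with one column per variable
bills_df <- data.frame(
  time = factor(names(bills), levels = names(bills)),
  total_bill = unname(bills)
)

is.data.frame(bills)     # FALSE
is.data.frame(bills_df)  # TRUE
```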
Discrete data as x-axis
For bar graphs, there are two different options for setting the height:
- The data already contain the values to plot: x supplies the labels on the horizontal axis and y supplies the bar heights. In this case, use
geom_bar(stat="identity")
as the layer.
library(ggplot2)
dat <- data.frame(
time = factor(c("Lunch","Dinner"), levels=c("Lunch","Dinner")),
total_bill = c(14.89, 17.23)
)
dat
##     time total_bill
## 1  Lunch      14.89
## 2 Dinner      17.23
time is a factor and supplies the x-axis labels (in the later examples it also sets the fill color). total_bill holds the actual values, which set the bar heights on the y-axis.
ggplot(data=dat, aes(x=time, y=total_bill)) +
geom_bar(stat="identity")
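The levels argument used when building dat controls the order of the bars on the x-axis. A small sketch of what happens with and without it (the variable names are illustrative):

```r
# Without explicit levels, factor() sorts the labels alphabetically
default_levels <- levels(factor(c("Lunch", "Dinner")))
default_levels
# [1] "Dinner" "Lunch"

# With explicit levels, the given order is kept and used on the x-axis
explicit_levels <- levels(factor(c("Lunch", "Dinner"), levels = c("Lunch", "Dinner")))
explicit_levels
# [1] "Lunch"  "Dinner"
```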
# Use time as the fill color
ggplot(data=dat, aes(x=time, y=total_bill, fill=time)) +
geom_bar(stat="identity")
## Equivalent to
ggplot(data=dat, aes(x=time, y=total_bill)) +
geom_bar(aes(fill=time), stat="identity")
# Add a black outline
ggplot(data=dat, aes(x=time, y=total_bill, fill=time)) +
geom_bar(colour="black", stat="identity")
# Remove the legend
ggplot(data=dat, aes(x=time, y=total_bill, fill=time)) +
  geom_bar(colour="black", stat="identity") +
  guides(fill="none")  # fill=FALSE is deprecated in recent ggplot2 versions
# Add a title, use narrower bars, set the fill color, and change the axis labels
ggplot(data=dat, aes(x=time, y=total_bill, fill=time)) +
  geom_bar(colour="black", fill="#DD8888", width=.8, stat="identity") +
  guides(fill="none") +
  xlab("Time of day") + ylab("Total bill") +
  ggtitle("Average bill for 2 people")
- The data are raw observations that still need to be counted: the x-axis shows the distinct values of a variable and the y-axis shows how many times each value occurs. In this case, use
geom_bar(stat="count")
as the layer. (Before ggplot2 2.0.0 this statistic was called stat="bin".)
# Use the tips data set from the reshape2 package
library(reshape2)
# Show the first rows
head(tips)
##   total_bill  tip    sex smoker day   time size
## 1      16.99 1.01 Female     No Sun Dinner    2
## 2      10.34 1.66   Male     No Sun Dinner    3
## 3      21.01 3.50   Male     No Sun Dinner    3
## 4      23.68 3.31   Male     No Sun Dinner    2
## 5      24.59 3.61 Female     No Sun Dinner    4
## 6      25.29 4.71   Male     No Sun Dinner    4
Here only x is mapped (to day) and y is not, so use stat="count" instead of stat="identity". The x-axis shows the distinct values of day (Sun, Sat, Thur, Fri) and the y-axis shows how many rows have each value.
# Bar graph of counts
ggplot(data=tips, aes(x=day, fill=day)) +
  geom_bar(stat="count")
## Equivalent, apart from the fill color
ggplot(data=tips, aes(x=day)) +
  geom_bar()  # stat defaults to "count"
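The two ways of setting bar height are closely related: counting the rows by hand and plotting the result with stat="identity" should give the same bars as letting geom_bar() count. A minimal sketch on a tiny made-up data frame (visits and day_counts are illustrative names, not from the original post):

```r
library(ggplot2)

# Made-up raw observations: one row per visit
visits <- data.frame(day = c("Thur", "Fri", "Fri", "Sat", "Sat", "Sat"))

# Option 1: let ggplot2 count the rows per day (the default stat of geom_bar)
p_count <- ggplot(visits, aes(x = day)) +
  geom_bar()

# Option 2: count by hand, then plot the counts as given
day_counts <- as.data.frame(table(day = visits$day))  # columns: day, Freq
p_identity <- ggplot(day_counts, aes(x = day, y = Freq)) +
  geom_bar(stat = "identity")

# Bar heights computed for each plot
layer_data(p_count)$count
layer_data(p_identity)$y
```

layer_data() returns the data ggplot2 computed for a layer, which makes it a convenient way to compare the two plots without looking at the rendered image.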
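Line chart
time: x-axis
total_bill: y-axis
Because time is a factor, group=1 is needed so that ggplot2 treats all points as one series and connects them with a single line.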
# Basic line graph
ggplot(data=dat, aes(x=time, y=total_bill, group=1)) +
geom_line()
## This would have the same result as above
# ggplot(data=dat, aes(x=time, y=total_bill)) +
# geom_line(aes(group=1))
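The effect of group=1 can be inspected in the data ggplot2 builds for the line layer. This is a sketch that repeats the definition of dat so it runs on its own; p_ungrouped and p_grouped are illustrative names:

```r
library(ggplot2)

dat <- data.frame(
  time = factor(c("Lunch", "Dinner"), levels = c("Lunch", "Dinner")),
  total_bill = c(14.89, 17.23)
)

# Without group=1, each level of the factor on the x-axis becomes its own group,
# so there is no pair of points for a line to connect
p_ungrouped <- ggplot(dat, aes(x = time, y = total_bill)) +
  geom_line()

# With group=1, all rows share one group, so one line connects them
p_grouped <- ggplot(dat, aes(x = time, y = total_bill, group = 1)) +
  geom_line()

groups_ungrouped <- unique(layer_data(p_ungrouped)$group)
groups_grouped <- unique(layer_data(p_grouped)$group)
length(groups_ungrouped)  # number of groups without group=1
length(groups_grouped)    # number of groups with group=1
```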
# Add points to the line graph
ggplot(data=dat, aes(x=time, y=total_bill, group=1)) +
geom_line() +
geom_point()
# Change the color
# Change line type and point type, and use thicker line and larger points
# Change points to circles with white fill
ggplot(data=dat, aes(x=time, y=total_bill, group=1)) +
geom_line(colour="red", linetype="dashed", size=1.5) +
geom_point(colour="red", size=4, shape=21, fill="white")
# Change the y-range to go from 0 to the maximum value in the total_bill column,
# and change axis labels
ggplot(data=dat, aes(x=time, y=total_bill, group=1)) +
  geom_line() +
  geom_point() +
  expand_limits(y=0) + # Make the y-axis include 0; expand_limits(y=c(1, 9)) would make it cover at least 1 to 9
  xlab("Time of day") + ylab("Total bill") +
  ggtitle("Average bill for 2 people")
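To see what expand_limits() changes, the trained range of the y scale can be compared with and without it. A minimal sketch that repeats dat so it runs on its own; base, range_default, and range_expanded are illustrative names:

```r
library(ggplot2)

dat <- data.frame(
  time = factor(c("Lunch", "Dinner"), levels = c("Lunch", "Dinner")),
  total_bill = c(14.89, 17.23)
)

base <- ggplot(dat, aes(x = time, y = total_bill, group = 1)) +
  geom_line() +
  geom_point()

# Range of the y scale when it is trained on the data only
range_default <- layer_scales(base)$y$range$range

# Range of the y scale after asking for 0 to be included
range_expanded <- layer_scales(base + expand_limits(y = 0))$y$range$range

range_default
range_expanded
```

expand_limits() only widens the range; it never cuts off data, which is why it is a safer choice here than setting hard limits.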
More data variables
A new data set that adds one more variable, sex:
dat1 <- data.frame(
sex = factor(c("Female","Female","Male","Male")),
time = factor(c("Lunch","Dinner","Lunch","Dinner"), levels=c("Lunch","Dinner")),
total_bill = c(13.53, 16.81, 16.24, 17.42)
)
dat1
##      sex   time total_bill
## 1 Female  Lunch      13.53
## 2 Female Dinner      16.81
## 3   Male  Lunch      16.24
## 4   Male Dinner      17.42
Bar graph
Variable mapping
time: x-axis
sex: color fill
total_bill: y-axis.
# Several position adjustments are available for the bars
# The default is stacking (stacked bar graph)
ggplot(data=dat1, aes(x=time, y=total_bill, fill=sex)) +
geom_bar(stat="identity")
# Position adjustment: position_dodge() places the bars side by side
ggplot(data=dat1, aes(x=time, y=total_bill, fill=sex)) +
geom_bar(stat="identity", position=position_dodge())
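The difference between stacking and dodging also shows up in the computed bar tops. A small sketch, repeating dat1 so it runs on its own (p_stack, p_dodge, top_stack, and top_dodge are illustrative names):

```r
library(ggplot2)

dat1 <- data.frame(
  sex = factor(c("Female", "Female", "Male", "Male")),
  time = factor(c("Lunch", "Dinner", "Lunch", "Dinner"), levels = c("Lunch", "Dinner")),
  total_bill = c(13.53, 16.81, 16.24, 17.42)
)

# Stacked (default): bars for the same time sit on top of each other
p_stack <- ggplot(dat1, aes(x = time, y = total_bill, fill = sex)) +
  geom_bar(stat = "identity")

# Dodged: bars for the same time sit next to each other
p_dodge <- ggplot(dat1, aes(x = time, y = total_bill, fill = sex)) +
  geom_bar(stat = "identity", position = position_dodge())

top_stack <- max(layer_data(p_stack)$ymax)  # tallest stack: 16.81 + 17.42
top_dodge <- max(layer_data(p_dodge)$ymax)  # tallest single bar: 17.42
```

Stacking is useful when the total matters; dodging makes it easier to compare the individual values.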
# Change colors
ggplot(data=dat1, aes(x=time, y=total_bill, fill=sex)) +
geom_bar(stat="identity", position=position_dodge(), colour="black") +
scale_fill_manual(values=c("#999999", "#E69F00"))  # Change the fill colors; supply one color per level of the fill variable (sex)
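When the colors are given as an unnamed vector, they are matched to the levels of sex by position. One way to make the pairing explicit is a named vector. This is a sketch, repeating dat1 so it runs on its own; sex_colors and p_named are illustrative names:

```r
library(ggplot2)

dat1 <- data.frame(
  sex = factor(c("Female", "Female", "Male", "Male")),
  time = factor(c("Lunch", "Dinner", "Lunch", "Dinner"), levels = c("Lunch", "Dinner")),
  total_bill = c(13.53, 16.81, 16.24, 17.42)
)

# Name each color after the level it belongs to, so the pairing
# does not depend on the order of the factor levels
sex_colors <- c(Female = "#999999", Male = "#E69F00")

p_named <- ggplot(dat1, aes(x = time, y = total_bill, fill = sex)) +
  geom_bar(stat = "identity", position = position_dodge(), colour = "black") +
  scale_fill_manual(values = sex_colors)
```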
Change the mapping: put sex on the x-axis and fill by time.
# Bar graph, sex on x-axis, color fill grouped by time -- use position_dodge()
ggplot(data=dat1, aes(x=sex, y=total_bill, fill=time)) +
geom_bar(stat="identity", position=position_dodge(), colour="black")
Line chart
Variable mapping
time: x-axis
sex: line color
total_bill: y-axis.
To draw multiple lines, the data must be grouped. Here we group by sex, which gives two lines: one for Female and one for Male.
# Basic graph
ggplot(data=dat1, aes(x=time, y=total_bill, group=sex)) +
geom_line() +
geom_point()
# Add color
ggplot(data=dat1, aes(x=time, y=total_bill, group=sex, colour=sex)) +
geom_line() +
geom_point()
# Map sex to different point shapes
ggplot(data=dat1, aes(x=time, y=total_bill, group=sex, shape=sex)) +
geom_line() +
geom_point()
# Use thicker lines and larger points, and hollow white-filled points
ggplot(data=dat1, aes(x=time, y=total_bill, group=sex, shape=sex)) +
geom_line(size=1.5) +
geom_point(size=3, fill="white") +
scale_shape_manual(values=c(22,21))  # Set the point shapes manually (22 = fillable square, 21 = fillable circle)
Change the mapping: group by time, which gives one line for Lunch and one for Dinner.
ggplot(data=dat1, aes(x=sex, y=total_bill, group=time, shape=time, color=time)) +
geom_line() +
geom_point()
Finished examples
Bar graph
ggplot(data=dat1, aes(x=time, y=total_bill, fill=sex)) +
geom_bar(colour="black", stat="identity",
position=position_dodge(),
size=.3) + # Thinner lines
scale_fill_hue(name="Sex of payer") + # Set legend title
xlab("Time of day") + ylab("Total bill") + # Set axis labels
ggtitle("Average bill for 2 people") + # Set title
theme_bw()
Line chart
ggplot(data=dat1, aes(x=time, y=total_bill, group=sex, shape=sex, colour=sex)) +
geom_line(aes(linetype=sex), size=1) + # Set linetype by sex
geom_point(size=3, fill="white") + # Use larger points, fill with white
expand_limits(y=0) + # Expand the axis limits; here the y-axis is made to include 0
scale_colour_hue(name="Sex of payer", # Set legend title
l=30) + # Use darker colors (lightness=30)
scale_shape_manual(name="Sex of payer",
values=c(22,21)) + # Use points with a fill color
scale_linetype_discrete(name="Sex of payer") +
xlab("Time of day") + ylab("Total bill") + # Set axis labels
ggtitle("Average bill for 2 people") + # Set title
theme_bw() + # Use the black-and-white theme
theme(legend.position=c(.7, .4)) # Place the legend inside the plot area (relative coordinates)
This line chart maps sex to three aesthetics: color (scale_colour_hue), shape (scale_shape_manual), and line type (scale_linetype_discrete). That would normally give three legends, but because all three scales share the same name they are merged into one. If the three names differ, three separate legends appear:
ggplot(data=dat1, aes(x=time, y=total_bill, group=sex, shape=sex, colour=sex)) +
geom_line(aes(linetype=sex), size=1) + # Set linetype by sex
geom_point(size=3, fill="white") + # Use larger points, fill with white
expand_limits(y=0) + # Expand the axis limits; here the y-axis is made to include 0
scale_colour_hue(name="Sex of payer1", # Set legend title
l=30) + # Use darker colors (lightness=30)
scale_shape_manual(name="Sex of payer2",
values=c(22,21)) + # Use points with a fill color
scale_linetype_discrete(name="Sex of payer3") +
xlab("Time of day") + ylab("Total bill") + # Set axis labels
ggtitle("Average bill for 2 people") + # Set title
theme_bw() + # Use the black-and-white theme
theme(legend.position=c(.7, .4)) # Place the legend inside the plot area (relative coordinates)
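When only the legend title needs to change, one possible shortcut is to set the same title for all three aesthetics with labs(), which should keep the legends merged. This is a sketch under that assumption, not the approach used in the original examples; legend_title and p_merged are illustrative names:

```r
library(ggplot2)

dat1 <- data.frame(
  sex = factor(c("Female", "Female", "Male", "Male")),
  time = factor(c("Lunch", "Dinner", "Lunch", "Dinner"), levels = c("Lunch", "Dinner")),
  total_bill = c(13.53, 16.81, 16.24, 17.42)
)

legend_title <- "Sex of payer"

# sex is mapped to color, shape, and line type; giving all three
# the same title lets ggplot2 draw a single combined legend
p_merged <- ggplot(dat1, aes(x = time, y = total_bill, group = sex,
                             colour = sex, shape = sex, linetype = sex)) +
  geom_line() +
  geom_point(size = 3) +
  labs(colour = legend_title, shape = legend_title, linetype = legend_title)
```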
Continuous data as x-axis
New data
datn <- read.table(header=TRUE, text='
supp dose length
OJ 0.5 13.23
OJ 1.0 22.70
OJ 2.0 26.06
VC 0.5 7.98
VC 1.0 16.77
VC 2.0 26.14
')
dose is on the x-axis. Because dose is numeric, ggplot2 treats it as a continuous variable.
ggplot(data=datn, aes(x=dose, y=length, group=supp, colour=supp)) +
geom_line() +
geom_point()
When dose is treated as continuous, the x-axis is a numeric scale: even though dose takes only the three values 0.5, 1.0, and 2.0, the axis also covers intermediate positions such as 1.5, and the points are spaced in proportion to their values.
Discrete data as x-axis
Here we convert dose to a factor, which makes it discrete: 0.5, 1.0, and 2.0 are now just category names.
# Copy the data frame and convert dose to a factor
datn2 <- datn
datn2$dose <- factor(datn2$dose)
ggplot(data=datn2, aes(x=dose, y=length, group=supp, colour=supp)) +
geom_line() +
geom_point()
# Converting inside the ggplot() call also works
ggplot(data=datn, aes(x=factor(dose), y=length, group=supp, colour=supp)) +
geom_line() +
geom_point()
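Which kind of x scale ggplot2 picks can be checked directly. A minimal sketch, repeating datn so it runs on its own; p_cont, p_disc, and dose_levels are illustrative names:

```r
library(ggplot2)

datn <- read.table(header = TRUE, text = '
supp dose length
OJ 0.5 13.23
OJ 1.0 22.70
OJ 2.0 26.06
VC 0.5 7.98
VC 1.0 16.77
VC 2.0 26.14
')

# Numeric dose: ggplot2 chooses a continuous x scale
p_cont <- ggplot(datn, aes(x = dose, y = length, group = supp, colour = supp)) +
  geom_line()

# Factor dose: ggplot2 chooses a discrete x scale
p_disc <- ggplot(datn, aes(x = factor(dose), y = length, group = supp, colour = supp)) +
  geom_line()

class(layer_scales(p_cont)$x)[1]
class(layer_scales(p_disc)$x)[1]

# The factor levels are plain labels; note that 1.0 and 2.0 are printed as "1" and "2"
dose_levels <- levels(factor(datn$dose))
dose_levels
```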
Bar graphs work the same way with the discrete version of dose: converting to a factor beforehand (datn2) or inside the ggplot() call gives the same graph.
# Use datn2 from above
ggplot(data=datn2, aes(x=dose, y=length, fill=supp)) +
geom_bar(stat="identity", position=position_dodge())
# Convert with factor() directly in the call
ggplot(data=datn, aes(x=factor(dose), y=length, fill=supp)) +
geom_bar(stat="identity", position=position_dodge())