The cut function: dividing a continuous variable into categories
To turn a continuous variable into a discrete factor, the variable has to be cut into intervals, with each interval becoming one factor level. The cut function does this cutting.
The cut() function divides a numeric variable into intervals and returns a factor recording which interval each value falls into. Its usage is:
cut(x,breaks,labels=NULL,include.lowest=FALSE,right=TRUE,dig.lab=3,ordered_result=FALSE,...)
Parameter | Description |
---|---|
x | A numeric vector to be cut |
breaks | Cut points, in one of two forms: a single integer (the number of intervals to create) or a numeric vector (the cut points themselves) |
labels | Labels for the resulting factor levels. With the default NULL, labels are built from the intervals, e.g. "(a,b]"; with labels = FALSE, integer codes are returned instead of a factor |
right | Logical value. TRUE (the default) gives intervals that are open on the left and closed on the right; FALSE gives intervals that are closed on the left and open on the right |
ordered_result | Logical value. If TRUE, an ordered factor is generated |
include.lowest | Logical value indicating whether an x[i] equal to the lowest (or highest, for right = FALSE) breaks value should be included |
dig.lab | Number of digits used when formatting the break values in the labels |
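The effect of right and labels is easiest to see on a tiny vector. The following is a minimal sketch; the vector and the variable names (f_right, f_left, codes) are made up for illustration.

```r
x <- c(1, 2, 3)

# Default right = TRUE: intervals are (a,b], so 2 falls in (1,2]
# and 1 lies outside the lowest interval (NA).
f_right <- cut(x, breaks = c(1, 2, 3))
as.character(f_right)

# right = FALSE: intervals are [a,b), so 2 falls in [2,3)
# and 3 lies outside the highest interval (NA).
f_left <- cut(x, breaks = c(1, 2, 3), right = FALSE)
as.character(f_left)

# labels = FALSE returns integer interval codes instead of a factor.
codes <- cut(x, breaks = c(1, 2, 3), labels = FALSE)
codes
```

In both factor cases one boundary value ends up as NA, which is the situation include.lowest is meant to handle.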
Example 1
x<-rep(0:3,c(1,2,3,4))
x
# [1] 0 1 1 2 2 2 3 3 3 3
length(x)
#[1] 10
cut(x,breaks=0:3)
# Result
[1] <NA> (0,1] (0,1] (1,2] (1,2] (1,2] (2,3]
[8] (2,3] (2,3] (2,3]
Levels: (0,1] (1,2] (2,3]
# Explanation:
breaks = 0:3, i.e. 0, 1, 2, 3. Because right defaults to TRUE (left-open, right-closed intervals), the
range is divided into (0,1], (1,2], (2,3].
The values of x are 0, 1, 1, 2, 2, 2, 3, 3, 3, 3:
- 0 does not belong to any interval, so <NA> is returned
- 1 belongs to the interval (0,1], so (0,1] is returned
- 1 belongs to the interval (0,1], so (0,1] is returned
- 2 belongs to the interval (1,2], so (1,2] is returned
- ...
- 3 belongs to the interval (2,3], so (2,3] is returned
#x
#[1] 0 1 1 2 2 2 3 3 3 3
cut(x,c(-Inf,0,1,2,3,Inf))
# Result
[1] (-Inf,0] (0,1] (0,1] (1,2] (1,2]
[6] (1,2] (2,3] (2,3] (2,3] (2,3]
5 Levels: (-Inf,0] (0,1] (1,2] ... (3, Inf]
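# 5 intervals, left-open and right-closed by default
# 0 belongs to the interval (-Inf,0], so (-Inf,0] is returned
# 1 belongs to the interval (0,1], so (0,1] is returned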
...
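Extending the break vector with -Inf is one way to avoid the <NA> for the value 0. Another option, sketched below under the same x, is include.lowest = TRUE, which closes the lowest interval on the left so that a value equal to the smallest break is kept. The variable name res is only illustrative.

```r
x <- rep(0:3, c(1, 2, 3, 4))

# include.lowest = TRUE turns the first interval (0,1] into [0,1],
# so the value 0 is assigned to it instead of becoming NA.
res <- cut(x, breaks = 0:3, include.lowest = TRUE)

levels(res)   # the first level is now written with a square bracket
anyNA(res)    # no value is left unclassified
table(res)    # number of values per interval
```

Unlike the -Inf/Inf approach, this keeps three levels and does not add open-ended intervals, but values below 0 or above 3 would still become NA.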
Example 2
y<-c(1,2,3,4,5,2,3,4,5,6,7)
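cut(y,3,dig.lab=4,ordered_result=TRUE)
# breaks is a single integer: the number of intervals to create
# dig.lab = 4: use up to 4 digits when formatting the break values in the labels
# ordered_result = TRUE: generate an ordered factor
# Result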
[1] (0.994,3] (0.994,3] (0.994,3] (3,5]
[5] (3,5] (0.994,3] (0.994,3] (3,5]
[9] (3,5] (5,7.006] (5,7.006]
Levels: (0.994,3] < (3,5] < (5,7.006]
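The end points 0.994 and 7.006 come from how cut() handles a single-integer breaks: per the R documentation, the range of the data is split into equal pieces and the outer limits are then moved outward by 0.1% of the range, so that the extreme values fall inside the intervals. The sketch below reproduces that calculation by hand; rng, width and manual_breaks are illustrative names, not part of cut().

```r
y <- c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 7)

rng   <- range(y)   # smallest and largest value: 1 and 7
width <- diff(rng)  # width of the data range: 6

# Three equal-width intervals need four break points.
manual_breaks <- seq(rng[1], rng[2], length.out = 4)

# Push the outer limits outward by 0.1% of the range.
manual_breaks[c(1, 4)] <- c(rng[1] - width / 1000, rng[2] + width / 1000)

auto   <- cut(y, 3, dig.lab = 4)
manual <- cut(y, manual_breaks, dig.lab = 4)

# Both calls should assign every value to the same interval.
identical(as.integer(auto), as.integer(manual))
```

Passing an explicit break vector like manual_breaks is also the way to get round cut points when the automatically chosen ones are awkward.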
Example 3
Now we want to divide ages into four groups: child, youth, middle-aged and elderly. The intervals are:
age <= 12 is a child; 12 < age <= 30 is youth; 30 < age <= 60 is middle-aged;
age > 60 is elderly. Using the cut function, this is done as follows:
age<-c(55,12,30,9,22,24,78,109,45,66,49)
height<-c(156,175,154,165,184,125,148,168,155,157,168)
dd<-data.frame(age,height)
dd
# Result
age height
1 55 156
2 12 175
3 30 154
4 9 165
5 22 184
6 24 125
7 78 148
8 109 168
9 45 155
10 66 157
11 49 168
a<-cut(dd$age,breaks=c(-Inf,12,30,60,Inf),
labels = c("child","youth","middle-aged","elderly"))
dd<-cbind(dd,a)
dd
# Result
age height a
1 55 156 middle-aged
2 12 175 child
3 30 154 youth
4 9 165 child
5 22 184 youth
6 24 125 youth
7 78 148 elderly
8 109 168 elderly
9 45 155 middle-aged
10 66 157 elderly
11 49 168 middle-aged
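Once the ages are grouped, the resulting factor can be summarised or filtered directly. The sketch below reuses the age vector from this example with ordered_result = TRUE so that the groups can be compared. The English label names and the variables group, counts and older are chosen here for illustration.

```r
age <- c(55, 12, 30, 9, 22, 24, 78, 109, 45, 66, 49)

# Same break points as above; ordered_result = TRUE makes the groups
# an ordered factor: child < youth < middle-aged < elderly.
group <- cut(age, breaks = c(-Inf, 12, 30, 60, Inf),
             labels = c("child", "youth", "middle-aged", "elderly"),
             ordered_result = TRUE)

# Number of people in each age group.
counts <- table(group)
counts

# Ordered factors support comparisons, e.g. everyone who is
# middle-aged or older.
older <- age[group >= "middle-aged"]
older
```

An unordered factor, as in the example above, is enough for counting with table(); the ordering only matters when groups need to be compared or sorted.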