The cut function in R

cut: dividing a continuous variable into categories

To turn a continuous variable into a discrete factor, the variable has to be cut into intervals, and each interval becomes one level of the factor. The cut function does this cutting.

cut() divides a numeric variable into intervals and returns a factor that records which interval each value falls into. Its usage is:

cut(x,breaks,labels=NULL,include.lowest=FALSE,right=TRUE,dig.lab=3,ordered_result=FALSE,...)
Parameter        Description
x                Numeric vector to be cut.
breaks           Cut points: either a numeric vector of two or more distinct cut points, or a single integer (2 or greater) giving the number of intervals to divide x into.
labels           Labels for the levels of the resulting factor. With the default NULL, interval labels such as "(a,b]" are generated; with labels = FALSE, integer codes are returned instead of a factor.
right            Logical; the default TRUE makes intervals open on the left and closed on the right, (a,b]; FALSE makes them closed on the left and open on the right, [a,b).
ordered_result   Logical; TRUE generates an ordered factor. (The examples abbreviate it to ordered, which R resolves by partial argument matching.)
include.lowest   Logical, indicating if an 'x[i]' equal to the lowest (or highest, for right = FALSE) 'breaks' value should be included.
dig.lab = n      n is the number of significant digits used to format the break values in the generated labels (default 3).
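
To make the effect of right and labels concrete, here is a minimal sketch on a small vector. The vector x and the variable name codes are chosen only for illustration.

```r
x <- c(0, 1, 2, 3)

# Default right = TRUE: intervals are open on the left, closed on the right
levels(cut(x, breaks = 0:3))
# "(0,1]" "(1,2]" "(2,3]"

# right = FALSE: intervals are closed on the left, open on the right
levels(cut(x, breaks = 0:3, right = FALSE))
# "[0,1)" "[1,2)" "[2,3)"

# labels = FALSE: integer interval codes are returned instead of a factor
codes <- cut(x, breaks = 0:3, labels = FALSE)
codes
# NA 1 2 3  (0 lies outside (0,1], so it is NA)
```

With right = FALSE the value 0 falls into [0,1), but 3 then falls outside [2,3) and becomes NA instead.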

Example 1

x<-rep(0:3,c(1,2,3,4))
x
# [1] 0 1 1 2 2 2 3 3 3 3
length(x)
#[1] 10
cut(x,breaks=0:3)
# Result
[1] <NA>  (0,1] (0,1] (1,2] (1,2] (1,2] (2,3]
[8] (2,3] (2,3] (2,3]
Levels: (0,1] (1,2] (2,3]

# Explanation:
breaks = 0:3, i.e. 0, 1, 2, 3. Because right takes its default value TRUE (open on the left, closed on the right), the
range is divided into the intervals (0,1], (1,2], (2,3].
The values of x are 0, 1, 1, 2, 2, 2, 3, 3, 3, 3:

  • 0 does not belong to any interval, so <NA> is returned
  • 1 belongs to the interval (0,1], so (0,1] is returned
  • 1 belongs to the interval (0,1], so (0,1] is returned
  • 2 belongs to the interval (1,2], so (1,2] is returned
    ...
  • 3 belongs to the interval (2,3], so (2,3] is returned
#x
#[1] 0 1 1 2 2 2 3 3 3 3
cut(x,c(-Inf,0,1,2,3,Inf))
# Result
[1] (-Inf,0] (0,1]    (0,1]    (1,2]    (1,2]   
[6] (1,2]    (2,3]    (2,3]    (2,3]    (2,3]   
5 Levels: (-Inf,0] (0,1] (1,2] ... (3, Inf]

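# 5 intervals, open on the left and closed on the right by default
# 0 belongs to the interval (-Inf,0], so (-Inf,0] is returned
# 1 belongs to the interval (0,1], so (0,1] is returned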
...
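
Adding -Inf and Inf to the breaks is one way to avoid the <NA> for 0. Another option, sketched here with the same x, is include.lowest = TRUE, which closes the first interval on the left. The variable name f is only illustrative.

```r
x <- rep(0:3, c(1, 2, 3, 4))

# include.lowest = TRUE turns the first interval (0,1] into [0,1],
# so the value 0 is no longer mapped to NA
f <- cut(x, breaks = 0:3, include.lowest = TRUE)
levels(f)
# "[0,1]" "(1,2]" "(2,3]"

# table() counts how many values fall into each interval
table(f)
# [0,1]: 3, (1,2]: 3, (2,3]: 4
```

Unlike the -Inf/Inf version, this keeps three levels and does not create empty outer intervals, but values below 0 or above 3 would still become NA.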

Example 2

y<-c(1,2,3,4,5,2,3,4,5,6,7)
cut(y,3,dig.lab=4,ordered=TRUE)
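# breaks is a single integer, giving the number of intervals
# dig.lab = 4: up to 4 significant digits are used to format the break values in the labels
# ordered = TRUE generates an ordered factor
# Result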
[1] (0.994,3] (0.994,3] (0.994,3] (3,5]    
[5] (3,5] (0.994,3] (0.994,3] (3,5]    
[9] (3,5] (5,7.006] (5,7.006]
Levels: (0.994,3] < (3,5] < (5,7.006]
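
The end points 0.994 and 7.006 come from how cut handles a single-integer breaks: the range of the data is split into equal-width pieces, and the two outer limits are then moved outward by 0.1% of the range so that the minimum and maximum fall inside an interval. The sketch below rebuilds those breaks by hand; the names n, r, b and pad are illustrative only.

```r
y <- c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 7)
n <- 3                                    # number of intervals

r <- range(y)                             # 1 7
b <- seq(r[1], r[2], length.out = n + 1)  # 1 3 5 7
pad <- diff(r) / 1000                     # 0.1% of the range: 0.006
b[1] <- b[1] - pad
b[n + 1] <- b[n + 1] + pad
b
# 0.994 3.000 5.000 7.006

# The hand-built breaks assign every value to the same interval as cut(y, 3)
same <- identical(as.integer(cut(y, b)), as.integer(cut(y, n)))
same
# TRUE
```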

Example 3:
Now suppose we want to divide people by age into children, youth, middle-aged and elderly. The intervals are:
age <= 12 is a child; 12 < age <= 30 is youth; 30 < age <= 60 is middle-aged;
age > 60 is elderly. Using the cut function:

age<-c(55,12,30,9,22,24,78,109,45,66,49)
height<-c(156,175,154,165,184,125,148,168,155,157,168)
dd<-data.frame(age,height)
dd
# Result
    age height
1   55    156
2   12    175
3   30    154
4    9    165
5   22    184
6   24    125
7   78    148
8  109    168
9   45    155
10  66    157
11  49    168

a<-cut(dd$age,breaks=c(-Inf,12,30,60,Inf),
       labels = c("child","youth","middle-aged","elderly"))
dd<-cbind(dd,a)
dd
# Result
   age height           a
1   55    156 middle-aged
2   12    175       child
3   30    154       youth
4    9    165       child
5   22    184       youth
6   24    125       youth
7   78    148     elderly
8  109    168     elderly
9   45    155 middle-aged
10  66    157     elderly
11  49    168 middle-aged
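
Once the ages are grouped, the resulting factor can be used for per-group summaries. The following sketch repeats the data from Example 3 so that it runs on its own; the English labels and the variable name group are assumptions made for illustration.

```r
age <- c(55, 12, 30, 9, 22, 24, 78, 109, 45, 66, 49)
height <- c(156, 175, 154, 165, 184, 125, 148, 168, 155, 157, 168)

# Illustrative English labels, one per interval
group <- cut(age, breaks = c(-Inf, 12, 30, 60, Inf),
             labels = c("child", "youth", "middle-aged", "elderly"))

# Number of people in each age group
counts <- table(group)
counts
# child: 2, youth: 3, middle-aged: 3, elderly: 3

# Mean height within each age group
mean_height <- tapply(height, group, mean)
mean_height
```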

Origin blog.csdn.net/qq_42374697/article/details/104088541