R Language Learning
Special note: R is case sensitive (x and X are different names)
- R language objects: R has 6 kinds of data object for storing data: vector, array, list, matrix, factor, and data frame. Each is described in turn below
- Five basic types of objects: character, numeric (an integer or a decimal value), integer, complex, logical
- R object attributes: the object's names (names), its dimensions for matrices and arrays (dim), its type (class), and its length (length)
- R language assignment: unlike C and Java, R variables do not need to be declared; a value is assigned directly
Assignment uses the symbol '<-' or '='; '<-' is recommended, for example:
> x <- 1
> x
[1] 1 # the [1] in square brackets marks the first element of x
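The five basic types and the object attributes listed above can be checked directly in the console. The following is a minimal sketch; the variable names and values are illustrative assumptions, not part of the original notes.

```r
# Illustrative values only; none of these names come from the original notes
s <- "hello"   # character
n <- 3.14      # numeric (double)
i <- 5L        # integer (the L suffix requests an integer)
z <- 2 + 3i    # complex
b <- TRUE      # logical

class(s)   # "character"
class(n)   # "numeric"
class(i)   # "integer"
class(z)   # "complex"
class(b)   # "logical"

# Attributes of a small named vector
v <- c(first = 1, second = 2)
names(v)    # "first" "second"
length(v)   # 2
```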
- Vector (vector)
- Brief description: a vector is a one-dimensional array that stores numeric, character, or logical data; all elements of a vector must be of the same type
- Function description: the function vector(mode = "logical", length = 0L) has two parameters, the type (mode) and the length (length). The initial value of the elements depends on the data type: numeric (numeric()) elements are 0, logical (logical()) elements are FALSE, and character (character()) elements are "" (the empty string)
- Vector creation:
> x<-c("a","b","c")
> x
[1] "a" "b" "c"
> y<-c(1,2,3,4,5)
> y
[1] 1 2 3 4 5
> z<-c(1:8)
> z
[1] 1 2 3 4 5 6 7 8
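The default element values described for vector() can be confirmed with a short sketch like the one below. The lengths chosen are arbitrary and the variable names are assumptions for illustration.

```r
# Create empty vectors of each mode and inspect their default contents
v_num <- vector("numeric", 3)
v_log <- vector("logical", 2)
v_chr <- vector("character", 2)

v_num   # 0 0 0
v_log   # FALSE FALSE
v_chr   # "" ""

# numeric(), logical(), and character() are shortcuts for the same thing
identical(v_num, numeric(3))   # TRUE
```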
- Matrix (matrix)
- Brief description: a matrix is a vector with a dimension (dim) attribute added
- Creating a matrix: matrix() needs at least two parameters, the number of rows (nrow) and the number of columns (ncol). With only those given, it creates an empty matrix in which every cell is NA. A matrix is filled column by column, and dim() shows how many rows and columns it has
x <- matrix(nrow = 3, ncol = 4)
attributes(x) # view the matrix's attributes
Note: all elements of a matrix must be of the same type. As with the vector a matrix is built on, values of different types are converted to one common type when stored together, as follows:
> a<-5 # a must exist before it is used; 5 matches the output below
> x<-c(a,2,3,TRUE,FALSE)
> x
[1] 5 2 3 1 0
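To make the column-wise filling and the "vector plus dim attribute" idea concrete, here is a minimal sketch. The values 1:6 and the variable names are assumptions chosen for illustration.

```r
# A 2 x 3 matrix; the values 1..6 are placed column by column
m <- matrix(1:6, nrow = 2, ncol = 3)
m[, 1]    # 1 2  (first column)
m[1, ]    # 1 3 5  (first row)
dim(m)    # 2 3

# Attaching a dim attribute to a plain vector gives the same matrix
v <- 1:6
dim(v) <- c(2, 3)
identical(m, v)   # TRUE

# Mixing types forces a single common type (here: character)
mixed <- c(1, "a", TRUE)
class(mixed)      # "character"
```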
- Array (Array)
- Brief description: an array in R is multi-dimensional. The difference from a matrix is that an array can have any number of dimensions, while a matrix always has exactly two
- Create an array:
> x<-array(1:24,dim=c(4,6))
> x
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    5    9   13   17   21
[2,]    2    6   10   14   18   22
[3,]    3    7   11   15   19   23
[4,]    4    8   12   16   20   24
> k<-array(1:24,dim=c(2,3,4)) # create a three-dimensional array; in dim=c(2,3,4) each slice has 2 rows and 3 columns, and the 4 is the number of slices
> k
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24
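Elements of a multi-dimensional array are addressed with one index per dimension. The sketch below rebuilds the same array and reads a few cells; the name s1 is a hypothetical helper introduced only for this example.

```r
# Same array as above: 4 slices, each with 2 rows and 3 columns
k <- array(1:24, dim = c(2, 3, 4))

dim(k)        # 2 3 4
k[1, 2, 3]    # 15  (row 1, column 2, slice 3)
k[2, 3, 4]    # 24  (the last element)

# Leaving an index empty selects everything along that dimension
s1 <- k[, , 1]   # the first slice, itself a 2 x 3 matrix
dim(s1)          # 2 3
```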
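- List (list)
- Brief description: a list is a collection whose elements can be of different types; a single list can hold many different kinds of data
- Creating a list: list()
> l<-list(a,2,TRUE,-2i) # a holds the value 5 here
> l
[[1]]
[1] 5

[[2]]
[1] 2

[[3]]
[1] TRUE

[[4]]
[1] 0-2i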
- Naming the list elements
> l2<-list(a=1,b=2,c=3)
> l2
$a
[1] 1
$b
[1] 2
$c
[1] 3
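Once list elements have names, they can be retrieved by name as well as by position. A minimal sketch, reusing the l2 list from above; the name sub is introduced only for illustration.

```r
l2 <- list(a = 1, b = 2, c = 3)

names(l2)    # "a" "b" "c"
l2$a         # 1  (access by name with $)
l2[["b"]]    # 2  (double brackets return the element itself)
l2[[3]]      # 3  (access by position)

# Single brackets return a sub-list, not the element
sub <- l2["c"]
class(sub)   # "list"
```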
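- Factor (factor)
- Brief description: factors are used to handle categorical data, both ordered and unordered. A factor can be thought of as an integer vector in which each integer carries a label, which makes it preferable to a plain integer vector. Use levels=... to set the baseline level (the first level listed)
- Creating a factor:
> x<-factor(c("female","male","male","female","male"))
> x
[1] female male   male   female male
Levels: female male
> y<-factor(c("female","male","male","female","male"),levels=c("male","female"))
> y
[1] female male   male   female male
Levels: male female
# Levels shows the levels of the factor; the levels can be set manually
unclass(x) # strip the factor class to see the underlying content
table(x) # count how often each level occurs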
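The effect of levels=... shows up in the integer codes stored behind each factor. The following sketch compares the two factors created above; it is an illustration under the same sample data, not an extension of the original notes.

```r
g <- c("female", "male", "male", "female", "male")
x <- factor(g)                                  # default: levels sorted alphabetically
y <- factor(g, levels = c("male", "female"))    # "male" set as the baseline

as.integer(x)   # 1 2 2 1 2  (female = 1, male = 2)
as.integer(y)   # 2 1 1 2 1  (male = 1, female = 2)
levels(y)[1]    # "male"

counts <- table(x)
counts[["female"]]   # 2
counts[["male"]]     # 3
```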
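- Data frame (data.frame())
- Brief description: a data frame stores tabular data. It is stored as a list and is closely related to a matrix; it can be treated as a list whose elements all have the same length
- Data frame rules: each element of a data frame is one column of data, the length of each element is the number of rows, and the elements may be of different types
- Creating a data frame:
> df<-data.frame()
> df
data frame with 0 columns and 0 rows
> class(df)
[1] "data.frame"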
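The rules above are easier to see on a data frame that actually has columns. This is a minimal sketch; the column names and values are hypothetical, and stringsAsFactors = FALSE is passed so the result does not depend on the R version.

```r
# Hypothetical sample data: three columns of different types, three rows each
df <- data.frame(
  id     = 1:3,
  name   = c("a", "b", "c"),
  passed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

ncol(df)      # 3  (one list element per column)
nrow(df)      # 3  (the common length of the columns)
is.list(df)   # TRUE  (a data frame is stored as a list)

col_types <- sapply(df, class)
col_types     # id: "integer", name: "character", passed: "logical"
```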
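- Date (Date)
x<-Sys.Date() # get the current system date
x2<-"2015-10-01" # declare dates as character strings
x3<-"2017-10-01"
x2<-as.Date("2015-10-01")
x3<-as.Date("2017-10-01") # convert the character dates to the Date type
weekdays(x) # which day of the week the date falls on
months(x) # which month the date is in
quarters(x) # which quarter of the year the date is in
julian(x) # number of days from 1970-01-01 to x, e.g. 18123
# Date arithmetic: dates can be subtracted and the result converted to an integer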
> x2
[1] "2015-10-01"
> x3
[1] "2017-10-01"
> x3-x2
Time difference of 731 days
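> as.numeric(x3-x2) # convert the result to a numeric value
[1] 731
# Subtracting two dates gives a difftime object rather than a plain number, so it usually needs to be converted when processing data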
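The class of the subtraction result can be inspected directly, which shows why the conversion step is needed. A minimal sketch using the same two dates; the names d and later are introduced only for this example.

```r
x2 <- as.Date("2015-10-01")
x3 <- as.Date("2017-10-01")

d <- x3 - x2
class(d)        # "difftime"
as.numeric(d)   # 731

# Adding a number to a Date moves it forward by that many days
later <- x2 + 30
format(later)   # "2015-10-31"
```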
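- Time (Time)
- Brief description: the time types are POSIXct and POSIXlt
- Character to time conversion:
as.Date() # character to date
as.POSIXct() # character to time
as.POSIXlt() # character to time
strptime() # character to time, with an explicit format
# For example: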
x1<-"Jan 1,2019 02:18"
strptime(x1,"%B %d,%Y %H:%M")
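The two time types store the same instant differently: POSIXct is a single number of seconds since 1970-01-01, while POSIXlt is a list of fields such as hour and minute. The sketch below illustrates this; the timestamp and the tz = "UTC" setting are assumptions made so the result does not depend on the local time zone.

```r
# Assumed example timestamp, fixed to UTC for reproducibility
t_ct <- as.POSIXct("2019-01-01 02:18:00", tz = "UTC")
t_lt <- as.POSIXlt(t_ct)

class(t_ct)[1]   # "POSIXct"
class(t_lt)[1]   # "POSIXlt"

# POSIXlt exposes the individual components
t_lt$hour          # 2
t_lt$min           # 18
t_lt$mon + 1       # 1     (months are counted from 0)
t_lt$year + 1900   # 2019  (years are counted from 1900)
```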
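- Missing values (NA/NaN)
- Brief description: dealing with missing values is a very common task in data analysis; they must be handled as part of data preprocessing
- Missing values in R: NA and NaN. NaN counts as NA, but NA does not count as NaN. NaN is generally used only for missing numeric values
- NA has a type: for example, integer NA and character NA
- Functions for detecting missing values: is.na() / is.nan()
- Example: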
> x<-c(5,NA,6,NA,NA)
> y<-c(5,NaN,6,NaN,NaN)
> is.na(x)
[1] FALSE TRUE FALSE TRUE TRUE
> is.na(y)
[1] FALSE TRUE FALSE TRUE TRUE
> is.nan(x)
[1] FALSE FALSE FALSE FALSE FALSE
> is.nan(y)
[1] FALSE TRUE FALSE TRUE TRUE
# Note: is.na() gives the same result for both vectors, which shows that NaN counts as NA
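After detecting missing values, a common next step is to count them and exclude them from a calculation. The sketch below shows one way to do that with the same vector x; it is an illustration, not the only way to handle missing data.

```r
x <- c(5, NA, 6, NA, NA)

sum(is.na(x))           # 3  (number of missing values)
mean(x)                 # NA (any NA makes the result NA)
mean(x, na.rm = TRUE)   # 5.5 (NA values dropped first)

clean <- x[!is.na(x)]   # keep only the non-missing values
clean                   # 5 6

# NA is typed, and NaN arises from undefined numeric operations
class(NA_integer_)      # "integer"
class(NA_character_)    # "character"
is.nan(0 / 0)           # TRUE
```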