R for Beginners: Data Structures (Data Frames)

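A data frame is a tabular data structure. It is designed to model a data set, and matches the concept of a data set in other statistical software such as SAS or SPSS.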

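A data frame is usually a rectangular array of data in which rows represent observations and columns represent variables. Each column must hold a single data type and is in essence a vector. A row, by contrast, can mix types across columns, so it behaves like a list.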

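A data frame combines features of vectors, matrices, and lists: it is stored as a list of equal-length column vectors, and it can be indexed by row and column like a matrix.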
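To make the "list of columns" idea concrete, here is a minimal sketch. The data frame `df` and its column names are made up for illustration and do not come from the examples in this post.

```r
# Hypothetical example data (not from the article)
df <- data.frame(
  id   = c(1, 2, 3),
  name = c("a", "b", "c"),
  flag = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

print(is.list(df))                # a data frame is stored as a list of columns

col_types <- sapply(df, class)    # one class per column
print(col_types)

dims <- dim(df)                   # rows and columns, as for a matrix
print(dims)

first_row <- df[1, ]              # one row: still a data frame, mixing types
print(first_row)
```

Each column has exactly one class, while a single row spans several classes, which is why a row cannot be a plain vector.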

A data frame can be created with the data.frame() function:

mydata <- data.frame(col1, col2, col3, ...)

Creating a data frame

> patient <- c(1,2,3,4)
> age <- c(25,34,28,52)
> diabetes <- c('type1','type2','type3','type4')
> status <- c('poor','improvement','excellent','poor')
> patientdata <- data.frame(patient,age,diabetes,status)
> patientdata
  patient age diabetes      status
1       1  25    type1        poor
2       2  34    type2 improvement
3       3  28    type3   excellent
4       4  52    type4        poor
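After creating a data frame it helps to check what R actually stored. The sketch below rebuilds the same data and inspects it with str(). Passing `stringsAsFactors = TRUE` explicitly is my own choice here, so that the string columns become factors regardless of the R version's default.

```r
patient  <- c(1, 2, 3, 4)
age      <- c(25, 34, 28, 52)
diabetes <- c("type1", "type2", "type3", "type4")
status   <- c("poor", "improvement", "excellent", "poor")

# State the string handling explicitly instead of relying on the default
patientdata <- data.frame(patient, age, diabetes, status,
                          stringsAsFactors = TRUE)

str(patientdata)              # compact summary: one line per column

n_obs  <- nrow(patientdata)   # number of observations (rows)
n_vars <- ncol(patientdata)   # number of variables (columns)
print(c(n_obs, n_vars))
```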

Accessing a data frame

The standard way (indexing)

> patientdata[1]  # select the first column
  patient
1       1
2       2
3       3
4       4
> patientdata[c(2,4)]  # select the second and fourth columns
  age      status
1  25        poor
2  34 improvement
3  28   excellent
4  52        poor
> patientdata[-c(1,3)]  # negative indices drop columns 1 and 3
  age      status
1  25        poor
2  34 improvement
3  28   excellent
4  52        poor
> patientdata['age']  # index by column name
  age
1  25
2  34
3  28
4  52


> patientdata['1',]  # select the first row; '1' is the row (observation) name
  patient age diabetes status
1       1  25    type1   poor
> patientdata[1,]  # select the first row; 1 is the row index
  patient age diabetes status
1       1  25    type1   poor

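The indexing forms above still print row and column names, because the result is still a data frame.

The following forms return the column itself as a plain vector: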

> patientdata[,'age']
[1] 25 34 28 52
> patientdata[[1]]
[1] 1 2 3 4
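The difference between the two kinds of indexing can be checked with class(). This is a small sketch on a cut-down copy of the data. The `drop = FALSE` argument is not used in this post, but it is the standard way to keep a data frame when selecting one column with the matrix-style form.

```r
patientdata <- data.frame(patient = c(1, 2, 3, 4),
                          age     = c(25, 34, 28, 52))

sub_df  <- patientdata["age"]                   # single bracket: data frame
sub_vec <- patientdata[["age"]]                 # double bracket: plain vector
col_vec <- patientdata[, "age"]                 # matrix style: plain vector
kept    <- patientdata[, "age", drop = FALSE]   # matrix style, kept as data frame

print(class(sub_df))
print(class(sub_vec))
print(class(col_vec))
print(class(kept))
```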

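You can also use the $ operator. The Levels: lines in the output below appear because R versions before 4.0.0 convert strings to factors in data.frame() by default; from R 4.0.0 on, these columns stay character vectors unless stringsAsFactors = TRUE is passed.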

> patientdata$diabetes
[1] type1 type2 type3 type4
Levels: type1 type2 type3 type4
> patientdata$age
[1] 25 34 28 52
> patientdata$status
[1] poor        improvement excellent   poor       
Levels: excellent improvement poor

This form matters a lot in later data analysis. For example, we can build the contingency table of diabetes against status.

> table(patientdata$diabetes,patientdata$status)
       
        excellent improvement poor
  type1         0           0    1
  type2         0           1    0
  type3         1           0    0
  type4         0           0    1
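The table printed above has no labels on its two dimensions. As a small sketch, naming the arguments to table() labels them, and addmargins() appends row and column totals. The data is a cut-down copy of the example.

```r
patientdata <- data.frame(
  diabetes = c("type1", "type2", "type3", "type4"),
  status   = c("poor", "improvement", "excellent", "poor")
)

# Named arguments become the dimension labels of the table
tab <- table(diabetes = patientdata$diabetes,
             status   = patientdata$status)
print(tab)

# Add a "Sum" row and a "Sum" column with the totals
tab_with_totals <- addmargins(tab)
print(tab_with_totals)
```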

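Typing the $ operator every time gets tedious, however, so there are other ways to access the columns.

attach(), detach(), and with()

attach() adds the data frame to R's search path.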

# First remove the global variables so the effect of attach() and detach() is visible
> rm(patient)
> rm(age)
> rm(status)
> patient
Error: object 'patient' not found
> attach(patientdata)
The following object is masked _by_ .GlobalEnv:

    diabetes

> colnames(patientdata)
[1] "patient"  "age"      "diabetes" "status"  
> age
[1] 25 34 28 52
> diabetes  # the global vector was not removed, so it masks the attached column
[1] "type1" "type2" "type3" "type4"
> # Now typing the column name alone is enough

When it is no longer needed, use detach() to remove the data frame from the search path:

> detach(patientdata)
> age
Error: object 'age' not found
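One thing worth knowing about attach() is that it places a copy of the data frame on the search path, so later changes to the data frame are not seen through the bare column names. A minimal sketch, assuming a fresh session with no global variable called `age`:

```r
patientdata <- data.frame(patient = c(1, 2, 3, 4),
                          age     = c(25, 34, 28, 52))

attach(patientdata)
attached_now <- "patientdata" %in% search()   # now on the search path

patientdata$age[1] <- 99    # modify the original data frame
age_seen <- age[1]          # the attached copy still holds the old value

detach(patientdata)
attached_after <- "patientdata" %in% search() # removed again

print(c(attached_now, attached_after))
print(age_seen)
```

Because of this, and because of the masking shown earlier, it is safest to pair every attach() with a detach() and to keep the attached span short.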

Using the with(data, { operations on the columns }) function

> with(patientdata,{age})
[1] 25 34 28 52

The limitation of with(), however, is that assignments only take effect inside the braces.

> with(patientdata,{m<-sum(age)
+ m})
[1] 139
> m
Error: object 'm' not found

To make the assignment take effect globally, use the <<- operator:

> with(patientdata,{m<<-sum(age)
+ m})
[1] 139
> m
[1] 139
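As an alternative to `<<-`, with() returns the value of its last expression, so the result can simply be assigned outside the call. This is a sketch of that pattern; the names `stats` and `total` are made up for illustration.

```r
patientdata <- data.frame(age = c(25, 34, 28, 52))

# with() returns the value of the expression, so assign it directly
m <- with(patientdata, sum(age))
print(m)

# Several steps inside the braces; only the last value is returned
stats <- with(patientdata, {
  total <- sum(age)                        # local to the with() call
  c(total = total, mean = total / length(age))
})
print(stats)
```

This keeps the intermediate variable `total` out of the global environment while still making the result available.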

A new column can also be added by assigning to an index:

> patientdata[5]<-c(3,4,5,6)
> patientdata
  patient age diabetes      status V5
1       1  25    type1        poor  3
2       2  34    type2 improvement  4
3       3  28    type3   excellent  5
4       4  52    type4        poor  6
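Assigning by position gives the new column the automatic name V5. A small sketch of adding columns by name instead, where `score` and `age_months` are hypothetical column names chosen for illustration:

```r
patientdata <- data.frame(patient = c(1, 2, 3, 4),
                          age     = c(25, 34, 28, 52))

# Add a column with an explicit name using $
patientdata$score <- c(3, 4, 5, 6)

# Add a column derived from an existing one using [[ ]]
patientdata[["age_months"]] <- patientdata$age * 12

print(names(patientdata))
print(patientdata)
```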

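By default the rows are simply named 1, 2, 3, and so on. The row.names argument lets you use a vector, such as the values of one of the columns, as row names instead: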

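> # patient, age, and status were removed with rm() above, so recreate them first
> patient <- c(1,2,3,4)
> age <- c(25,34,28,52)
> status <- c('poor','improvement','excellent','poor')
> patientdata <- data.frame(patient,age,diabetes,status,row.names=diabetes)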
> patientdata
      patient age diabetes      status
type1       1  25    type1        poor
type2       2  34    type2 improvement
type3       3  28    type3   excellent
type4       4  52    type4        poor
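Once meaningful row names are set, rows can be looked up by name in the same way columns are. A minimal sketch on a cut-down copy of the data, with the row names passed directly as a character vector:

```r
patientdata <- data.frame(
  patient   = c(1, 2, 3, 4),
  age       = c(25, 34, 28, 52),
  status    = c("poor", "improvement", "excellent", "poor"),
  row.names = c("type1", "type2", "type3", "type4"),
  stringsAsFactors = FALSE
)

age_type2   <- patientdata["type2", "age"]        # one cell, by row and column name
subset_rows <- patientdata[c("type1", "type4"), ] # several rows, by name

print(rownames(patientdata))
print(age_type2)
print(subset_rows)
```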
