R Language Medical Data Analysis in Practice (1): Data Structures and Dataset Acquisition

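Note: this article records the points I found useful while reading the book 《R语言医学数据分析实战》 (R Language Medical Data Analysis in Practice).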


1. Introduction to R language

1. Common usage

(1) Download package:

	install.packages("xxxx")

(2) Get help

	help(xxxx)
	?xxxx

(3) Set the working directory:

	setwd("xxxx/xxx/xxx")
	
	# Save the workspace image:
	save.image("xxxx")
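
A minimal sketch of changing the working directory safely. The folder name r_demo is hypothetical, and a temporary directory is used so that no real project folder is touched. The previous directory is stored first and restored at the end.

```r
# Remember where we started so we can go back
old_wd <- getwd()

# Hypothetical demo folder inside R's session temp directory
work_dir <- file.path(tempdir(), "r_demo")
dir.create(work_dir, showWarnings = FALSE)

setwd(work_dir)                              # relative paths now resolve here
in_new_dir <- basename(getwd()) == "r_demo"  # TRUE while we are inside r_demo

setwd(old_wd)                                # restore the original directory
restored <- identical(getwd(), old_wd)
```

Files written with a relative name, such as the one passed to save.image(), land in whatever directory getwd() reports at that moment.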

2. Create a dataset

1. R data structure

R's data structures: vectors, factors, matrices, arrays, lists, and data frames.

1) Vector: use c() to create; the : operator, seq() and rep() generate regular sequences.

> x2<-1:5
> x2
[1] 1 2 3 4 5

> x1<-seq(from=2,to=10,by=2)
> x1
[1]  2  4  6  8 10

> x3<-rep('a',times=4)
> x3
[1] "a" "a" "a" "a"

> x4<-seq(from=3,to=100,by=7)
> x4
 [1]  3 10 17 24 31 38 45 52 59 66 73 80 87 94
> x4[-(1:3)]
 [1] 24 31 38 45 52 59 66 73 80 87 94

Commonly used functions:
length(x): number of elements in x; quantile(x): quantiles of x; scale(x): standardize x (subtract the mean, divide by the standard deviation).
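
A short sketch of these three functions applied to the x4 vector from above. The variable names n, q and z are only illustrative.

```r
x4 <- seq(from = 3, to = 100, by = 7)

n <- length(x4)    # number of elements
q <- quantile(x4)  # named vector: 0%, 25%, 50%, 75%, 100%
z <- scale(x4)     # one-column matrix of standardized values

# After standardization the mean is 0 and the standard deviation is 1
z_mean <- mean(as.numeric(z))
z_sd   <- sd(as.numeric(z))
```

Note that scale() returns a matrix rather than a plain vector, so as.numeric() is used here before passing the result on.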

2) Factor: use factor() to create.

> sex<-c(1,2,2,1,2)
> sex.f<-factor(sex,levels=c(1,2),labels = c("Male","Female"))
> sex.f
[1] Male   Female Female Male   Female
Levels: Male Female
> levels(sex.f)
[1] "Male"   "Female"
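
As a small follow-up sketch, a factor can be tabulated with table() and turned back into its integer codes with as.integer(). The names counts and codes are only illustrative.

```r
sex <- c(1, 2, 2, 1, 2)
sex.f <- factor(sex, levels = c(1, 2), labels = c("Male", "Female"))

counts <- table(sex.f)       # frequency of each level
codes  <- as.integer(sex.f)  # underlying integer codes (position in levels)
n_lev  <- nlevels(sex.f)     # number of levels
```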

3) Matrix: use matrix() to create.

> M<-matrix(1:6,nrow=2)
> M
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> M1<-matrix(5:10,nrow = 3)
> M1
     [,1] [,2]
[1,]    5    8
[2,]    6    9
[3,]    7   10
> dim(M) # dimensions of the matrix
[1] 2 3
> dim(M1)
[1] 3 2
> M %*% M1 # matrix multiplication
     [,1] [,2]
[1,]   58   85
[2,]   76  112
> t(M1) # transpose of the matrix
     [,1] [,2] [,3]
[1,]    5    6    7
[2,]    8    9   10
> M2<-matrix(1:4,nrow = 2)
> det(M2) # value of the determinant
[1] -2
> solve(M2) # inverse matrix
     [,1] [,2]
[1,]   -2  1.5
[2,]    1 -0.5
> rowSums(M1)
[1] 13 15 17
> rowMeans(M1)
[1] 6.5 7.5 8.5
> M1[1:2,1:2] # first two rows and first two columns of the matrix
     [,1] [,2]
[1,]    5    8
[2,]    6    9
> M1[,1:1]
[1] 5 6 7
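
A brief sketch that checks the inverse computed by solve() and shows two related details: colSums() as the column counterpart of rowSums(), and drop = FALSE to keep a single column as a matrix instead of a vector. The variable names are only illustrative.

```r
M1 <- matrix(5:10, nrow = 3)
M2 <- matrix(1:4, nrow = 2)

# A matrix times its inverse should give the identity matrix
M2_inv   <- solve(M2)
is_ident <- isTRUE(all.equal(M2 %*% M2_inv, diag(2)))

col_totals <- colSums(M1)           # sums of each column
col_as_mat <- M1[, 1, drop = FALSE] # stays a 3 x 1 matrix
```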

4) Array: use array() to create an array, or assign dim() to a vector to turn it into an array.

> A<-1:24
> dim(A)<-c(3,4,2)
> A
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

, , 2

     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24

> dim1<-c("A1","A2","A3")
> dim2<-c("B1","B2","B3","B4")
> dim3<-c("C1","C2")
> array(1:24,dim=c(3,4,2),dimnames=list(dim1,dim2,dim3))
, , C1

   B1 B2 B3 B4
A1  1  4  7 10
A2  2  5  8 11
A3  3  6  9 12

, , C2

   B1 B2 B3 B4
A1 13 16 19 22
A2 14 17 20 23
A3 15 18 21 24
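
Once dimnames are set, elements can be selected by name as well as by position. A minimal sketch using the same array (the object name A_named is only illustrative):

```r
dim1 <- c("A1", "A2", "A3")
dim2 <- c("B1", "B2", "B3", "B4")
dim3 <- c("C1", "C2")
A_named <- array(1:24, dim = c(3, 4, 2), dimnames = list(dim1, dim2, dim3))

one_value <- A_named["A2", "B3", "C2"]  # single element selected by name
slice_c1  <- A_named[, , "C1"]          # the first 3 x 4 layer
layer_sum <- apply(A_named, 3, sum)     # sum of each layer along dimension 3
```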

5) List: Use list() to create a list.

> list1 <- list(a = 1, b = 1:5, c = c("red", "blue", "green"))
> list1
$a
[1] 1
$b
[1] 1 2 3 4 5
$c
[1] "red"   "blue"  "green"

> set.seed(123) # set the random seed so the result is reproducible
> dat <- rnorm(10)  # random sample of 10 values from the standard normal distribution
> bp <- boxplot(dat)
> class(bp)
[1] "list"
> bp
$stats
            [,1]
[1,] -1.26506123
[2,] -0.56047565
[3,] -0.07983455
[4,]  0.46091621
[5,]  1.71506499

$n
[1] 10
$conf
           [,1]
[1,] -0.5901626
[2,]  0.4304935
$out
numeric(0)
$group
numeric(0)
$names
[1] "1"

> bp$stats
            [,1]
[1,] -1.26506123
[2,] -0.56047565
[3,] -0.07983455
[4,]  0.46091621
[5,]  1.71506499
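
A small sketch of the usual ways to access list components: $ and [[ ]] return the component itself, while [ ] returns a sub-list. The boxplot is computed with plot = FALSE here, which is an assumption made so that no graphics device is opened; the returned list has the same components.

```r
list1 <- list(a = 1, b = 1:5, c = c("red", "blue", "green"))

b_values <- list1$b          # the integer vector itself
second_c <- list1[["c"]][2]  # second element of component c
sub_list <- list1["a"]       # still a list, of length 1

set.seed(123)
dat <- rnorm(10)
bp <- boxplot(dat, plot = FALSE)  # compute the statistics only
box_median <- bp$stats[3, 1]      # third row of stats is the median
```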

6) Data frame: use data.frame() to create a data frame.

> ID<-1:5
> age<-c(25,34,38,28,52)
> sex<-c("male", "female", "male", "female", "male")
> pain<-c(1,2,3,2,3)
> pain.f<-factor(pain,levels = 1:3,labels = c("mild","medium","severe"))
> patients<-data.frame(ID,sex,age,pain.f)
> patients
  ID    sex age pain.f
1  1   male  25   mild
2  2 female  34 medium
3  3   male  38 severe
4  4 female  28 medium
5  5   male  52 severe
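
A short sketch of typical operations on this data frame: extracting a column with $, filtering rows with a logical condition, and checking its size. The names older and mean_age are only illustrative.

```r
ID <- 1:5
age <- c(25, 34, 38, 28, 52)
sex <- c("male", "female", "male", "female", "male")
pain <- c(1, 2, 3, 2, 3)
pain.f <- factor(pain, levels = 1:3, labels = c("mild", "medium", "severe"))
patients <- data.frame(ID, sex, age, pain.f)

mean_age <- mean(patients$age)          # a column is an ordinary vector
older    <- patients[patients$age > 30, ]  # keep rows where age exceeds 30
size     <- dim(patients)               # rows, columns
```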

7) Type checking and conversion: use the is.xxx() functions (for example is.numeric()) to test the type of an object, and the as.xxx() functions (for example as.character()) to convert it.
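
A minimal sketch of checking and converting types. The second half shows a common pitfall: calling as.numeric() directly on a factor returns its internal codes, so the factor is converted to character first. All variable names are illustrative.

```r
x <- c("1", "2", "3")
was_char <- is.character(x)  # TRUE: x holds strings
y <- as.numeric(x)           # convert to numbers
now_num <- is.numeric(y)     # TRUE after conversion

f <- factor(c("10", "20", "10"))
codes  <- as.numeric(f)                # internal codes, not the labels
values <- as.numeric(as.character(f))  # the numbers the labels represent
```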

2. Get data

1) Get built-in datasets: the datasets package contains nearly 100 datasets.

> data(package="datasets")

(Figure: built-in dataset information)
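
A brief sketch of loading one built-in dataset and taking a first look at it. infert is used here because it also appears later in these notes.

```r
data("infert", package = "datasets")  # load the dataset into the workspace

first_rows <- head(infert, 3)  # first three rows
n_obs  <- nrow(infert)         # number of observations
n_vars <- ncol(infert)         # number of variables
```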

2) Simulate data with a specific distribution

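> r1 <- rnorm(n = 100, mean = 0, sd = 1) # random numbers from a normal distribution
> r2 <- runif(n = 10000, min = 0, max = 100) # random numbers from a uniform distribution
> r3 <- rbinom(n = 80, size = 100, prob = 0.1) # random numbers from a binomial distribution
> r4 <- rpois(n = 50, lambda = 1) # random numbers from a Poisson distribution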
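
Simulated values change on every run unless a seed is set first. A small sketch, assuming the seed value 123 used earlier: resetting the same seed reproduces the same draws, and the mean of a large uniform sample lands close to the theoretical value (min + max) / 2.

```r
set.seed(123)
a <- rnorm(5)
set.seed(123)
b <- rnorm(5)
same_draws <- identical(a, b)  # same seed, same numbers

set.seed(123)
r2 <- runif(n = 10000, min = 0, max = 100)
sample_mean <- mean(r2)        # should be near 50
```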

3) Get data in different file formats

1) txt and csv formats

> patient.data<-read.table("patients.txt",header = TRUE)
> patient.data
  ID    sex age pain.f
1  1   male  25   mild
2  2 female  34 severe
3  3   male  38 medium
4  4 female  28 medium
5  5   male  52 severe
> patient.data1<-read.csv("patients.csv",header = TRUE)
> patient.data1
  ID    sex age pain.f
1  1   male  25   mild
2  2 female  34 severe
3  3   male  38 medium
4  4 female  28 medium
5  5   male  52 severe
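
The files patients.txt and patients.csv are not included with these notes. As a self-contained sketch, the example below first writes a small, made-up data frame to temporary files and then reads it back with the same two functions. The object names and the three sample rows are assumptions for illustration only.

```r
# Small made-up data frame standing in for the patients file
demo <- data.frame(ID = 1:3,
                   sex = c("male", "female", "male"),
                   age = c(25, 34, 38))

txt_file <- tempfile(fileext = ".txt")
csv_file <- tempfile(fileext = ".csv")
write.table(demo, txt_file, row.names = FALSE, quote = FALSE)
write.csv(demo, csv_file, row.names = FALSE)

# header = TRUE tells R that the first line holds the column names
from_txt <- read.table(txt_file, header = TRUE, stringsAsFactors = FALSE)
from_csv <- read.csv(csv_file, header = TRUE, stringsAsFactors = FALSE)
```

read.table() splits on whitespace by default, while read.csv() expects commas.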

2) xls and xlsx formats: use a third-party package such as openxlsx (xlsx only), readxl, or gdata

> library(openxlsx)
> patient.data2<-read.xlsx("patients.xlsx",sheet=1)
> patient.data2
  ID    sex age pain.f
1  1   male  25   mild
2  2 female  34 severe
3  3   male  38 medium
4  4 female  28 medium
5  5   male  52 severe

3) Formats generated by other software such as SPSS, SAS and Stata: use the foreign package (the example reads an SPSS .sav file)

> library(foreign)
> patients.data <- read.spss("patients.sav", to.data.frame = TRUE)
> patients.data
  ID    sex age pain.f
1  1 male    25 mild  
2  2 female  34 severe
3  3 male    38 medium
4  4 female  28 medium
5  5 male    52 severe
> View(patients.data)

4) Export data

write.csv(patient.data,file = "patient_data.csv")
save(patient.data,file = "patient_data.rdata") # save as an R data file
load("patient_data.rdata")
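
One detail worth knowing: write.csv() also writes the row names by default, which show up as an extra column named X when the file is read back. A minimal sketch with a made-up patient.data and temporary files (both are assumptions for illustration):

```r
patient.data <- data.frame(ID = 1:3,
                           sex = c("male", "female", "male"),
                           age = c(25, 34, 38))

csv_default <- tempfile(fileext = ".csv")
csv_clean   <- tempfile(fileext = ".csv")
write.csv(patient.data, file = csv_default)                    # row names included
write.csv(patient.data, file = csv_clean, row.names = FALSE)   # row names omitted

cols_default <- names(read.csv(csv_default))  # starts with "X"
cols_clean   <- names(read.csv(csv_clean))    # original columns only

# save() and load() keep the object together with its name
rdata_file <- tempfile(fileext = ".rdata")
save(patient.data, file = rdata_file)
rm(patient.data)
load(rdata_file)  # patient.data is available again
```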

5) Use the rio package to import and export data

> library(rio)
> data("infert")
> str(infert)
'data.frame':	248 obs. of  8 variables:
 $ education     : Factor w/ 3 levels "0-5yrs","6-11yrs",..: 1 1 1 1 2 2 2 2 2 2 ...
 $ age           : num  26 42 39 34 35 36 23 32 21 28 ...
 $ parity        : num  6 1 6 4 3 4 1 2 1 2 ...
 $ induced       : num  1 1 2 2 1 2 0 0 0 0 ...
 $ case          : num  1 1 1 1 1 1 1 1 1 1 ...
 $ spontaneous   : num  2 0 0 0 1 1 0 0 1 0 ...
 $ stratum       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ pooled.stratum: num  3 1 4 2 32 36 6 22 5 19 ...
> export(infert, "infert.csv")
> convert("infert.csv", "infert.sav")
> infert.data <- import("infert.sav")
> infert.data
    education age parity induced case spontaneous stratum pooled.stratum
1      0-5yrs  26      6       1    1           2       1              3
2      0-5yrs  42      1       1    1           0       2              1
3      0-5yrs  39      6       2    1           0       3              4

Origin blog.csdn.net/qq_42804713/article/details/124269023