R language: matrices, lists, and data frames
matrix
A matrix is a collection of complex or real numbers arranged in a rectangular array. Vectors are one-dimensional, while matrices are two-dimensional and require rows and columns.
In R, a matrix is a vector with a dimension attribute. Its elements can be numeric, character, or logical, but, as with a vector, all elements must have the same mode.
> m <- matrix(1:20,nrow = 4,ncol = 5)
> m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
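The same-mode rule can be checked with a minimal sketch. The names `mixed` and `flags` are illustrative and do not come from the text above; the point is only that R coerces mixed input to one common mode, and that a matrix is still a vector underneath.

```r
# Mixing numbers, strings and logicals: everything is coerced to character
mixed <- matrix(c(1, "a", TRUE, 2.5), nrow = 2)
mode(mixed)      # "character"

# A matrix built only from logicals keeps mode "logical"
flags <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2)
mode(flags)      # "logical"

# A matrix is a vector plus a dim attribute
m <- matrix(1:20, nrow = 4, ncol = 5)
length(m)        # 20 elements in total
dim(m)           # 4 5
```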
By default, values are filled in column by column. Set the byrow parameter to TRUE to fill the matrix row by row instead.
> m <- matrix(1:20,4)
> m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
> m <- matrix(1:20,4,byrow=TRUE)
> m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
[4,] 16 17 18 19 20
> m <- matrix(1:20,4,byrow=FALSE)
> m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
name the rows and columns of the matrix
> rnames <- c("R1","R2","R3","R4")
> rnames
[1] "R1" "R2" "R3" "R4"
> cnames <- c("C1","C2","C3","C4","C5")
> cnames
[1] "C1" "C2" "C3" "C4" "C5"
> dimnames(m) <- list(rnames,cnames)
> m
C1 C2 C3 C4 C5
R1 1 5 9 13 17
R2 2 6 10 14 18
R3 3 7 11 15 19
R4 4 8 12 16 20
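As an alternative to `dimnames()`, base R also provides `rownames()` and `colnames()`, and `matrix()` accepts a `dimnames` argument. The sketch below assumes the same 4 x 5 matrix as above; `m2` is an illustrative name.

```r
m <- matrix(1:20, nrow = 4)

# Set row and column names separately
rownames(m) <- paste0("R", 1:4)   # "R1" "R2" "R3" "R4"
colnames(m) <- paste0("C", 1:5)   # "C1" ... "C5"

# Or supply the names when the matrix is created
m2 <- matrix(1:20, nrow = 4,
             dimnames = list(paste0("R", 1:4), paste0("C", 1:5)))

identical(m, m2)   # TRUE: both routes give the same named matrix
m["R2", "C3"]      # 10
```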
dim is short for "dimensions". The dim() function returns the dimensions of an object. For a plain vector it returns NULL, and assigning to dim(x) turns the vector into a matrix.
> x <- 1:20
> x
[1] 1 2 3 4 5 6 7 8 9 10
[11] 11 12 13 14 15 16 17 18 19 20
> dim(x)
NULL
> dim(x) <- c(4,5)
> x
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
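A short sketch of the round trip between a vector and a matrix through the dim attribute. The helper variables `before`, `shape` and `was_matrix` are illustrative only.

```r
x <- 1:20
before <- dim(x)               # NULL: a plain vector has no dimensions

dim(x) <- c(4, 5)              # same data, now seen as a 4 x 5 matrix
shape <- c(nrow(x), ncol(x))   # 4 5
was_matrix <- is.matrix(x)     # TRUE

dim(x) <- NULL                 # dropping the attribute gives the vector back
identical(x, 1:20)             # TRUE
```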
array
> x <- 1:20
> dim(x) <- c(2,2,5) # multi-dimensional array: a box with sides 2, 2 and 5
> x
, , 1
[,1] [,2]
[1,] 1 3
[2,] 2 4
, , 2
[,1] [,2]
[1,] 5 7
[2,] 6 8
, , 3
[,1] [,2]
[1,] 9 11
[2,] 10 12
, , 4
[,1] [,2]
[1,] 13 15
[2,] 14 16
, , 5
[,1] [,2]
[1,] 17 19
[2,] 18 20
The array() function creates an array directly, optionally with names for each dimension.
> dim1 <- c("A1","A2")
> dim2 <- c("B1","B2","B3")
> dim2
[1] "B1" "B2" "B3"
> dim3 <- c("C1","c2","c3","c4")
> z <- array(1:24,c(2,3,4),dimnames = list(dim1,dim2,dim3))
> z
, , C1
B1 B2 B3
A1 1 3 5
A2 2 4 6
, , c2
B1 B2 B3
A1 7 9 11
A2 8 10 12
, , c3
B1 B2 B3
A1 13 15 17
A2 14 16 18
, , c4
B1 B2 B3
A1 19 21 23
A2 20 22 24
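Array elements are indexed the same way as matrix elements, with one index per dimension. The following sketch rebuilds the array `z` from above, using the same dimension names, and pulls out single elements and a slice.

```r
z <- array(1:24, dim = c(2, 3, 4),
           dimnames = list(c("A1", "A2"),
                           c("B1", "B2", "B3"),
                           c("C1", "c2", "c3", "c4")))

z[2, 3, 4]            # last element: 24
z["A1", "B2", "c3"]   # access by name: 15
slice <- z[, , 1]     # the first 2 x 3 slice
dim(slice)            # 2 3
```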
Indexing a matrix
> m <- matrix(1:20,4,5,byrow = TRUE)
> m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
[4,] 16 17 18 19 20
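> m[1,2] # value in row 1, column 2
[1] 2
> m[1,c(2,3,4)] # row 1, columns 2, 3 and 4
[1] 2 3 4
> m[c(2:4),c(2,3)] # rows 2 to 4, columns 2 and 3
[,1] [,2]
[1,] 7 8
[2,] 12 13
[3,] 17 18
> m[2,] # all of row 2
[1] 6 7 8 9 10
> m[,2] # all of column 2
[1] 2 7 12 17
> m[2] # a single index counts down the columns: 2nd element, i.e. row 2, column 1
[1] 6
> m[-1,2] # column 2 with row 1 dropped
[1] 7 12 17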
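Two details of matrix indexing are easy to miss: selecting a single row or column drops the result to a plain vector unless `drop = FALSE` is given, and a single index walks the matrix in column order. A minimal sketch follows; `row2` and `row2_mat` are illustrative names.

```r
m <- matrix(1:20, 4, 5, byrow = TRUE)

row2 <- m[2, ]                     # plain vector, dimensions dropped
is.null(dim(row2))                 # TRUE

row2_mat <- m[2, , drop = FALSE]   # still a matrix, 1 x 5
dim(row2_mat)                      # 1 5

m[6]                               # 6th element in column order = m[2, 2] = 7
m[m > 15]                          # logical indexing: 16 17 18 19 20
```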
Because matrix rows and columns can have names, you can also access elements by name.
> rnames <- c("R1","R2","R3","R4")
> rnames
[1] "R1" "R2" "R3" "R4"
> cnames <- c("C1","C2","C3","C4","C5")
> cnames
[1] "C1" "C2" "C3" "C4" "C5"
> dimnames(m)=list(rnames,cnames)
> m
C1 C2 C3 C4 C5
R1 1 2 3 4 5
R2 6 7 8 9 10
R3 11 12 13 14 15
R4 16 17 18 19 20
> m["R1","C2"]
[1] 2
The colSums() function calculates the sum of each column, and the rowSums() function calculates the sum of each row
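A quick check of the column and row sums on a small matrix. `colMeans()`, `rowMeans()` and `apply()` are further base R functions, added here for comparison only.

```r
n <- matrix(1:9, 3, 3)   # columns are 1:3, 4:6 and 7:9

colSums(n)    # 6 15 24
rowSums(n)    # 12 15 18
colMeans(n)   # 2 5 8
rowMeans(n)   # 4 5 6

# apply() gives the same column sums, one column at a time
apply(n, 2, sum)
```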
> n <- matrix(1:9,3,3) # create a 3 x 3 matrix
> t <- matrix(2:10,3,3) # create another 3 x 3 matrix
> n
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> t
[,1] [,2] [,3]
[1,] 2 5 8
[2,] 3 6 9
[3,] 4 7 10
> n+t # matrix addition
[,1] [,2] [,3]
[1,] 3 9 15
[2,] 5 11 17
[3,] 7 13 19
> n*t # element-wise product
[,1] [,2] [,3]
[1,] 2 20 56
[2,] 6 30 72
[3,] 12 42 90
> n%*%t # matrix multiplication (rows of n times columns of t)
[,1] [,2] [,3]
[1,] 42 78 114
[2,] 51 96 141
[3,] 60 114 168
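To make the difference between the operators concrete, the sketch below recomputes one entry of the matrix product by hand and contrasts it with the outer product of two vectors, which base R provides as `outer()`. The second matrix is called `tm` here, an illustrative name chosen to avoid confusion with the transpose function `t()`.

```r
n  <- matrix(1:9, 3, 3)
tm <- matrix(2:10, 3, 3)

prod_mat <- n %*% tm
prod_mat[1, 1]             # 42
sum(n[1, ] * tm[, 1])      # 42 = 1*2 + 4*3 + 7*4, row 1 of n times column 1 of tm

(n * tm)[1, 1]             # 2: element-wise product only multiplies matching cells

outer(1:3, 1:3)            # outer product of two vectors: a 3 x 3 multiplication table
t(n)                       # transpose of n
```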
the list
As the name suggests, a list is a collection that can hold many kinds of content. In other programming languages a list is roughly equivalent to an array, but in R the list is the most complex data structure, and also one of the most important.
A list is an ordered collection of objects. A list can store several vectors, matrices, data frames, or even combinations of other lists.
Vectors and Lists
1. Like a vector, a list is a one-dimensional collection.
2. A vector can store only one data type, whereas the objects in a list can be of any data structure, even another list.
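The two points above can be seen side by side in a minimal sketch. The names `v` and `l` are illustrative.

```r
# A vector coerces everything to one type
v <- c(1, "two", TRUE)
class(v)            # "character"

# A list keeps each element as it is, including a nested list
l <- list(1, "two", TRUE, list(4, 5))
sapply(l, class)    # "numeric" "character" "logical" "list"
length(l)           # 4: one-dimensional, like a vector
```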
> a <- 1:20 # vector
> b <- matrix(1:20,4) # matrix
> c <- mtcars # mtcars is a built-in data set: 11 variables measured on 32 cars
> d <- "this is a test list"
> a;b;c;d
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
[17] 17 18 19 20
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
[1] "this is a test list"
> mlist <- list(a,b,c,d) # store the objects in a list
> mlist <- list(first=a,second=b,third=c,forth=d) # give each object a name
> mlist
$first
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$second
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
$third
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
$forth
[1] "this is a test list"
list access
> mlist[1] # access the first element of the list
$first
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> mlist[c(1,4)] # access several elements at once
$first
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$forth
[1] "this is a test list"
## Access by name. (state.center is a built-in list holding the longitude and latitude of the center of each of the 50 US states.)
> mlist["first"]
$first
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> mlist[c("first","forth")]
$first
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$forth
[1] "this is a test list"
# list_name$element_name
> mlist$first
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
The difference between single and double brackets when accessing list elements: single brackets return a sub-list, while double brackets return the element itself.
> mlist[1]
$first
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> mlist[[1]]
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> class(mlist[1])
[1] "list"
> class(mlist[[1]])
[1] "integer"
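The practical consequence is that functions expecting a vector need the double-bracket form. A minimal sketch, using a shortened stand-in for `mlist` (the names `sub` and `elem` are illustrative):

```r
mlist <- list(first = 1:20, forth = "this is a test list")

sub  <- mlist[1]      # a list of length 1 that contains the vector
elem <- mlist[[1]]    # the integer vector itself

length(sub)           # 1
length(elem)          # 20
sum(elem)             # 210
# sum(sub) would fail, because a list is not a valid argument to sum()

mlist[["first"]][2:4] # index into the extracted vector: 2 3 4
```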
Assign a value to a list
> y <- seq(1,100,by=11)
> y
[1] 1 12 23 34 45 56 67 78 89 100
> mlist[[5]] <- y
Remove an element from a list
> mlist[-5] # returns the list without its 5th element; mlist itself is unchanged
> mlist <- mlist[-5]
# or
> mlist[[5]] <- NULL
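Elements can also be added and removed by name rather than by position, which avoids having to know the index. A minimal sketch with a small made-up list (the element names and values are illustrative):

```r
mlist <- list(first = 1:3, second = "text")

mlist$third <- c(TRUE, FALSE)    # append with $
mlist[["fourth"]] <- 3.14        # append with [[ ]]
length(mlist)                    # 4

mlist$second <- NULL             # assigning NULL removes the element
names(mlist)                     # "first" "third" "fourth"
```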
data frame
A data frame is a tabular data structure. It is designed to represent a data set, in the same sense as data sets in other statistical software such as SAS or SPSS.
A dataset is usually a rectangular array of data, with rows representing observations and columns representing variables. Different industries have different names for the rows and columns of a dataset.
Data frame characteristics
A data frame is actually a list. Its elements are vectors, and these vectors form the columns of the data frame. Every column must have the same length, so a data frame is rectangular, and its columns must be named.
Matrices and Data Frames
- Data frames are matrix-like in shape;
- A data frame is a fairly regular list;
- All elements of a matrix must have the same data type;
- Within a data frame, each column must hold a single type, but different columns may hold different types.
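The comparison above can be checked with a small data frame. The data here are made up for illustration, and `df` and `as_mat` are hypothetical names.

```r
df <- data.frame(name   = c("Ann", "Bob", "Cy"),
                 age    = c(28, 35, 41),
                 member = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)

is.list(df)          # TRUE: a data frame is a list of columns
dim(df)              # 3 3: rectangular, like a matrix
sapply(df, class)    # one type per column: character, numeric, logical

# Converting to a matrix forces a single type for every cell
as_mat <- as.matrix(df)
mode(as_mat)         # "character"
```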
Data frames are used in much the same way as lists.
> women
height weight
1 58 115
2 59 117
3 60 120
4 61 123
5 62 126
6 63 129
7 64 132
8 65 135
9 66 139
10 67 142
11 68 146
12 69 150
13 70 154
14 71 159
15 72 164
> plot(women$height,women$weight) # scatter plot of women's height against weight
The lm() function fits a linear model and is used here for linear regression analysis.
> lm(weight ~height,data = women)
Call:
lm(formula = weight ~ height, data = women)
Coefficients:
(Intercept) height
-87.52 3.45
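The fitted model is usually stored in a variable so that its parts can be reused. The sketch below is one way to do this with base R's `coef()`, `predict()` and `summary()`. The variable names `fit` and `pred`, and the height of 66, are illustrative choices.

```r
fit <- lm(weight ~ height, data = women)

coef(fit)                    # intercept about -87.52, slope 3.45

# Predicted weight for a height of 66
pred <- predict(fit, newdata = data.frame(height = 66))
pred

# Proportion of variance explained by the model
summary(fit)$r.squared
```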
After attach() adds the data frame to the R search path, you can refer to its columns directly by name.
> attach(women)
> height
[1] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
> weight
[1] 115 117 120 123 126 129 132 135 139 142 146 150 154 159 164
# when finished, call detach() to remove the data frame from the search path
> detach(women)
The with() function also lets you refer to columns by name, without attaching the data frame.
> with(women,{height})
[1] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
> with(women,{weight})
[1] 115 117 120 123 126 129 132 135 139 142 146 150 154 159 164
> with(women,{sum(weight)})
[1] 2051
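Variables created inside `with()` are local to that call. When a new column is wanted, base R's `within()` returns a modified copy of the data frame instead. A minimal sketch follows; `ratio`, `mean_ratio` and `women2` are illustrative names.

```r
# with() evaluates several statements and returns the value of the last one
mean_ratio <- with(women, {
  ratio <- weight / height   # local to this block
  mean(ratio)
})
mean_ratio

# within() returns a copy of the data frame with the new column added
women2 <- within(women, ratio <- weight / height)
names(women2)                # "height" "weight" "ratio"
names(women)                 # unchanged: "height" "weight"
```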