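Notes on "Learning R": The Scientific Calculator, Inspecting Variables and Your Workspace, Vectors, Matrices and Arrays, Lists and Data Frames
I. Chapter 2: The Scientific Calculator
To check whether two numbers are equal, use all.equal() rather than ==. The == operator is only reliable for testing exact equality, for example between two integers; all.equal() compares floating-point values within a small tolerance. Because all.equal() returns a description of the difference instead of FALSE when the values differ, wrap it in isTRUE() when you need a logical result.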
> all.equal(sqrt(2) ^ 2, 2)
[1] TRUE
> all.equal(sqrt(2) ^ 2, 3)
[1] "Mean relative difference: 0.5"
> isTRUE(all.equal(sqrt(2) ^ 2, 2))
[1] TRUE
> isTRUE(all.equal(sqrt(2) ^ 2, 3))
[1] FALSE
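The reason == is unreliable for this comparison is floating-point rounding. The following is a minimal sketch in base R; the variable names are illustrative and do not come from the book.

```r
# sqrt(2)^2 is not exactly 2 in double-precision arithmetic
a = sqrt(2) ^ 2
exact = (a == 2)                    # exact comparison: FALSE
approx = isTRUE(all.equal(a, 2))    # comparison within a tolerance: TRUE
gap = abs(a - 2)                    # the rounding error, on the order of 1e-16

print(exact)
print(approx)
print(gap)
```

The gap is far below the default tolerance of all.equal(), which is why the tolerant comparison succeeds where the exact one fails.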
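II. Chapter 3: Inspecting Variables and Your Workspace
Variable classes: logical, three numeric classes (numeric, complex, integer), character for storing text, factor for storing categorical data, and the rarer raw for storing binary data.
A factor stores categorical data: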
> gender = factor(c("male", "female", "male", "female"))
> gender
[1] male   female male   female
Levels: female male
> levels(gender)
[1] "female" "male"
> nlevels(gender)
[1] 2
Under the hood, factor values are stored as integers rather than characters. Calling as.integer() shows this clearly:
> as.integer(gender)
[1] 2 1 2 1
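Storing the values as integers rather than character text can make memory use noticeably more efficient, as the object sizes below show: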
> gender_char = sample(c("female", "male"), 1000, replace = TRUE)
> gender_char
......
> gender_fac = as.factor(gender_char)   # convert the data to a factor
> object.size(gender_char)              # object.size() returns the memory size of an object
8160 bytes
> object.size(gender_fac)
4560 bytes
Converting a factor to character strings:
> as.character(gender)
[1] "male"   "female" "male"   "female"
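The integer codes and the levels together determine a factor's labels. As a small sketch, the labels can be rebuilt by hand by indexing the levels with the codes. The gender factor is re-created here so the block is self-contained; codes and rebuilt are illustrative names.

```r
gender = factor(c("male", "female", "male", "female"))

codes = as.integer(gender)          # 2 1 2 1
rebuilt = levels(gender)[codes]     # index the levels by the integer codes

print(rebuilt)
print(identical(rebuilt, as.character(gender)))   # TRUE
```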
Changing the type of an object (casting):
> x = "123.456"    # use an as.* function to change the type of x
> as.numeric(x)    # equivalent to as(x, "numeric")
[1] 123.456
> is.numeric(x)    # x itself was not reassigned, so it is still a character string
[1] FALSE
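The as.* functions return a converted copy and leave the original untouched, and a string that cannot be parsed as a number becomes NA. The following sketch illustrates both points; y and bad are illustrative names, and suppressWarnings() is used only to keep the expected coercion warning out of the output.

```r
x = "123.456"
y = as.numeric(x)                            # y holds the converted value

print(class(x))                              # "character": x is unchanged
print(class(y))                              # "numeric"

bad = suppressWarnings(as.numeric("abc"))    # cannot be parsed, so the result is NA
print(is.na(bad))                            # TRUE
```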
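options(digits = n) sets a global option that controls how many significant digits are used when printing numbers (significant digits, not the number of decimal places).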
> options(digits = 10)
> (x = runif(5))
[1] 0.040052175522 0.544388080016 0.506369658280
[4] 0.144690239336 0.005838404642
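To make the distinction between significant digits and decimal places concrete, here is a hedged sketch that uses format() and sprintf() on fixed values, so it does not depend on the global digits option. The variable names are illustrative.

```r
sig = format(123.456, digits = 3)         # three significant digits
small = format(0.00123456, digits = 3)    # still three significant digits
dec = sprintf("%.2f", 123.456)            # two decimal places

print(sig)      # "123"
print(small)    # "0.00123"
print(dec)      # "123.46"
```

When a fixed number of decimal places is what you actually want, sprintf() or round() is the more direct tool.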
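runif(30) generates 30 random numbers uniformly distributed between 0 and 1. The summary function provides summary information tailored to the data type, for example for a numeric variable: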
> num = runif(30)
> summary(num)
       Min.     1st Qu.      Median        Mean
0.001235794 0.199856233 0.475356185 0.475318138
    3rd Qu.        Max.
0.703412558 0.984893506
letters and LETTERS are two built-in constants:
> letters
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l"
[13] "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x"
[25] "y" "z"
> LETTERS
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L"
[13] "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X"
[25] "Y" "Z"
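sample is the sampling function. Its form is sample(x, size = , replace = ). The third argument defaults to FALSE, which means sampling without replacement.
Drawing 30 random samples, with replacement, from the letters a to e: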
> fac = factor(sample(letters[1:5], size = 30, replace = TRUE))
> summary(fac)
 a  b  c  d  e
 4  7  2  5 12
> bool = sample(c(TRUE, FALSE, NA), 30, replace = TRUE)
> summary(bool)
   Mode   FALSE    TRUE    NA's
logical      10       8      12
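The effect of the replace argument can be sketched as follows. The seed value is arbitrary and only makes the run repeatable; perm, draws, and too_many are illustrative names.

```r
set.seed(42)                                   # arbitrary seed, for repeatability

perm = sample(letters[1:5])                    # replace = FALSE: a permutation of a to e
print(anyDuplicated(perm) == 0)                # TRUE: no letter appears twice

draws = sample(letters[1:5], size = 30, replace = TRUE)
print(length(draws))                           # 30

# Without replacement you cannot draw more items than the population holds
too_many = tryCatch(
  sample(letters[1:5], size = 30),
  error = function(e) "error"
)
print(too_many)                                # "error"
```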
Create the data frame dfr; only its first few rows are shown here:
> dfr = data.frame(num, fac, bool)
> head(dfr)    # shows the first 6 rows by default
            num fac bool
1 0.34019507235   b   NA
2 0.77415443189   e TRUE
3 0.02201034524   d TRUE
4 0.11190012516   e   NA
5 0.18030911358   a   NA
6 0.98489350639   d TRUE
> summary(dfr)
      num                fac     bool
 Min.   :0.001235794   a: 4   Mode :logical
 1st Qu.:0.199856233   b: 7   FALSE:10
 Median :0.475356185   c: 2   TRUE :8
 Mean   :0.475318138   d: 5   NA's :12
 3rd Qu.:0.703412558   e:12
 Max.   :0.984893506
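The str function displays the structure of an object. It is not very interesting for vectors (they are too simple), but it is very useful for data frames and nested lists: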
> str(num)
 num [1:30] 0.34 0.774 0.022 0.112 0.18 ...
> str(dfr)
'data.frame':	30 obs. of  3 variables:
 $ num : num  0.34 0.774 0.022 0.112 0.18 ...
 $ fac : Factor w/ 5 levels "a","b","c","d",..: 2 5 4 5 1 4 1 4 1 5 ...
 $ bool: logi  NA TRUE TRUE NA NA TRUE ...
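Every class has its own print method, which controls how it is displayed on the console. Sometimes this printing obscures the internal structure or leaves out useful information. The unclass function bypasses it and shows how a variable is built. For example, calling unclass on a factor reveals that it is just an integer vector with an attribute named levels. (The output below comes from a four-level factor with the levels cat, dog, goldfish, and hamster, not from the fac created above.)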
> unclass(fac)
[1] 2 1 4 3
attr(,"levels")
[1] "cat"      "dog"      "goldfish" "hamster"
The attributes function shows a list of all of an object's attributes:
> attributes(fac)
$levels
[1] "cat"      "dog"      "goldfish" "hamster"

$class
[1] "factor"
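The unclass and attributes output above can be reproduced with a small factor. The pets vector below is a hypothetical input chosen because it yields the same codes and levels; it is an assumption, not necessarily the data used in the book.

```r
# Hypothetical data that happens to give the codes 2 1 4 3
pets = factor(c("dog", "cat", "hamster", "goldfish"))

raw_codes = unclass(pets)                # drop the class, keep the underlying data
print(is.integer(raw_codes))             # TRUE: a plain integer vector
print(as.vector(raw_codes))              # 2 1 4 3
print(attr(raw_codes, "levels"))         # "cat" "dog" "goldfish" "hamster"
print(names(attributes(pets)))           # "levels" "class"
```

The levels are sorted alphabetically by default, which is why dog maps to 2 and cat to 1.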
The View function (note the capital V) displays a data frame as a spreadsheet. edit and fix are similar, but they let you change data values by hand.
View(dfr)              # read-only: changes are not allowed
new_dfr = edit(dfr)    # changes are saved in new_dfr
fix(dfr)               # changes are saved in dfr
View(head(dfr, 50))    # view the first 50 rows
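III. Chapter 4: Vectors, Matrices and Arrays
Arrays hold multidimensional rectangular data. A matrix is the special case of a two-dimensional array.
There are many ways to create a sequence; the advantage of seq is that you can set the step size.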
> (xulie = seq(1, 15, 2))
[1]  1  3  5  7  9 11 13 15
The length() function returns the length of the sequence:
> length(xulie)
[1] 8
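A few of the other ways to build sequences, sketched in base R. The variable names are illustrative; seq_len and seq_along are standard base functions.

```r
odd = seq(1, 15, by = 2)                  # fixed step size: 1 3 5 ... 15
quarters = seq(0, 1, length.out = 5)      # fixed number of points: 0 0.25 0.5 0.75 1
idx = seq_len(length(odd))                # 1 to 8
pos = seq_along(c("a", "b", "c"))         # 1 2 3, one index per element
empty = seq_len(0)                        # integer(0), whereas 1:0 gives 1 0

print(odd)
print(quarters)
print(idx)
print(pos)
print(length(empty))
```

seq_len and seq_along are the safer choice for loop indices because they return an empty sequence for empty input.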
Naming the elements of a vector:
> c(apple = 1, banana = 2, "kiwi fruit" = 3, 4)
     apple     banana kiwi fruit
         1          2          3          4
> x = 1:4
> names(x) = c("apple", "banana", "kiwi fruit", "")
> x
     apple     banana kiwi fruit
         1          2          3          4
Creating arrays:
> three_d_array = array(    # a three-dimensional array
+   1:24,
+   dim = c(4, 3, 2),
+   dimnames = list(
+     c("one", "two", "three", "four"),
+     c("ein", "zwei", "drei"),
+     c("un", "deux")
+   )
+ )
> three_d_array
, , un

      ein zwei drei
one     1    5    9
two     2    6   10
three   3    7   11
four    4    8   12

, , deux

      ein zwei drei
one    13   17   21
two    14   18   22
three  15   19   23
four   16   20   24
> (a_matrix = matrix(    # create a matrix
+   1:12,
+   nrow = 4, byrow = TRUE,
+   dimnames = list(
+     c("one", "two", "three", "four"),
+     c("ein", "zwei", "drei")
+   )
+ ))
      ein zwei drei
one     1    2    3
two     4    5    6
three   7    8    9
four   10   11   12
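The fill order and the relationship between matrices and arrays can be checked directly. This is a minimal sketch: by_row, by_col, two_d, and arr are illustrative names, and arr repeats the three-dimensional array defined above.

```r
# matrix() fills column by column unless byrow = TRUE
by_row = matrix(1:12, nrow = 4, byrow = TRUE)
by_col = matrix(1:12, nrow = 4)
print(by_row[2, 3])                 # 6
print(by_col[2, 3])                 # 10

# A two-dimensional array is a matrix
two_d = array(1:12, dim = c(4, 3))
print(is.matrix(two_d))             # TRUE

# Elements can be addressed by position or by dimension names
arr = array(
  1:24,
  dim = c(4, 3, 2),
  dimnames = list(
    c("one", "two", "three", "four"),
    c("ein", "zwei", "drei"),
    c("un", "deux")
  )
)
print(arr[2, 2, 2])                 # 18
print(arr["two", "zwei", "deux"])   # 18
```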
Indexing vectors, and some related functions:
> x = (1:5) ^ 2
> x
[1]  1  4  9 16 25
> x[c(1, 3, 5)]
[1]  1  9 25
> x[c(-2, -4)]
[1]  1  9 25
> x[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
[1]  1  9 25
> names(x) = c("one", "four", "nine", "sixteen", "twenty five")
> x
        one        four        nine     sixteen twenty five
          1           4           9          16          25
> which(x > 10)
    sixteen twenty five
          4           5
> which.min(x)
one
  1
> which.max(x)
twenty five
          5
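Once a vector has names, it can also be indexed by name or by a logical condition. A short sketch that reuses the x vector from above; by_name and big are illustrative names.

```r
x = (1:5) ^ 2
names(x) = c("one", "four", "nine", "sixteen", "twenty five")

by_name = x[c("one", "nine", "twenty five")]   # index by element name
big = x[x > 10]                                # index by a logical condition

print(by_name)
print(unname(big))                             # 16 25
print(names(big))                              # "sixteen" "twenty five"
```

which(x > 10) returns the positions of the matching elements, while x[x > 10] returns the elements themselves.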
> rep(1:5, 3)
 [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
> rep(1:5, each = 3)
 [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
> rep(1:5, times = 1:5)
 [1] 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5
> rep(1:5, length.out = 7)
[1] 1 2 3 4 5 1 2
> rep.int(1:5, 3)
 [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
> rep_len(1:5, 13)
 [1] 1 2 3 4 5 1 2 3 4 5 1 2 3
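rep.int and rep_len are simplified variants of rep for the two most common cases. The sketch below checks that they agree with the corresponding rep calls and shows how each and times combine; the variable names are illustrative.

```r
same_times = identical(rep(1:5, 3), rep.int(1:5, 3))
same_len = identical(rep(1:5, length.out = 13), rep_len(1:5, 13))
combo = rep(1:2, each = 2, times = 2)   # each is applied first, then times

print(same_times)   # TRUE
print(same_len)     # TRUE
print(combo)        # 1 1 2 2 1 1 2 2
```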
> dim(three_d_array)
[1] 4 3 2
> dim(a_matrix)
[1] 4 3
> nrow(a_matrix)
[1] 4
> ncol(a_matrix)
[1] 3
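The dimensions can be assigned as well as queried. As a hedged sketch, the following reshapes a matrix by assigning to dim; m, total, shape, and first_col are illustrative names.

```r
m = matrix(1:12, nrow = 4)
total = length(m)                  # 12: the product of the dimensions

dim(m) = c(2, 6)                   # reshape in place; the element order is unchanged
shape = c(nrow(m), ncol(m))        # 2 6
first_col = m[, 1]                 # 1 2, since elements are stored column by column

print(total)
print(shape)
print(first_col)
```

The new dimensions must multiply to the same length, otherwise the assignment fails with an error.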