R's vector types and related functions

Data structures are the way computers store and organize data. A data structure is a collection of data elements that have one or more specific relationships with each other.

Data types in R

  1. Numeric: values can be used directly in arithmetic (addition, subtraction, multiplication, division);
  2. Character (string): values can be concatenated, converted, extracted, and so on;
  3. Logical: TRUE or FALSE;
  4. Date types, and others.

Common data structures: vectors, scalars, lists, arrays, multidimensional arrays

Special data structures in other languages:

  1. hashes in Perl
  2. dictionaries in Python
  3. pointers in C

R object

An object is anything that can be assigned to a variable, including constants, data structures, functions, and even graphics. Every object has a mode, which describes how the object is stored, and a class.

vector

The vector is the most important concept in R and the basis of its other data structures. A vector in R is not the same as a vector in mathematics; it is closer to the mathematical notion of a set, a collection of one or more elements, except that the elements are ordered and may repeat.
In practice, a vector is a one-dimensional array that stores numeric, character, or logical data.

Use the function c to create vectors. c stands for concatenate; it can also be read as collect or combine.

The assignment operator is <-; in RStudio its shortcut is Alt and the - key. Strings must be quoted.

> x <- c(1,4,5,2)
> x
[1] 1 4 5 2
> print(x)
[1] 1 4 5 2
> y <- c("one","two","three")
> y
[1] "one"   "two"   "three"
> print(y)
[1] "one"   "two"   "three"
# logical data
> z <- c(T,T,F)
> z
[1]  TRUE  TRUE FALSE

Some shortcuts build vectors directly; for example, the : operator builds an arithmetic sequence with a step of 1.

> c(1:100)
  [1]   1   2   3   4   5   6   7   8
  [9]   9  10  11  12  13  14  15  16
 [17]  17  18  19  20  21  22  23  24
 [25]  25  26  27  28  29  30  31  32
 [33]  33  34  35  36  37  38  39  40
 [41]  41  42  43  44  45  46  47  48
 [49]  49  50  51  52  53  54  55  56
 [57]  57  58  59  60  61  62  63  64
 [65]  65  66  67  68  69  70  71  72
 [73]  73  74  75  76  77  78  79  80
 [81]  81  82  83  84  85  86  87  88
 [89]  89  90  91  92  93  94  95  96
 [97]  97  98  99 100

The seq function lets you set the step: from is the starting value, to is the ending value, and by is the step (the common difference), which defaults to 1.

> seq(from=1,to=100,by=3)
 [1]   1   4   7  10  13  16  19  22  25
[10]  28  31  34  37  40  43  46  49  52
[19]  55  58  61  64  67  70  73  76  79
[28]  82  85  88  91  94  97 100

length.out controls the number of output elements; the step is then computed to fit.

> seq(from=1,to=100,length.out=10)
 [1]   1  12  23  34  45  56  67  78  89
[10] 100
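
As a rough check of how length.out determines the step, the sketch below recomputes the sequence by hand. The variable names (n, step, by_hand, by_seq) are only illustrative, and the formula assumes length.out is at least 2.

```r
# Step implied by length.out: (to - from) / (length.out - 1)
from <- 1
to <- 100
n <- 10
step <- (to - from) / (n - 1)          # 99 / 9 = 11

# Build the same sequence manually and with seq()
by_hand <- from + step * (0:(n - 1))
by_seq <- seq(from = from, to = to, length.out = n)

print(step)
print(all.equal(by_hand, by_seq))      # TRUE when both sequences agree
```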

The rep function repeats a sequence. The first argument is the content to repeat, which can be a scalar or a vector; times is the number of repetitions, and the argument name can be omitted.

> ?rep
> rep(23,5)
[1] 23 23 23 23 23
> x <- c(1,2,3,4,5)
> rep(x,5)
[1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3
[19] 4 5 1 2 3 4 5

The each argument is the number of times each individual element is repeated.

> rep(x,each=5)
[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4
[19] 4 4 5 5 5 5 5

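When each and times are combined, every element is first repeated each times, and the whole result is then repeated as many times as times specifies, so the total number of repetitions is their product.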

> rep(x,each=5,times=2)
 [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4
[19] 4 4 5 5 5 5 5 1 1 1 1 1 2 2 2 2 2 3
[37] 3 3 3 3 4 4 4 4 4 5 5 5 5 5
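
The order in which each and times are applied can be checked with a small sketch like the following; the names r and same are illustrative only.

```r
x <- c(1, 2, 3, 4, 5)
r <- rep(x, each = 5, times = 2)

# Total length is length(x) * each * times = 5 * 5 * 2
print(length(r))

# each is applied first, then the result is repeated `times` times
same <- identical(r, rep(rep(x, each = 5), times = 2))
print(same)
```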

All elements of a vector must have the same type; if types are mixed, R coerces them to a common type. Only elements of compatible types can be used together in calculations. Use the mode() or typeof() function to check a vector's type.

> z <- c(T,T,F)
> mode(z)
[1] "logical"
> typeof(z)
[1] "logical"
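
To see what happens when types are mixed, the following sketch builds two mixed vectors with c(). The variable names are illustrative; the point is that c() does not fail but silently converts everything to the most general type.

```r
# Mixing types inside c() coerces all elements to one common type
mixed_num_chr <- c(1, "two", TRUE)   # everything becomes character
mixed_num_lgl <- c(1, TRUE, FALSE)   # logicals become 1 and 0

print(typeof(mixed_num_chr))
print(mixed_num_chr)
print(typeof(mixed_num_lgl))
print(mixed_num_lgl)

# mode() and typeof() can differ: integers and doubles both have mode "numeric"
print(mode(1:3))
print(typeof(1:3))
print(typeof(c(1, 2, 3)))
```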

If the vector holds only a single value, you can skip the c function and assign the value directly.

> a = 2
> b =3
> c ="hello"
> c
[1] "hello"
> b
[1] 3
> a
[1] 2

Vectors are one of the biggest differences between R and many other programming languages, whose most basic data type is a scalar. In R the most basic data structure is a collection rather than a scalar, and operating on whole collections at once is called vectorized programming. R works this way because it is statistical software, built for processing sets of data. Vectorized programming has many benefits, such as avoiding explicit loops.

Double each element of x and add the corresponding element of y:

> x <- c(1,2,3,4,5)
> y <- c(6,7,8,9,10)
> x*2+y
[1]  8 11 14 17 20
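
For comparison, here is a minimal sketch of the same computation written with an explicit loop, next to the vectorized form. The names result_loop and result_vec are illustrative.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(6, 7, 8, 9, 10)

# Loop version: one element at a time
result_loop <- numeric(length(x))
for (i in seq_along(x)) {
  result_loop[i] <- x[i] * 2 + y[i]
}

# Vectorized version: the whole vectors at once
result_vec <- x * 2 + y

print(result_loop)
print(identical(result_loop, result_vec))
```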

Extract the elements of x that are greater than 3:

> x <- c(1,2,3,4,5)
> x[x>3]
[1] 4 5

Pass a vector as the second argument of rep to control the number of repetitions of each element individually:

> x <- c(1,2,3,4,5)
> rep(x,c(2,7,2,3,5))
 [1] 1 1 2 2 2 2 2 2 2 3 3 4 4 4 5 5 5 5
[19] 5
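
A quick way to confirm the per-element counts is to tabulate the result, as in this sketch (counts and r are illustrative names):

```r
x <- c(1, 2, 3, 4, 5)
counts <- c(2, 7, 2, 3, 5)
r <- rep(x, times = counts)

# The result has one entry per requested repetition
print(length(r) == sum(counts))

# table() counts how often each value occurs
print(table(r))
```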

vector index

  1. positive (negative) integer index;
  2. logical vector index;
  3. name index;

Access the elements of a vector by position. Indexing in R starts at 1, not 0.

> x <- c(1:50) # assign the sequence 1 to 50 to x
> x
 [1]  1  2  3  4  5  6  7  8  9 10
[11] 11 12 13 14 15 16 17 18 19 20
[21] 21 22 23 24 25 26 27 28 29 30
[31] 31 32 33 34 35 36 37 38 39 40
[41] 41 42 43 44 45 46 47 48 49 50
> length(x)  # number of elements
[1] 50
> x[1]   # the first element
[1] 1
> x[-19]   # everything except the 19th element
 [1]  1  2  3  4  5  6  7  8  9 10
[11] 11 12 13 14 15 16 17 18 20 21
[21] 22 23 24 25 26 27 28 29 30 31
[31] 32 33 34 35 36 37 38 39 40 41
[41] 42 43 44 45 46 47 48 49 50

An index can also be a vector, which accesses several elements at once:

> x[c(5:20)]
 [1]  5  6  7  8  9 10 11 12 13 14
[11] 15 16 17 18 19 20
> x[c(1,25,30,35)]
[1]  1 25 30 35
> x[c(20,20,5,5,50,50,12,12)] # order is free, and the same element can be accessed repeatedly
[1] 20 20  5  5 50 50 12 12

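An index vector must be all positive or all negative; mixing the two, as in x[c(-2,5,6)], raises an error. The indices must also be wrapped in c(): in the call below, R reads the three comma-separated values as indices along three dimensions, which is why the message complains about dimensions.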

> x[-2,5,6]
Error in x[-2, 5, 6] : incorrect number of dimensions
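
The sign rule itself can be checked with a sketch like the one below, which wraps the mixed index in c() and captures the error with tryCatch instead of stopping. The placeholder string "error" is just an illustrative return value; the exact wording of R's error message is not reproduced here.

```r
x <- 1:50

# Mixing signs inside one index vector raises an error; capture it
res <- tryCatch(x[c(-2, 5, 6)], error = function(e) "error")
print(res)

# All-negative or all-positive index vectors work
print(x[c(-1, -2)][1:3])   # first three elements after dropping 1 and 2
print(x[c(2, 5, 6)])
```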

A logical vector can also index a vector: elements at TRUE positions are returned and elements at FALSE positions are dropped.

> y <- c(1:10)
> y
 [1]  1  2  3  4  5  6  7  8  9 10
> y[c(T,T,F,T,F,T,F,F,T,T)]
[1]  1  2  4  6  9 10

The logical vector does not have to be as long as the vector being indexed: a shorter one is recycled, and a longer one produces NA for the extra positions.

> y <- c(1:10)
> y
 [1]  1  2  3  4  5  6  7  8  9 10
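> y[c(T)]  # recycled: keep every element
 [1]  1  2  3  4  5  6  7  8  9 10
> y[c(F)]  # recycled: drop every element
integer(0)
> y[c(T,F)]  # recycled: keep, drop
[1] 1 3 5 7 9
> y[c(T,F,F)]  # recycled: keep, drop, drop
[1]  1  4  7 10
> y[c(T,T,F,T,F,T,F,F,T,T,T)] # one logical value too many yields a missing value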
[1]  1  2  4  6  9 10 NA

The index does not have to be a literal logical vector; a comparison expression can be given directly.

> y <- c(1:10)
> y
 [1]  1  2  3  4  5  6  7  8  9 10
> y[y>5]
[1]  6  7  8  9 10
> y[y>5 & y<9]
[1] 6 7 8
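
The comparison inside the brackets is itself just a logical vector, which the following sketch makes explicit by storing it first (mask is an illustrative name):

```r
y <- 1:10
mask <- y > 5 & y < 9     # logical vector with one value per element of y

print(mask)
print(y[mask])            # elements where mask is TRUE
print(which(mask))        # positions where mask is TRUE
print(sum(mask))          # number of TRUE values
```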

For character vectors, %in% tests whether an element is in the vector.

> z <- c("one","two","treen","four","five")
> z
[1] "one"   "two"   "treen"
[4] "four"  "five" 
> "one" %in% z
[1] TRUE

%in% can also be used inside the index:

> z[z %in% c("one","two")]
[1] "one" "two"
> z %in% c("one","two")
[1]  TRUE  TRUE FALSE FALSE FALSE
> k <- z %in% c("one","two")
> z[k]
[1] "one" "two"

Elements can also be accessed by name. Use the names function to give each element of the vector a name. The named vector y then prints as two lines: the first holds the element names (the names attribute) and the second holds the element values.

> y
 [1]  1  2  3  4  5  6  7  8  9 10
> names(y)
NULL
> names(y) <- c("one","two","treen","four","five","six","seven","eight","nine","ten")
> y
  one   two treen  four  five   six seven eight  nine   ten 
    1     2     3     4     5     6     7     8     9    10 

access by name

> names(y)
 [1] "one"   "two"   "treen" "four"  "five"  "six"   "seven" "eight"
 [9] "nine"  "ten"  
> y["two"]
two 
  2 
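
A few more things that can be done with names, sketched on a small vector (the vectors y and w here are illustrative and separate from the session above):

```r
y <- 1:3
names(y) <- c("one", "two", "three")

# Several names can be used at once, in any order
print(y[c("three", "one")])

# unname() drops the names and keeps only the value
print(unname(y["two"]))

# Names can also be supplied when the vector is created
w <- c(a = 10, b = 20)
print(names(w))
```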

modify vector

add element

> x <- c(1:50)
> x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
[24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
[47] 47 48 49 50
> x[51] <- 51
> x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
[24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
[47] 47 48 49 50 51
> v <- 1:3
> v[c(4,5,6)] <- c(4,5,6)
> v
[1] 1 2 3 4 5 6

insert and delete elements

> append(x=v,values = 99,after = 5)
[1]  1  2  3  4  5 99  6
> rm(v)
> v
Error: object 'v' not found
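
Note that append returns a modified copy and leaves the original vector untouched, while rm removes the whole variable rather than a single element. A minimal sketch, with w as an illustrative name:

```r
v <- 1:6

# append() returns a new vector; v itself is not changed
w <- append(x = v, values = 99, after = 5)
print(v)
print(w)

# To keep the insertion, assign the result back
v <- append(v, 99, after = 5)
print(v)
```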

To delete elements from a vector, use a negative integer index and assign the result back:

> y
  one   two treen  four  five   six seven eight  nine   ten 
    1     2     3     4     5     6     7     8     9    10 
> y[-c(1:3)]
 four  five   six seven eight  nine   ten 
    4     5     6     7     8     9    10 
> y <- y[-c(1:3)]
> y
 four  five   six seven eight  nine   ten 
    4     5     6     7     8     9    10 

vector operation

Vector operations operate on each element

> x <- 1:10
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> x+1
 [1]  2  3  4  5  6  7  8  9 10 11
> x-3
 [1] -2 -1  0  1  2  3  4  5  6  7
> x <- x+1
> x
 [1]  2  3  4  5  6  7  8  9 10 11
> y <- seq(1,100,length.out=10)
> y
 [1]   1  12  23  34  45  56  67  78  89 100
> x+y
 [1]   3  15  27  39  51  63  75  87  99 111
> x**y  # exponentiation (** is the same as ^)
 [1]  2.000000e+00  5.314410e+05  7.036874e+13  5.820766e+23
 [5]  1.039456e+35  2.115876e+47  3.213876e+60  2.697216e+74
 [9]  1.000000e+89 1.378061e+104
> x%%y  # remainder (modulo)
 [1]  0  3  4  5  6  7  8  9 10 11
> y%/%x  # integer division
 [1] 0 4 5 6 7 8 8 8 8 9

Recycling (when two vectors differ in length, the shorter one is reused cyclically)

> z <- c(1,2)
> x
 [1]  2  3  4  5  6  7  8  9 10 11
> x+z
 [1]  3  5  5  7  7  9  9 11 11 13

If the length of the longer vector is not a multiple of the length of the shorter one, R still recycles and returns a result, but issues a warning rather than an error.

> z <- 1:3
> x+z
 [1]  3  5  7  6  8 10  9 11 13 12
Warning message:
In x + z : longer object length is not a multiple of shorter object length
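
What recycling does here can be reproduced by stretching the shorter vector explicitly with rep. This is only a sketch; suppressWarnings is used so the example runs quietly, and res and manual are illustrative names.

```r
x <- 2:11
z <- 1:3

# The warning does not stop the computation; a result is still returned
res <- suppressWarnings(x + z)
print(res)

# Same result with the recycling written out by hand
manual <- x + rep(z, length.out = length(x))
print(identical(res, manual))
```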

The %in% operator tests whether each element on the left exists on the right, while == compares two vectors element by element.

> c(1,2,3) %in% c(1,2,2,4,5,6)
[1]  TRUE  TRUE FALSE
> x
 [1]  2  3  4  5  6  7  8  9 10 11
> y
 [1]   1  12  23  34  45  56  67  78  89 100
> x==y
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Functions for vector calculations

> x <- -5:5
> x
 [1] -5 -4 -3 -2 -1  0  1  2  3  4  5
> abs(x)         # absolute value
 [1] 5 4 3 2 1 0 1 2 3 4 5
> sqrt(25)         # square root
[1] 5
> log(16,base =2)  # logarithm to base 2
[1] 4
> log10(100)  # logarithm to base 10
[1] 2
> exp(x)   # exponential (e^x) of each element
 [1] 6.737947e-03 1.831564e-02 4.978707e-02 1.353353e-01
 [5] 3.678794e-01 1.000000e+00 2.718282e+00 7.389056e+00
 [9] 2.008554e+01 5.459815e+01 1.484132e+02
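# ceiling returns the smallest integer not less than x
> ceiling(c(-2.3,3.1415))
[1] -2  4
# floor returns the largest integer not greater than x
> floor(c(-2.3,3.1415))
[1] -3  3
# trunc returns the integer part
> trunc(c(-2.3,3.1415))
[1] -2  3
# round rounds to the nearest value; digits sets the number of decimal places
> round(c(-2.3,3.1415))
[1] -2  3
> round(c(-2.3,3.1415),digits = 2)
[1] -2.30  3.14
# signif is similar to round, but digits sets the number of significant digits
> signif(c(-2.3,3.1415),digits = 2)
[1] -2.3  3.1
# trigonometric functions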
> x
 [1] -5 -4 -3 -2 -1  0  1  2  3  4  5
> sin(x)
 [1]  0.9589243  0.7568025 -0.1411200 -0.9092974 -0.8414710
 [6]  0.0000000  0.8414710  0.9092974  0.1411200 -0.7568025
[11] -0.9589243
> cos(x)
 [1]  0.2836622 -0.6536436 -0.9899925 -0.4161468  0.5403023
 [6]  1.0000000  0.5403023 -0.4161468 -0.9899925 -0.6536436
[11]  0.2836622
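
The rounding functions shown earlier differ mainly on negative numbers and on exact halves. The sketch below puts them side by side on one illustrative vector v; the last line relies on R following the IEC 60559 convention of rounding halves to the even digit.

```r
v <- c(-2.7, -2.3, 2.3, 2.7)

print(ceiling(v))  # rounds up, toward +Inf
print(floor(v))    # rounds down, toward -Inf
print(trunc(v))    # drops the fractional part, toward 0
print(round(v))    # rounds to the nearest integer

# Exact halves go to the nearest even integer
print(round(c(0.5, 1.5, 2.5)))
```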

statistical function

> vec <- 1:50
> sum(vec)     # sum of the elements
[1] 1275
> max(vec)    # maximum
[1] 50
> min(vec)    # minimum
[1] 1
> range(vec)  # minimum and maximum
[1]  1 50
> mean(vec)  # mean
[1] 25.5
> var(vec)  # variance
[1] 212.5

> round(var(vec),digits = 2)   # round to two decimal places
[1] 212.5
> round(sd(vec),digits = 2) # sd returns the standard deviation, rounded to two decimal places
[1] 14.58
> prod(vec)        # product of all elements
[1] 3.041409e+64
> median(vec)    # median
[1] 25.5

# quantile() computes quantiles: the Nth percentile is the value below which N% of the data fall.
> quantile(vec,c(0.4,0.5,0.8))    # the 40%, 50% and 80% quantiles
 40%  50%  80% 
20.6 25.5 40.2                
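
The value 20.6 for the 40% quantile can be reproduced by hand. The sketch below assumes quantile's default method (type 7), which interpolates linearly between the two neighbouring sorted values; the names s, h, lo and q_manual are illustrative.

```r
vec <- 1:50
p <- 0.4

s <- sort(vec)
h <- (length(s) - 1) * p + 1          # fractional position in the sorted data
lo <- floor(h)                        # index of the lower neighbour
q_manual <- s[lo] + (h - lo) * (s[lo + 1] - s[lo])

print(q_manual)
print(unname(quantile(vec, p)))
```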

# positions (indices) of matching elements
> t <- c(1,4,2,5,9,7,6)
> t
[1] 1 4 2 5 9 7 6
> which.max(t)
[1] 5
> which.min(t)
[1] 1
> which(t==7)
[1] 6
> which(t>5)
[1] 5 6 7

Origin blog.csdn.net/qq_44795788/article/details/125136402