Data structures
Vector- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
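*Vectors are the essence of R: they are its most basic data structure, and R has no true scalars, so even a single number is a vector of length 1
Somewhat like a set in mathematics, a vector is a collection of elements, but unlike a set it is ordered and may contain duplicates
A one-dimensional array for storing numeric, character or logical data
All elements of a vector must have the same data type; if you mix types, R coerces them to a common type instead of raising an error (e.g. c(1, "a") becomes a character vector)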
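A minimal sketch of the single-type rule (the variable names are only for illustration): mixing types does not fail, R silently converts every element to the most flexible type present, going from logical to numeric to character.

```r
# A number, a string and a logical: everything becomes character
v <- c(1, "a", TRUE)
print(class(v))  # "character"
print(v)         # "1" "a" "TRUE"

# Numbers and logicals: TRUE/FALSE become 1/0
w <- c(1, TRUE, FALSE)
print(class(w))  # "numeric"
print(w)         # 1 1 0
```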
create
c() # can be understood as concatenate/collect/combine
e.g.
(1) x <- c(1, 2, 3, 4, 5)
(2) 1:100 # arithmetic sequence with step 1: 1, 2, ..., 100
seq(from = 1, to = 100, by = 2) # arithmetic sequence with step 2: 1, 3, ..., 99
seq(from = 1, to = 100, length.out = 10) # exactly 10 evenly spaced values, so the step grows to 11: 1, 12, ..., 100
(3) x[x > 3] # take the elements of x that are greater than 3
(4) rep(x, each = 2, times = 5) # repeat each element of x twice, then repeat that whole result 5 times
rep(x, c(2, 3, 5, 1, 3)) # give a separate repetition count for each element of x
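The creation functions above can be checked with a short sketch; the vector x is the one from example (1), and the other names are hypothetical.

```r
x <- c(1, 2, 3, 4, 5)

s1 <- 1:100                                     # step 1
s2 <- seq(from = 1, to = 100, by = 2)           # 1, 3, ..., 99
s3 <- seq(from = 1, to = 100, length.out = 10)  # 1, 12, 23, ..., 100

big <- x[x > 3]                                 # 4 5

r1 <- rep(x, each = 2, times = 5)  # 1 1 2 2 3 3 4 4 5 5, repeated 5 times
r2 <- rep(x, c(2, 3, 5, 1, 3))     # 1 1 2 2 2 3 3 3 3 3 4 5 5 5

print(length(s2))  # 50
print(s3)
print(r2)
```

Note that with each and times together, each is applied first, so r1 has 5 * 2 * 5 = 50 elements.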
index
***Indexes in R start from 1 instead of 0***
x[1] #the first element of x
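x[4:18] # elements 4 through 18 of x
x[c(11, 11, 5, 90, 2)] # elements at the given positions; positions may repeat, and a position past the end gives NA
x %in% y # tests whether each element of x occurs in y and returns a logical vector, with no explicit loop
e.g. k <- z %in% c("one", "two")
names() # get or set the names of a vector's elements
e.g. names(y) <- c("one", "two", "three")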
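A small sketch of the indexing rules, using made-up vectors x, z and y:

```r
x <- c(10, 20, 30, 40, 50)

first <- x[1]             # 10 (indexing starts at 1)
picked <- x[c(2, 2, 5)]   # 20 20 50: positions may repeat
past_end <- x[7]          # NA: position 7 does not exist

z <- c("one", "three", "two", "four")
k <- z %in% c("one", "two")  # TRUE FALSE TRUE FALSE

y <- c(1, 2, 3)
names(y) <- c("one", "two", "three")
print(y["two"])  # the element named "two"
```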
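Modify
x[1] <- 1 # set the element at index 1 to 1
append(x = vector, values = new values, after = n) # insert values after position n (after = 0 inserts at the front)
rm(x) # delete the whole vector
y <- y[-c(1:3)] # delete elements 1 to 3: negative indices drop elements, then the result is assigned back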
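A sketch of modifying and deleting, assuming small example vectors; scratch_vec is a hypothetical name used only to show rm().

```r
x <- c(1, 2, 3)
x[1] <- 100                            # modify by index: 100 2 3
x <- append(x, values = 99, after = 2) # insert after position 2: 100 2 99 3
x <- append(x, values = 0, after = 0)  # after = 0 puts the value first: 0 100 2 99 3

y <- 1:6
y <- y[-c(1:3)]                        # drop elements 1 to 3: 4 5 6

scratch_vec <- 1
rm(scratch_vec)                        # the variable no longer exists
print(exists("scratch_vec"))           # FALSE
```

append() returns a new vector rather than changing x in place, which is why each call is assigned back to x.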
operation
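x + 1 # add 1 to every element of x
x + y # element-wise addition by index; the shorter vector is recycled, and R warns if the longer length is not a multiple of the shorter one
x * y # multiplication
x ^ y # exponentiation (x ** y also works; R translates ** to ^)
x %% y # remainder (modulo)
x %/% y # integer division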
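A minimal sketch of vectorized arithmetic and recycling, with made-up vectors. When the lengths are not multiples, R still returns a result and only issues a warning, which the last lines catch to show it happens.

```r
x <- 1:6
y <- c(10, 20)

sum_xy <- x + y       # y is recycled: 11 22 13 24 15 26
doubled <- x * 2      # 2 4 6 8 10 12
pow1 <- 2 ^ 3         # 8
pow2 <- 2 ** 3        # also 8
rem <- 7 %% 3         # 1
quo <- 7 %/% 3        # 2

# Lengths 5 and 2: not a multiple, so R warns
warned <- tryCatch({ 1:5 + 1:2; FALSE }, warning = function(w) TRUE)
print(warned)  # TRUE
```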
Common functions
abs(x) # Find the absolute value
sqrt(x) # Find the square root
log(16, base = 2) # logarithm to base 2 (here 4)
log10(x) # logarithm to base 10
exp(x) # e raised to the power x
ceiling(x) # smallest integer not less than x
floor(x) # largest integer not greater than x
trunc(x) # integer part of x (truncates toward zero)
round(x, digits = n) # round x to n decimal places
sin(x) # sine, x in radians
cos(x) # cosine, x in radians
sum(x) # find the sum
max(x); min(x) # maximum and minimum of x
range(x) # minimum and maximum together, as c(min(x), max(x))
mean(x) # Find the average value of x
var(x) #return the variance of x
sd(x) #return x standard deviation
prod(x) # Find the product of x multiplication
median(x) # Find the median of x
quantile(x) #calculate quantile
which.max(x) # index of the (first) maximum element of x
cut(num, seq(0, 100, 10)) # bin values into intervals (0,10], (10,20], ...; wrap the result in table() to get frequency counts
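A sketch of several of these functions on hypothetical data. Negative numbers show the difference between ceiling(), floor() and trunc(), and the last part is the cut() plus table() pattern for frequency statistics.

```r
v <- c(-2.7, 2.3)
ceiling(v)  # -2  3  (smallest integer not less than each value)
floor(v)    # -3  2  (largest integer not greater than each value)
trunc(v)    # -2  2  (drop the fractional part, toward zero)

r <- round(3.14159, digits = 2)  # 3.14
lg <- log(16, base = 2)          # 4

x <- c(3, 9, 9, 1)
which.max(x)  # 2: index of the first maximum
range(x)      # 1 9

# Frequency statistics: bin scores into (0,10], (10,20], ..., (90,100], then count
scores <- c(55, 62, 67, 71, 88, 93, 100)
bins <- cut(scores, breaks = seq(0, 100, 10))
counts <- table(bins)
print(counts)
```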
Matrix- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
*A matrix is two-dimensional: it has rows and columns
*All elements of a matrix must be of the same data type
m <- matrix(1:20, 4, 5, byrow = TRUE) # 4 rows and 5 columns, filled row by row
# name the rows and columns
rnames <- c("R1", "R2", "R3", "R4") # row names, one per row
cnames <- c("C1", "C2", "C3", "C4", "C5") # column names, one per column
dimnames(m) <- list(rnames, cnames)
dim(x) <- c(4, 5) # give a vector dimensions to turn it into a matrix; x must have 4 * 5 = 20 elements
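A sketch contrasting the two ways of building a matrix shown above, using the same values. Assigning dim() has no byrow option, so the values fill column by column.

```r
m <- matrix(1:20, nrow = 4, ncol = 5, byrow = TRUE)  # rows: 1-5, 6-10, 11-15, 16-20
dimnames(m) <- list(c("R1", "R2", "R3", "R4"),       # one name per row
                    c("C1", "C2", "C3", "C4", "C5")) # one name per column

x <- 1:20
dim(x) <- c(4, 5)     # values fill column by column
print(x[1, ])         # 1 5 9 13 17
print(m["R2", "C3"])  # 8
```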
index
m[1, c(2, 3, 4)] # row 1, columns 2, 3 and 4 of m
m[ ,2] #access the second column
m[-1,2] #Remove the first row and access the second column
m["R1","C2"] # can also be accessed by name
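A sketch of the index forms above on the named 4 x 5 matrix; here the names are passed through matrix()'s dimnames argument instead of being assigned afterwards.

```r
m <- matrix(1:20, nrow = 4, ncol = 5, byrow = TRUE,
            dimnames = list(paste0("R", 1:4), paste0("C", 1:5)))

row1_cols <- m[1, c(2, 3, 4)]  # 2 3 4 (named C2 C3 C4)
col2 <- m[, 2]                 # 2 7 12 17
col2_no_row1 <- m[-1, 2]       # 7 12 17
by_name <- m["R1", "C2"]       # 2
```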
operation
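Matrix and a single number:
The operation is applied to every element (e.g. m + 1, m * 2)
Two matrices:
+, -, *, / work element by element and require the same numbers of rows and columns
n * t # element-wise product, not a matrix product
n %*% t # matrix multiplication: the number of columns of n must equal the number of rows of t (the outer product operator is %o%)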
diag(m) # find the diagonal
t(m) #Matrix transpose, exchange rows and columns
# Compute the sum
colSums(m)
rowSums(m)
# Calculate the average
colMeans(m)
rowMeans(m)
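A sketch with two small hypothetical 2 x 2 matrices that separates element-wise *, matrix multiplication %*% and the outer product %o%. Choosing p as twice the identity matrix makes the matrix product easy to check by eye: it is just 2 * n.

```r
n <- matrix(1:4, nrow = 2)             # [1 3; 2 4]
p <- matrix(c(2, 0, 0, 2), nrow = 2)   # 2 times the identity matrix

elementwise <- n * p       # [2 0; 0 8]
product <- n %*% p         # [2 6; 4 8], i.e. 2 * n
outer_prod <- 1:2 %o% 1:3  # [1 2 3; 2 4 6]

d <- diag(n)       # 1 4
tn <- t(n)         # [1 2; 3 4]
cs <- colSums(n)   # 3 7
rs <- rowSums(n)   # 4 6
cm <- colMeans(n)  # 1.5 3.5
```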
Array- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
*An array is a matrix extended to more than two dimensions
create
dim1 <- c("A1","A2" )
dim2 <- c("B1","B2","B3" )
dim3 <- c("C1", "C2", "C3", "C4" )
z <- array(1:24, c(2,3,4),dimnames = list(dim1, dim2, dim3))
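A sketch of indexing the 2 x 3 x 4 array built above. Values fill the first dimension fastest, then the second, then the third, so z["A1", "B2", "C3"] is element 1 + (2 - 1) * 2 + (3 - 1) * 6 = 15.

```r
dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")
z <- array(1:24, c(2, 3, 4), dimnames = list(dim1, dim2, dim3))

one <- z["A1", "B2", "C3"]  # 15
slice <- z[, , "C1"]        # a 2 x 3 matrix holding 1:6
print(dim(z))               # 2 3 4
```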
List- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
*In many other programming languages a list is essentially an array, but in R the list is the most complex data structure
*A list is an ordered collection of objects. It can store vectors, matrices, data frames, and even other lists
Compared with a vector:
1. Like a vector, a list is a one-dimensional collection
2. A vector can hold only one data type, while the elements of a list can be any R data structure, including other lists
create
# Build an example list to experiment with
a <- 1:20
b <- matrix(1:20, 4)
c <- mtcars
d <- "this is a test list"
mlist <- list(name1 = a,name2 = b,name3 = c,name4 = d) #Note that the name is optional
access
mlist[1] # single brackets work like vector indexing and return a sub-list (still a list)
mlist[[1]] # double brackets return the element itself, with its own type
mlist[c(1, 4)] # to access several elements, put their indices in a vector
mlist$name1 # access an element by name with $, which is convenient
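A sketch of the difference between [ and [[ on the example list. It stores mtcars directly instead of first assigning it to a variable named c, which avoids reusing the name of the c() function; otherwise the list matches the one above.

```r
a <- 1:20
b <- matrix(1:20, 4)
d <- "this is a test list"
mlist <- list(name1 = a, name2 = b, name3 = mtcars, name4 = d)

print(class(mlist[1]))    # "list": single brackets keep the list wrapper
print(class(mlist[[1]]))  # "integer": double brackets return the element
sub <- mlist[c(1, 4)]     # a list of two elements
text <- mlist$name4       # the character string itself
```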
Modify
Assign directly to an accessed element, e.g. mlist$name4 <- "new text"
delete
1. mlist <- mlist[-4] # a negative index drops element 4; assign the result back to the list
2. mlist[[4]] <- NULL # assigning NULL removes the element
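A sketch of modifying and deleting list elements on a small hypothetical list. Both deletion methods give the same result; the last lines show that keeping a NULL value inside a list (rather than deleting) requires wrapping it in list().

```r
mlist <- list(name1 = 1:3, name2 = "x", name3 = TRUE, name4 = "this is a test list")

mlist$name2 <- "changed"   # modify: assign directly to the accessed element

m1 <- mlist[-4]            # delete, method 1: negative index, then reassign
m2 <- mlist
m2[[4]] <- NULL            # delete, method 2: assigning NULL removes the element

keep_null <- mlist
keep_null[4] <- list(NULL) # store a NULL value instead of deleting
```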
Data frame - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
*A tabular data structure designed to represent datasets
*A dataset is usually a rectangular array of data in which rows are observations and columns are variables; different fields use different names for rows and columns
*Internally a data frame is a list whose elements are vectors forming the columns. Every column must have the same length, so a data frame is rectangular, and every column has a name
Data frame vs. matrix
1. A data frame looks very much like a matrix
2. A data frame is a regular (rectangular) list
3. All elements of a matrix must be of the same data type
4. Within a data frame each column must have a single type, but different columns can have different types, so the values within a row can differ
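A sketch of point 4 with made-up data, assuming R 4.0 or later (where data.frame() keeps strings as character by default): the data frame keeps one type per column, while binding the same columns into a matrix forces everything to character.

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"))
col_types <- sapply(df, class)   # id "integer", name "character"
print(is.list(df))               # TRUE: a data frame is a list of columns

mat <- cbind(id = 1:3, name = c("a", "b", "c"))
print(typeof(mat))               # "character": the numbers were coerced
```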
create
state <- data.frame(state.name, state.abb, state.region, state.x77) # combine the built-in state vectors and the state.x77 matrix into one data frame
access
state[c(2, 4)] # the second and fourth columns
state[, "state.abb"] # a column by name
state["Alabama", ] # a row by row name
state$Murder # a column with $
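A sketch checking the access forms above on the built-in state datasets. The row names such as "Alabama" are assumed to come from the row names of the state.x77 matrix, since data.frame() takes row names from the first argument that has them.

```r
state <- data.frame(state.name, state.abb, state.region, state.x77)

print(dim(state))                 # 50 rows, 11 columns
two_cols <- state[c(2, 4)]        # state.abb and Population
abb <- state[, "state.abb"]       # a column by name
alabama <- state["Alabama", ]     # a row by name
murder <- state$Murder            # a column with $
```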
e.g.
lm(weight ~ height, data = women) # linear regression; the data argument lets the formula use column names directly
# attach() adds the data frame to the search path
e.g.
attach(mtcars) # now column names can be typed directly, without $
detach(mtcars) # remove it from the search path when finished
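A sketch of the three ways of referring to columns by bare name. with() is not mentioned in the notes above; it is a base R alternative to attach() that evaluates one expression inside the data frame without changing the search path.

```r
fit <- lm(weight ~ height, data = women)  # column names used directly in the formula
print(coef(fit))                          # intercept and slope

attach(mtcars)
avg_attach <- mean(mpg)   # mpg is found through the search path
detach(mtcars)

avg_with <- with(mtcars, mean(mpg))  # same result, search path untouched
```

attach() puts a copy of the data frame on the search path, so later changes to mtcars are not seen through the attached names; with() avoids that pitfall.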
Factor - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Purpose: representing categorical variables and counting frequencies
Variable types:
Nominal: categories with no order, such as city or province names; state.division and state.region are stored as unordered factors
Ordinal: categories with an order, such as good, better, best, or mtcars$cyl (the number of cylinders in a car)
Continuous: any value within a range, such as height, age or growth rate
In R, nominal and ordinal variables are called factors. The possible values of a categorical variable are its levels (good, better and best are each a level), and a vector made up of these level values is called a factor
definition
f <- factor(c("red","red", "blue", "green", "blue", "blue"))
week <- factor(c("Mon", "Fri", "Thu", "Wed", "Mon", "Fri", "Sun"), ordered = TRUE, levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")) # specifying the levels with ordered = TRUE gives the factor an order
fcyl <- factor(mtcars$cyl) #Define a variable as a factor
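A sketch of the three factor definitions above. By default levels are sorted alphabetically, an ordered factor supports comparisons, and table() counts every level, including levels that never occur.

```r
f <- factor(c("red", "red", "blue", "green", "blue", "blue"))
print(levels(f))   # "blue" "green" "red": sorted alphabetically by default
print(table(f))    # blue 3, green 1, red 2

week <- factor(c("Mon", "Fri", "Thu", "Wed", "Mon", "Fri", "Sun"),
               ordered = TRUE,
               levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
print(week[1] < week[2])  # TRUE: Mon comes before Fri
print(table(week))        # unused levels (Tue, Sat) are counted as 0

fcyl <- factor(mtcars$cyl)
print(table(fcyl))        # 4: 11, 6: 7, 8: 14
```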