R language (3) -- data structures

Data structures

Vector- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

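*Vectors are the most basic data structure in R and the essence of the language: even a single number is stored as a vector of length 1

Somewhat like a set in mathematics, a vector consists of one or more elements; unlike a set, its elements are ordered and may repeat

One-dimensional array for storing numeric, character or logical data

All elements of a vector must have the same data type; if you mix types, R does not raise an error but silently converts every element to the most general type present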
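The single-type rule is enforced by coercion rather than by an error. The minimal sketch below shows what happens when types are mixed. The variable names are illustrative only.

```r
# Mixing types in c() does not fail: R coerces every element to the most
# general type present (logical < integer < double < character)
v1 <- c(1, 2, TRUE)      # logical TRUE becomes the number 1
v2 <- c(1, "two", TRUE)  # everything becomes a character string

class(v1)  # "numeric"
class(v2)  # "character"
print(v2)  # "1" "two" "TRUE"
```

In practice, a single stray string in a numeric vector turns the whole vector into text. This is a common source of surprises when reading data.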

create

c() # can be understood as concatenate/collect/combine

e.g.

(1) x <- c(1, 2, 3, 4, 5)

(2) c(1:100) # integer sequence from 1 to 100 with step 1 (1:100 is already a vector, so c() is optional)

       seq(from = 1, to = 100, by = 2) # arithmetic sequence with common difference 2: 1, 3, ..., 99

       seq(from = 1, to = 100, length.out = 10) # exactly 10 evenly spaced values, so R chooses the step itself: (100 - 1) / 9 = 11

(3) x[x > 3] # logical indexing: keep only the elements of x that are greater than 3

(4) rep(x, each = 2, times = 5) # repeat each element of x twice, then repeat that whole result five times

     rep(x, c(2, 3, 5, 1, 3)) # give a separate repeat count for each element of x (one count per element)
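The creation functions above can be checked quickly in a session. This is a minimal sketch that uses the same x as example (1). The comments list the values each call produces.

```r
x <- c(1, 2, 3, 4, 5)

s1 <- 1:10                                      # integer sequence, step 1
s2 <- seq(from = 1, to = 100, by = 2)           # 1, 3, 5, ..., 99 (50 values)
s3 <- seq(from = 1, to = 100, length.out = 10)  # step is (100 - 1) / 9 = 11
print(s3)                                       # values: 1 12 23 34 45 56 67 78 89 100

big <- x[x > 3]                                 # logical indexing keeps 4 and 5

r1 <- rep(x, each = 2, times = 5)  # 1 1 2 2 3 3 4 4 5 5, and that block 5 times
length(r1)                         # 5 elements * 2 * 5 = 50
r2 <- rep(x, c(2, 3, 5, 1, 3))     # 1 1 2 2 2 3 3 3 3 3 4 5 5 5
```

Note that `each` is applied before `times`. Swapping the two arguments therefore gives the same length but a different order.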

index 

***Indexes in R start from 1 instead of 0***

x[1] #the first element of x

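x[c(4:18)] # elements 4 through 18 (positions past the end of x return NA)

x[c(11,11,5,90,2)] # the elements at the given positions; positions may repeat and need not be in order

%in% # tests whether each element of the left-hand vector occurs in the right-hand vector and returns a logical vector, with no explicit loop

  e.g. k <- z %in% c("one", "two")

names() # get or set the names of a vector's elements (a named vector)

e.g. names(y) <- c("one", "two", "three")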
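The short sketch below brings together positional indexing, `%in%` and names. The vectors x, z and y are made-up examples.

```r
x <- c(10, 20, 30, 40, 50)
x[1]              # 10: the first element (indexing starts at 1)
x[2:4]            # 20 30 40
x[c(5, 5, 1)]     # 50 50 10: positions may repeat and appear in any order
x[7]              # NA: an index past the end gives NA, not an error

z <- c("one", "three", "two", "four")
k <- z %in% c("one", "two")   # TRUE FALSE TRUE FALSE, one result per element of z
z[k]                          # "one" "two": the logical vector works as an index

y <- c(1, 2, 3)
names(y) <- c("one", "two", "three")
y["two"]          # elements of a named vector can be looked up by name
```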

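Modify

x[1] <- 1 # set the element at index 1 to the value 1

append(x = vector, values = new values, after = position) # returns a new vector with the values inserted after the given position

rm(x) #delete vector

y <- y[-c(1:3)] # drop elements 1 to 3 with negative indices and assign the result back to y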
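Here is a minimal sketch of these modifications on small example vectors. Neither `append()` nor negative indexing changes a vector in place, so each result has to be assigned back.

```r
x <- c(1, 2, 3, 4, 5)
x[1] <- 100                                     # replace the element at index 1
x <- append(x, values = c(98, 99), after = 2)   # insert two values after the 2nd element
print(x)                                        # 100 2 98 99 3 4 5

y <- c(10, 20, 30, 40, 50)
y <- y[-c(1:3)]                                 # negative indices drop positions 1 to 3
print(y)                                        # 40 50
```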

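Operations

x+1 # add 1 to every element of x

x+y # element-wise addition by position; if the lengths differ, the shorter vector is recycled, and R warns when the longer length is not a multiple of the shorter one

* # multiplication (element-wise)

^ or ** # exponentiation (^ is the usual form; ** is accepted as a synonym)

%% # remainder (modulo)

%/% # integer division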
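The sketch below shows element-wise arithmetic and the recycling rule in action. The vectors are illustrative. The last line captures the warning R gives when the lengths are not multiples.

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20)

x + 1      # 2 3 4 5: the scalar is applied to every element
x + y      # 11 22 13 24: y is recycled to c(10, 20, 10, 20)
x * 2      # 2 4 6 8
x ^ 2      # 1 4 9 16 (x ** 2 gives the same result)
7 %% 3     # 1: remainder
7 %/% 3    # 2: integer division

# Length 4 + length 3: R still recycles but signals a warning; catch its text
msg <- tryCatch(x + c(1, 2, 3), warning = function(w) conditionMessage(w))
print(msg)  # "longer object length is not a multiple of shorter object length"
```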

 Common functions

abs(x) # Find the absolute value

sqrt(x) # Find the square root

log(16, base = 2) # logarithm with base 2 (returns 4)

log10(x) #Find the logarithm with base 10

exp(x) # e raised to the power x

ceiling(x) # return the smallest integer not less than x

floor(x) #returns the largest integer not greater than x

trunc(x) # return the integer part (drops the fraction, rounding toward zero)

round(x, digits = n) # round x to n decimal places

sin(x) 

cos(x)

sum(x) # find the sum

max(x) min(x) # the maximum and the minimum

range(x) # returns the pair c(min(x), max(x))

mean(x) # Find the average value of x

var(x) #return the variance of x

sd(x) #return x standard deviation

prod(x) # the product of all elements of x

median(x) # Find the median of x

quantile(x) # quantiles of x (by default the 0%, 25%, 50%, 75% and 100% points)

which.max(x) # index of the (first) maximum element of x

cut(num, c(seq(0,100,10))) # bins the values of num into the intervals (0,10], (10,20], ..., (90,100] and returns a factor; wrap it in table() to count the frequency of each bin
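Here is a quick check of the rounding functions, `range()`, `which.max()` and the `cut()` + `table()` frequency count. The vectors x and scores are made-up sample data.

```r
x <- c(-2.7, -1.2, 0.5, 3.8)
ceiling(x)      # -2 -1  1  4: smallest integer >= x
floor(x)        # -3 -2  0  3: largest integer <= x
trunc(x)        # -2 -1  0  3: drop the fraction (toward zero)
round(3.14159, digits = 2)  # 3.14
log(16, base = 2)           # 4
range(x)        # -2.7 3.8: a vector c(min, max), not the difference
which.max(x)    # 4: position of the first maximum

# cut() turns numbers into a factor of intervals; table() counts each interval
scores <- c(55, 72, 88, 91, 67, 73, 99)
bins <- cut(scores, breaks = seq(0, 100, 10))
counts <- table(bins)   # all 10 intervals are listed, empty ones with count 0
counts
```

Note that `ceiling()` and `trunc()` differ only for negative numbers. By default `cut()` uses right-closed intervals, so a value of exactly 0 would fall outside `(0,10]` and become NA.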

Matrix- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

*A matrix is two-dimensional: it has rows and columns

*All elements of a matrix must have the same data type

m <- matrix(1:20, 4, 5, byrow = TRUE) # build a matrix with 4 rows and 5 columns, filled row by row

#name the rows and columns

rnames <- c("R1", "R2", "R3", "R4") # name the rows: one name per row (m has 4 rows, so 5 names would be an error)

cnames <- c("C1", "C2", "C3", "C4", "C5") #name the columns

dimnames(m) <- list(rnames, cnames)

dim(x) <- c(4,5) # give a vector of length 20 a dimension attribute to turn it into a 4 x 5 matrix (filled column by column)
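This is a minimal sketch of both ways to build a matrix. The dimnames list needs exactly one name per row and one per column, or R stops with an error. The values are illustrative.

```r
m <- matrix(1:20, nrow = 4, ncol = 5, byrow = TRUE)   # fill row by row
dimnames(m) <- list(c("R1", "R2", "R3", "R4"),         # 4 row names
                    c("C1", "C2", "C3", "C4", "C5"))   # 5 column names
m["R2", ]     # 6 7 8 9 10

# Adding a dim attribute turns a plain vector into a matrix (filled by column)
x <- 1:20
dim(x) <- c(4, 5)
is.matrix(x)  # TRUE
x[1, ]        # 1 5 9 13 17
```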

index

m[1,c(2,3,4)] #The first row of the m matrix, the elements of the second, third and fourth columns

m[ ,2] #access the second column

m[-1,2] #Remove the first row and access the second column

m["R1","C2"] # can also be accessed by name
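The index forms above can be tried on a small named matrix. The `drop = FALSE` argument is standard base R and keeps a single column as a matrix rather than reducing it to a vector.

```r
m <- matrix(1:20, nrow = 4, byrow = TRUE,
            dimnames = list(paste0("R", 1:4), paste0("C", 1:5)))

m[1, c(2, 3, 4)]      # row 1, columns 2 to 4: 2 3 4
m[, 2]                # the whole second column: 2 7 12 17
m[-1, 2]              # column 2 without row 1: 7 12 17
m["R1", "C2"]         # 2: same element, addressed by names
m[, 2, drop = FALSE]  # column 2 kept as a 4 x 1 matrix
```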

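Operations

Matrix with a scalar:

The operation is applied to every element

Two matrices:

+, -, *, / work element by element and require the same number of rows and columns

n*t # element-wise product of corresponding elements (not a matrix product)

n %*% t # matrix multiplication: the number of columns of n must equal the number of rows of t

diag(m) # the diagonal elements of m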

t(m) #Matrix transpose, exchange rows and columns

# Compute the sum

colSums(m) 

rowSums(m) 

# Calculate the average

colMeans(m)

rowMeans(m)
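The sketch below contrasts `*` with `%*%` on two small made-up matrices n and p. It also shows `%o%`, which is the base R operator for the outer product of two vectors. The comments write matrices row by row as [row1; row2].

```r
n <- matrix(1:4, nrow = 2)   # [1 3; 2 4] (filled by column)
p <- matrix(5:8, nrow = 2)   # [5 7; 6 8]

n * p                 # element-wise product: [5 21; 12 32]
n %*% p               # matrix multiplication: [23 31; 34 46]
c(1, 2) %o% c(3, 4)   # outer product of two vectors: [3 4; 6 8]

t(n)                  # transpose: [1 2; 3 4]
diag(n)               # diagonal: 1 4
colSums(n)            # 3 7
rowMeans(n)           # 2 3
```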

Array- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

* An array extends a matrix to more than two dimensions

create

dim1 <- c("A1","A2" )

dim2 <- c("B1","B2","B3" )

dim3 <- c("C1", "C2", "C3", "C4" )

z <- array(1:24, c(2,3,4),dimnames = list(dim1, dim2, dim3))
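Here is a short sketch of indexing the 2 x 3 x 4 array built above. Values fill the first dimension fastest, then the second, then the third.

```r
dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")
z <- array(1:24, c(2, 3, 4), dimnames = list(dim1, dim2, dim3))

dim(z)               # 2 3 4
z["A2", "B3", "C1"]  # 6: position 2 + (3 - 1) * 2 in the first layer
z[, , "C2"]          # the second 2 x 3 layer, holding the values 7 to 12
```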

List- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

*In many other programming languages a "list" is roughly an array, but in R the list is the most flexible and most complex data structure

*A list is an ordered collection of objects: it can hold vectors, matrices, data frames, and even other lists

Compared with a vector:

1. Like a vector, a list is a one-dimensional collection

2. A vector can store only one data type, whereas each element of a list can be any R object, including another list

create

# Create a few objects to put into a list

a <- 1:20

b <- matrix(1:20, 4)

c <- mtcars

d <- "this is a test list"

mlist <- list(name1 = a, name2 = b, name3 = c, name4 = d) # the names are optional

access

mlist[1] # single brackets work like vector indexing and return a sub-list, which is still a list

mlist[[1]] # double brackets return the element itself, here an integer vector

mlist[c(1,4)] # to access several elements, pass the indices as a vector

mlist$name1 # access an element by name with $; short and convenient
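This is a minimal sketch of the difference between `[ ]` and `[[ ]]`, using a list like the one above. `mtcars` is the built-in dataset from the datasets package.

```r
mlist <- list(name1 = 1:20,
              name2 = matrix(1:20, 4),
              name3 = mtcars,
              name4 = "this is a test list")

class(mlist[1])         # "list": single brackets return a sub-list
class(mlist[[1]])       # "integer": double brackets return the element itself
length(mlist[c(1, 4)])  # 2: a sub-list holding elements 1 and 4
mlist$name4             # "this is a test list"
sum(mlist[[1]])         # 210; sum(mlist[1]) would fail because it is a list
```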

Modify

Access an element with [[ ]] or $ and assign a new value to it directly

delete

1. mlist <- mlist[-5] # drop element 5 with a negative index and assign the result back to the list

2. mlist[[5]] <- NULL # assigning NULL to an element removes it from the list
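The sketch below modifies a small illustrative list, adds a fifth element, and then removes elements in both of the ways shown above.

```r
mlist <- list(name1 = 1:5, name2 = "b", name3 = TRUE, name4 = "d")

mlist$name2 <- "changed"   # modify an element by assigning to it
mlist[["name5"]] <- 3.14   # assigning to a new name appends an element
length(mlist)              # 5

mlist <- mlist[-5]         # option 1: negative index, then reassign
mlist[["name4"]] <- NULL   # option 2: assigning NULL removes the element
names(mlist)               # "name1" "name2" "name3"
```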

Dataframe - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

*A data frame is a tabular data structure designed to represent datasets

*A dataset is usually a rectangular array of data in which rows are observations and columns are variables; different fields use different names for the rows and columns

*Internally a data frame is a list whose elements are vectors. These vectors form the columns, every column must have the same length (so a data frame is rectangular), and every column must have a name

Data frame vs. matrix

1. A data frame looks very much like a matrix

2. A data frame is a list with a regular, rectangular shape

3. All elements of a matrix must have the same data type

4. Within a data frame each column must have a single type, but different columns can have different types, so the values in one row can differ in type

create

state <- data.frame(state.name, state.abb, state.region, state.x77) # combine built-in vectors and a matrix column by column into one data frame

access

state[c(2,4)] #output the second and fourth columns

state[,"state.abb"]

state["Alabama",] #index by name

state$Murder # $ symbol index
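The access forms can be tried on the built-in US state data from the datasets package. In this sketch, `data.frame()` takes its row names (the state names) from the `state.x77` matrix, which is why `state["Alabama", ]` works.

```r
state <- data.frame(state.name, state.abb, state.region, state.x77)

dim(state)                  # 50 rows, 11 columns
head(state[c(2, 4)], 3)     # columns 2 and 4: state.abb and Population
state[, "state.abb"][1:3]   # "AL" "AK" "AZ"
state["Alabama", "Murder"]  # row by name, column by name
mean(state$Murder)          # $ extracts a column as a plain numeric vector
```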

e.g

lm(weight ~ height, data = women) # linear regression; data = lets the formula refer to the columns of women by name

#attach loads the data frame into the search path

e.g

attach(mtcars) #In this way, you can directly type the column name without $

detach(mtcars) #Cancel loading after using the data
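This sketch shows the `data =` argument and `attach()`/`detach()` on built-in datasets. It also uses `with()`, a base R function that gives the same column-name shortcut for a single expression without changing the search path. `with()` is offered here as an alternative, not as part of the original workflow.

```r
fit <- lm(weight ~ height, data = women)  # column names used directly in the formula
coef(fit)                                 # intercept and slope

attach(mtcars)             # mtcars columns become visible by name
avg_mpg <- mean(mpg)
detach(mtcars)             # remove it from the search path again

avg_mpg2 <- with(mtcars, mean(mpg))   # same result, nothing attached
```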

Factor - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

Purpose: classifying data and counting frequencies

Variable types:

Nominal: categories with no natural order, such as city or province names, state.division, state.region

Ordinal: categories with a natural order, such as good, better, best, or mtcars$cyl (the number of cylinders in a car)

Continuous: can take any value in a range, such as height, age, growth rate

In R, nominal and ordinal variables are stored as factors. The possible values of a categorical variable are called levels (for example, good, better and best are each a level), and a vector built from these level values is called a factor

definition

f <- factor(c("red","red", "blue", "green", "blue", "blue")) 

week <- factor(c("Mon", "Fri", "Thu", "Wed", "Mon", "Fri", "Sun"), ordered = TRUE, levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")) # give the levels explicitly so the factor has a defined order

fcyl <- factor(mtcars$cyl) #Define a variable as a factor
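The sketch below shows what levels look like, how `table()` counts them, and what an ordered factor allows. The data vectors come from the definitions above.

```r
f <- factor(c("red", "red", "blue", "green", "blue", "blue"))
levels(f)     # "blue" "green" "red": sorted alphabetically by default
table(f)      # blue 3, green 1, red 2

week <- factor(c("Mon", "Fri", "Thu", "Wed", "Mon", "Fri", "Sun"),
               levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"),
               ordered = TRUE)
week[1] < week[2]   # TRUE: ordered factors support comparisons (Mon < Fri)
max(week)           # Sun
table(week)         # unused levels (Tue, Sat) are still listed, with count 0

fcyl <- factor(mtcars$cyl)
levels(fcyl)        # "4" "6" "8": numbers become category labels
```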


Origin blog.csdn.net/Scabbards_/article/details/130263701