R language homework: basic data analysis of taxi data, time processing, etc.

Table of contents

1. Data frame operation

1. Read the coil detector data sample: Detector_sample.csv, output the total number of records

2. Count the missing values in the flow column.

3. Eliminate rows containing missing values, and output the number of records after elimination.

4. Count the number of redundant records, and remove them if necessary. (Reference function: duplicated())

5. Calculate the mean, variance, and 25%, 50% and 75% quantiles of flow, speed, and occupancy

2. Date and time processing

2.1 Processing

6. Sort the data frame by the time column.

7. Extract the hours, minutes, and seconds from the date-time column, and add them as HOUR, MINUTE, and SECOND columns.

2.2 Drawing

8. Draw histograms of flow, speed and occupancy

9. Draw a flow-speed scatter plot.


1. Data frame operation

1. Read the coil detector data sample: Detector_sample.csv, output the total number of records

code:

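# Read the detector sample; adjust the path to where the CSV is stored.
data <- read.csv("F:\\data\\Detector_sample_update.csv")

# Total number of records (rows).
print(nrow(data))

# Inspect the table in the data viewer.
View(data)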

Output results (limited by space, only the beginning and the end of the screenshot, the middle is omitted):

2. Count the missing values in the flow column.

code:

table(is.na(data[3]))

Output result:

That is, the flow column contains 3 missing values.
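
table() reports both the TRUE and FALSE counts. When only the number of missing values is needed, sum(is.na(...)) gives it directly. A minimal sketch on a made-up data frame (the column name FLOW and all values are assumptions for illustration, not the detector data):

```r
# Toy data frame standing in for the detector sample (hypothetical values;
# FLOW is an assumed name for the third column).
toy <- data.frame(
  ID    = 1:6,
  LANE  = c(1, 1, 2, 2, 3, 3),
  FLOW  = c(4, NA, 9, NA, 11, NA)
)

# is.na() returns a logical vector; sum() counts its TRUE entries.
n_missing <- sum(is.na(toy[[3]]))
print(n_missing)

# Missing values per column, to check the whole table at once.
na_per_column <- colSums(is.na(toy))
print(na_per_column)
```

With the real data, the same call would be sum(is.na(data[[3]])).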

 

3. Eliminate rows containing missing values, and output the number of records after elimination.

code:

na_omit_data<-na.omit(data)

print(nrow(na_omit_data))

Output result:

That is, the number of records after elimination is 4323
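
na.omit() drops every row that has an NA in any column, not only in the flow column. A small sketch of that behaviour on made-up values (the column names FLOW and SPEED are assumptions), with complete.cases() as a cross-check:

```r
# Hypothetical toy data: one NA in FLOW (row 2), one NA in SPEED (row 3).
toy <- data.frame(
  FLOW  = c(4, NA, 9, 11),
  SPEED = c(38, 48, NA, 66)
)

# na.omit() removes both incomplete rows.
cleaned <- na.omit(toy)
n_dropped <- nrow(toy) - nrow(cleaned)
print(n_dropped)

# complete.cases() flags the rows without any NA; filtering on it
# keeps the same rows as na.omit().
kept <- toy[complete.cases(toy), ]
print(nrow(kept))
```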

4. Count the number of redundant records, and remove them if necessary. (Reference function: duplicated())

code:

table(duplicated(na_omit_data))

dupicated_data<-na_omit_data[!duplicated(na_omit_data),]

Output result:

That is, the number of redundant records is 6.
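
duplicated() marks the second and later copies of a row, so summing it gives the number of redundant records, and negating it keeps one copy of each. A minimal sketch on made-up values (column names are assumptions):

```r
# Hypothetical toy data: row 3 repeats row 2, rows 5 and 6 repeat row 4.
toy <- data.frame(
  ID   = c(1, 2, 2, 3, 3, 3),
  FLOW = c(4, 9, 9, 11, 11, 11)
)

# Number of rows that are repeats of an earlier row.
n_dup <- sum(duplicated(toy))
print(n_dup)

# Keep only the first occurrence of each row.
deduped <- toy[!duplicated(toy), ]
print(nrow(deduped))

# unique() on a data frame removes the same rows.
print(nrow(unique(toy)))
```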

5. Calculate the mean, variance, and 25%, 50% and 75% quantiles of flow, speed, and occupancy

code:

summary(dupicated_data)

var(dupicated_data[3])

var(dupicated_data[4])

var(dupicated_data[5])

Output result:

That is:

The flow mean is 7.51, the variance is 15.77452, the 25th percentile is 4.00, the 50th percentile is 9.00, and the 75th percentile is 11.00.

The speed mean is 51.23, the variance is 348.3788, the 25th percentile is 38.00, the 50th percentile is 48.00, and the 75th percentile is 66.00.

The occupancy mean is 15.79, the variance is 124.3895, the 25th percentile is 5.00, the 50th percentile is 16.00, and the 75th percentile is 25.00.
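
The same statistics can also be collected into one table instead of reading them off summary() and three var() calls. A minimal sketch, assuming the three measurement columns are named FLOW, SPEED and OCCUPANCY (hypothetical names and values; describe is a helper introduced here for illustration):

```r
# Hypothetical toy data standing in for columns 3 to 5 of the detector table.
toy <- data.frame(
  FLOW      = c(4, 7, 9, 11, 9),
  SPEED     = c(38, 48, 66, 52, 41),
  OCCUPANCY = c(5, 16, 25, 12, 20)
)

# Helper (not from the original post): mean, variance and three quantiles.
describe <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  c(mean = mean(x), variance = var(x), q25 = q[1], q50 = q[2], q75 = q[3])
}

# sapply() applies the helper to every column and binds the results
# into a matrix with one column per variable.
stats <- sapply(toy, describe)
print(round(stats, 2))
```

With the real data, sapply(dupicated_data[3:5], describe) would produce the corresponding table.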

2. Date and time processing

2.1 Processing

6. Sort the data frame by the time column.

code:

order_data<-dupicated_data[order(dupicated_data$FDT_TIME),]

View(order_data)

Output result:
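
If FDT_TIME is read as text, order() sorts it alphabetically, which matches chronological order only when the timestamps are zero-padded in year-month-day order. A minimal sketch that parses the column first, assuming the format "%Y-%m-%d %H:%M:%S" (the format, the FLOW column and the values are assumptions):

```r
# Hypothetical toy data with timestamps out of order.
toy <- data.frame(
  FDT_TIME = c("2021-01-01 10:00:00",
               "2021-01-01 08:30:00",
               "2021-01-01 09:15:20"),
  FLOW     = c(11, 4, 9)
)

# Parse the text into date-time values (format and time zone are assumed).
parsed <- as.POSIXct(toy$FDT_TIME, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# order() on the parsed values sorts chronologically.
sorted <- toy[order(parsed), ]
print(sorted)
```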

7. Extract the hours, minutes, and seconds from the date-time column, and add them as HOUR, MINUTE, and SECOND columns.

code:

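library(lubridate)  # provides hour(), minute() and second()

# Parse the time column so its components can be extracted.
datatime <- as.POSIXlt(order_data$FDT_TIME)

# Add one column per time component.
order_data$HOUR <- hour(datatime)

order_data$MINUTE <- minute(datatime)

order_data$SECOND <- second(datatime)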
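
The same components are also available without lubridate, because a POSIXlt object stores them as the fields hour, min and sec. A minimal base R sketch, assuming timestamps in the format "%Y-%m-%d %H:%M:%S" (format and values are assumptions):

```r
# Hypothetical timestamps in the assumed format.
times <- c("2021-01-01 08:30:05", "2021-01-01 17:45:59")

# POSIXlt keeps each time component as a separate field.
lt <- as.POSIXlt(times, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

parts <- data.frame(
  HOUR   = lt$hour,
  MINUTE = lt$min,
  SECOND = lt$sec
)
print(parts)
```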

2.2 Drawing

8. Draw histograms of flow, speed and occupancy

code:

Output result:
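
A minimal sketch of how the three histograms could be drawn with base R's hist(). It is not the original code: the column names FLOW, SPEED and OCCUPANCY and the toy values are assumptions, and with the real data the corresponding columns of order_data would take their place.

```r
# Hypothetical toy data standing in for the three measurement columns.
toy <- data.frame(
  FLOW      = c(4, 7, 9, 11, 9, 3, 12, 8),
  SPEED     = c(38, 48, 66, 52, 41, 70, 35, 55),
  OCCUPANCY = c(5, 16, 25, 12, 20, 3, 28, 14)
)

# Put the three plots side by side; remember the previous layout.
old_par <- par(mfrow = c(1, 3))

# hist() draws the plot and returns the bin counts invisibly.
hists <- lapply(names(toy), function(col) {
  hist(toy[[col]],
       main = paste("Histogram of", col),
       xlab = col,
       col  = "grey80")
})

# Restore the previous plot layout.
par(old_par)
```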

9. Draw a flow-speed scatter plot.

code:

Output result :
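
A minimal sketch of a scatter plot of the two variables with base R's plot(). It is not the original code: the column names, the toy values, and the choice of flow on the x-axis and speed on the y-axis are assumptions.

```r
# Hypothetical toy data standing in for the flow and speed columns.
toy <- data.frame(
  FLOW  = c(4, 7, 9, 11, 9, 3, 12, 8),
  SPEED = c(38, 48, 66, 52, 41, 70, 35, 55)
)

# One point per record: flow on the x-axis, speed on the y-axis.
plot(toy$FLOW, toy$SPEED,
     main = "Flow vs. speed",
     xlab = "Flow",
     ylab = "Speed",
     pch  = 19)
```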

Origin blog.csdn.net/qq_52360013/article/details/122011971