Table of contents
1. Read the coil detector data sample: Detector_sample.csv, output the total number of records
2. Count the missing values in the flow column.
3. Eliminate rows containing missing values, and output the number of records after elimination.
4. Count the number of redundant records, and remove them if necessary. (Reference function: duplicated())
5. Calculate the mean, variance, 25%, 50% and 75% quantiles of flow, speed, and occupancy
6. Sort the data frame by the time column.
7. Extract the hours, minutes, and seconds of the date and time column, and add them as HOUR, MINUTE, and SECOND columns.
8. Draw the histograms of flow, speed and occupancy
9. Draw the flow-speed scatter diagram.
1. Data frame operations
1. Read the coil detector data sample: Detector_sample.csv, output the total number of records
code:
# Read the detector sample, inspect it, and print the total number of records
data <- read.csv("F:\\data\\Detector_sample_update.csv")
View(data)
print(nrow(data))
Output result (to save space, the screenshot shows only the first and last rows; the middle is omitted):
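The total number of records is the number of rows in the data frame, which `nrow()` returns. A minimal sketch, using a small inline stand-in for the CSV file; the sample values and every column name other than `FDT_TIME` are hypothetical:

```r
# Inline stand-in for the CSV file; the real file would be read with read.csv(path).
# Sample values and column names other than FDT_TIME are hypothetical.
csv_text <- "FDT_TIME,FLOW,SPEED,OCCUPANCY
2015-06-01 00:00:20,5,60,3
2015-06-01 00:00:40,NA,55,4
2015-06-01 00:01:00,7,58,5"

sample_data <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Total number of records = number of rows
n_records <- nrow(sample_data)
print(n_records)
```

`View()` is only useful in an interactive session such as RStudio; `nrow()`, `head()` and `tail()` also work in scripts.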
2. Count the missing values in the flow column.
code:
table(is.na(data[3]))
Output result:
That is, the flow column has 3 missing values.
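`table(is.na(...))` shows the counts of both `FALSE` and `TRUE`. When only the number of missing values is needed, summing the logical vector gives it directly. A small sketch on hypothetical values (not the detector data):

```r
# Hypothetical flow values with three missing entries
flow <- c(5, NA, 7, NA, 9, 11, NA)

# is.na() returns a logical vector; sum() counts the TRUE entries
n_missing <- sum(is.na(flow))
print(n_missing)

# The same idea applied to every column of a data frame at once
df <- data.frame(flow = flow, speed = c(60, 55, NA, 50, 48, 47, 45))
missing_per_column <- colSums(is.na(df))
print(missing_per_column)
```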
3. Eliminate rows containing missing values, and output the number of records after elimination.
code:
na_omit_data<-na.omit(data)
print(nrow(na_omit_data))
Output result:
That is, 4323 records remain after elimination.
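`na.omit()` drops every row that has at least one `NA` in any column. The same filtering can be written with `complete.cases()`, which also makes it easy to see how many rows were dropped. A sketch on a hypothetical data frame:

```r
# Hypothetical data frame with missing values in different columns
df <- data.frame(
  flow  = c(5, NA, 7, 9),
  speed = c(60, 55, NA, 50)
)

# TRUE for rows without any NA
keep <- complete.cases(df)

cleaned   <- df[keep, ]
n_dropped <- sum(!keep)

print(nrow(cleaned))  # rows remaining
print(n_dropped)      # rows removed
```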
4. Count the number of redundant records, and remove them if necessary. (Reference function: duplicated())
code:
table(duplicated(na_omit_data))
dupicated_data<-na_omit_data[!duplicated(na_omit_data),]
Output result:
That is, there are 6 redundant records.
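`duplicated()` marks a row as `TRUE` only when an identical row has already appeared earlier, so the first occurrence is always kept. A sketch on a hypothetical data frame that counts and removes duplicates:

```r
# Hypothetical records; rows 3 and 5 repeat rows 2 and 1
df <- data.frame(
  time = c("00:00:20", "00:00:40", "00:00:40", "00:01:00", "00:00:20"),
  flow = c(5, 7, 7, 9, 5)
)

# TRUE for every row that repeats an earlier row
is_dup <- duplicated(df)

n_dup   <- sum(is_dup)     # number of redundant records
deduped <- df[!is_dup, ]   # keep first occurrences only

print(n_dup)
print(deduped)
```

`unique(df)` returns the same rows as `df[!duplicated(df), ]`.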
5. Calculate the mean, variance, 25%, 50% and 75% quantiles of flow, speed, and occupancy
code:
summary(dupicated_data)
var(dupicated_data[3])
var(dupicated_data[4])
var(dupicated_data[5])
Output result:
That is:
The flow mean is 7.51, the variance is 15.77452, the 25th percentile is 4.00, the 50th percentile is 9.00, and the 75th percentile is 11.00.
The speed mean is 51.23, the variance is 348.3788, the 25th percentile is 38.00, the 50th percentile is 48.00, and the 75th percentile is 66.00.
The occupancy mean is 15.79, the variance is 124.3895, the 25th percentile is 5.00, the 50th percentile is 16.00, and the 75th percentile is 25.00.
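The five statistics can also be collected into one table instead of reading them off `summary()` and three separate `var()` calls. A sketch on a hypothetical data frame; the column names are assumptions, and `var()` is the sample variance (divisor n - 1):

```r
# Hypothetical numeric columns standing in for the detector measurements
df <- data.frame(
  flow  = c(2, 4, 6, 8, 10),
  speed = c(30, 40, 50, 60, 70)
)

# One column of statistics per variable
stats <- sapply(df, function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  c(mean = mean(x), variance = var(x), q25 = q[1], q50 = q[2], q75 = q[3])
})

print(stats)
```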
2. Date and time processing
2.1 Processing
6. Sort the data frame by the time column.
code:
order_data<-dupicated_data[order(dupicated_data$FDT_TIME),]
View(order_data)
Output result:
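If the time column is read as text, `order()` sorts it as strings, which matches chronological order only when every timestamp uses the same zero-padded format. Parsing the column first avoids relying on that. A sketch on hypothetical timestamps; the format string and the `UTC` time zone are assumptions:

```r
# Hypothetical records in non-chronological order
df <- data.frame(
  FDT_TIME = c("2015-06-01 10:00:00", "2015-06-01 09:30:00", "2015-06-01 09:45:00"),
  flow     = c(9, 5, 7),
  stringsAsFactors = FALSE
)

# Parse the text into date-time values (format and time zone are assumptions)
parsed <- as.POSIXct(df$FDT_TIME, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Reorder the rows by the parsed time
sorted <- df[order(parsed), ]
print(sorted)
```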
7. Extract the hours, minutes, and seconds of the date and time column, and add them as HOUR, MINUTE, and SECOND columns.
code:
library(lubridate)
# Parse the time column once, then extract each component
datetime <- as.POSIXlt(order_data$FDT_TIME)
order_data$HOUR <- hour(datetime)
order_data$MINUTE <- minute(datetime)
order_data$SECOND <- second(datetime)
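The same components are available in base R without lubridate, because a `POSIXlt` object stores them as the fields `hour`, `min` and `sec`. A sketch on hypothetical timestamps; the format string and time zone are assumptions:

```r
# Hypothetical timestamps
times <- c("2015-06-01 09:30:20", "2015-06-01 10:05:40")

# POSIXlt keeps each date-time component in a named field
lt <- as.POSIXlt(times, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

parts <- data.frame(
  HOUR   = lt$hour,
  MINUTE = lt$min,
  SECOND = lt$sec
)
print(parts)
```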
2.2 Drawing
8. Draw the histograms of flow, speed and occupancy
code:
Output result:
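The code for this step is not shown. One possible way to draw the three histograms is base R `hist()`, with one panel per variable. This is only a sketch on simulated values, not the detector data, and the column names `flow`, `speed` and `occupancy` are assumptions:

```r
# Simulated stand-in data; distributions and column names are assumptions
set.seed(1)
df <- data.frame(
  flow      = rpois(200, lambda = 8),
  speed     = rnorm(200, mean = 50, sd = 15),
  occupancy = runif(200, min = 0, max = 40)
)

# Three panels side by side; remember the old layout so it can be restored
old_par <- par(mfrow = c(1, 3))

# hist() draws the plot and returns the bin counts
hists <- lapply(names(df), function(col) {
  hist(df[[col]], main = paste("Histogram of", col), xlab = col)
})

par(old_par)
```

With the real data, `df` would be replaced by the cleaned data frame and the names by its actual column names.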
9. Draw the flow-speed scatter diagram.
code:
Output result:
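The code for this step is not shown. A scatter diagram of two numeric columns can be drawn with base R `plot()`. The sketch below uses simulated values rather than the detector data, and putting flow on the x-axis and speed on the y-axis is an assumption:

```r
# Simulated stand-in values; not the detector data
set.seed(1)
flow  <- rpois(200, lambda = 8)
speed <- rnorm(200, mean = 50, sd = 15)

# One point per record: flow on the x-axis, speed on the y-axis
plot(flow, speed,
     xlab = "Flow", ylab = "Speed",
     main = "Flow-speed scatter diagram",
     pch = 16)
```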