I'm new to R. I have a script that reads a file without headers and selects the first two elements of each line. Each line describes an airplane route: the first element is the airport the flight takes off from, and the second element is the airport where it lands.
Here is one of the thousands of lines in the file:
LFPO;LFSL;00;AT45;210;LFPO;LFSL;189930747;150907;1815;!!!!;HOP25ZZ;HOP;0;HOP25ZZ-LFPO-LFSL-20150907180500;N;0;;;245346;;;150907;1805;0;;X;;;;;;;;;;210;;0;20150907175900;AA45458325;;;;;NEXE;NEXE;;;;;20150907180500;;;;245346;;;;;;;;;;;;;;;;;;;;;;;;;;;;HOP;;;;;;;;;;;0
My code ranks the top ten airports by total number of movements, where an airport's movements are the sum of its takeoffs and landings over the period covered by the file.
The code works, and I'm trying to improve it with better libraries or functions. So far I have swapped the read.table() function for read_delim(), which has reduced the processing time quite drastically. However, although the desired results are printed, I also get some warnings.
How can I resolve these warnings, or at least stop them from being printed?
This is the code, and here is a link to a test file: https://easyupload.io/4lw4o4
start_time <- Sys.time()
# Libraries
library(compare)
library(janitor)
library(data.table)
library(readr)
# START
args = commandArgs(trailingOnly=TRUE)
# test if there is at least one argument
if (length(args)==0) {
fileName = "traffic1week.exp2"
} else if (length(args)==1) {
# default output file
fileName = args[1]
}
# Read file into a data frame
df = read_delim(fileName, delim = ";", col_names =F)
# Name the columns
names(df) = c("Airport","Airport")
# Retrieve 1st column
origin = df[1]# takeOff Airports
# Retrieve 2nd column
destination = df[2] # Landing Airports
# Number of movements
takeOff_airports = unlist(table(origin))
landing_airports = unlist(table(destination))
# Convert to dataframes
df1 = as.data.frame(takeOff_airports)
names(df1) = c('Airport', 'TakeOffs')
df2 = as.data.frame(landing_airports)
names(df2) = c('Airport', 'Landings')
# Merge both dataframes
df3 = merge(df1, df2, all=T)
# Sum takeoffs and landings for each airport
df3$Total_Movements = df3$TakeOffs+df3$Landings
# Order by max total movements
df3 = df3[order(-df3$Total_Movements),]
# Reorganize columns
result = df3[, c(1, 4, 2, 3)]
# Print results
print(result[1:10,], row.names = FALSE)
# STOP
end_time = Sys.time()
cat(paste("Processing time: ", end_time - start_time),sep="\n\n")
These are the warnings:
Attaching package: ‘compare’
The following object is masked from ‘package:base’:
isTRUE
Attaching package: ‘janitor’
The following objects are masked from ‘package:stats’:
chisq.test, fisher.test
Parsed with column specification:
cols(
.default = col_logical(),
X1 = col_character(),
X2 = col_character(),
X3 = col_character(),
X4 = col_character(),
X5 = col_character(),
X6 = col_character(),
X7 = col_character(),
X8 = col_double(),
X9 = col_double(),
X10 = col_character(),
X11 = col_character(),
X12 = col_character(),
X13 = col_character(),
X14 = col_double(),
X15 = col_character(),
X16 = col_character(),
X17 = col_double(),
X20 = col_double(),
X23 = col_double(),
X24 = col_character()
# ... with 13 more columns
)
See spec(...) for full column specifications.
Warning message:
The `names` must have length 95, not 2.
This warning is displayed once per session.
Airport Total_Movements TakeOffs Landings
LFPG 9407 4926 4481
EHAM 9399 4879 4520
LTBA 9384 4749 4635
EGLL 9057 4749 4308
EDDF 8930 4624 4306
EDDM 7535 3816 3719
LEMD 7412 3789 3623
LIRF 6957 3528 3429
LEBL 6406 3221 3185
EGKK 5995 3050 2945
Processing time: 1.78606390953064
I'd like to get just:
Airport Total_Movements TakeOffs Landings
LFPG 9407 4926 4481
EHAM 9399 4879 4520
LTBA 9384 4749 4635
EGLL 9057 4749 4308
EDDF 8930 4624 4306
EDDM 7535 3816 3719
LEMD 7412 3789 3623
LIRF 6957 3528 3429
LEBL 6406 3221 3185
EGKK 5995 3050 2945
Processing time: 1.78606390953064
The output printed by packages when they are loaded is normal and purely informative; strictly speaking these are messages, not warnings. You can silence them with suppressMessages(), but if you suppress every message in your program, you risk missing an important one when something goes wrong.
Try this:
suppressMessages(library(compare))
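To illustrate the distinction, here is a minimal sketch showing that suppressMessages() hides messages but lets real warnings through. The noisy() function is a hypothetical stand-in for any call that emits both; it is not part of the question's code.

```r
# Hypothetical helper that emits both a message and a warning
noisy <- function() {
  message("informational note")
  warning("something worth looking at")
  42
}

caught <- character(0)

# suppressMessages() swallows the message; the warning still reaches
# the handler, which records it and then muffles it
value <- withCallingHandlers(
  suppressMessages(noisy()),
  warning = function(w) {
    caught <<- c(caught, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(value)
print(caught)
```

For package loading specifically, base R also provides suppressPackageStartupMessages(), which only targets the startup messages a package prints when it is attached.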
As for the output from read_delim(), it is just reporting the column types it guessed, since you didn't specify them yourself. If you pass the col_types parameter to read_delim(), it will stop printing the specification. You can also silence it by wrapping the read_delim() call in suppressMessages(), as mentioned above. Note that suppressWarnings() will not help here, because the column specification is emitted as a message, not a warning. Or, if you use read.table() instead, it quietly assumes reasonable types for the columns without emitting messages about it.
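As a rough sketch of the col_types approach, the example below reads a small sample file with every column declared as character, so no column specification is printed. The sample lines and the names routes, Origin, and Destination are made up for illustration. It also keeps only the first two columns before renaming them: the one real warning in the question ("The `names` must have length 95, not 2") comes from assigning two names to a 95-column data frame, so selecting the columns first should avoid it.

```r
suppressPackageStartupMessages(library(readr))

# Hypothetical two-line sample standing in for the real traffic file
sample_file <- tempfile(fileext = ".exp2")
writeLines(c("LFPO;LFSL;00;AT45;210", "LFSL;LFPO;00;AT45;180"), sample_file)

# Declaring the column types up front stops readr from guessing them
# and from printing the guessed specification
routes <- read_delim(
  sample_file,
  delim = ";",
  col_names = FALSE,
  col_types = cols(.default = col_character())
)

# Keep only the first two columns, then name them
routes <- routes[1:2]
names(routes) <- c("Origin", "Destination")

print(routes)
```

Reading everything as character is a simplification; if other columns are needed later, their types can be listed individually inside cols().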