Is there a way in R to ignore a "." in my data when calculating mean/sd/etc.?

Gregory Smith :

I have a large data set for which I need to calculate the mean, standard deviation, min, and max of several columns. The data set uses a "." to denote a missing value for a subject. When I run the mean or sd function, this causes R to return NA. Is there a simple way around this?

My code is just this:

xCAL<-mean(longdata$CAL)
sdCAL<-sd(longdata$CAL)
minCAL<-min(longdata$CAL)
maxCAL<-max(longdata$CAL)

However, R returns NA for all of these variables, and I get the following warning:

Warning message: In mean.default(longdata$CAL) : argument is not numeric or logical: returning NA

Gregor Thomas :

You need to convert your data to numeric before you can do any calculations on it. When you run as.numeric, each "." will be converted to NA, which is what R uses for missing values. All of the functions you mention then take an argument na.rm that can be set to TRUE to remove (rm) missing values (na).

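If your data is a factor, you need to convert it to character first to avoid loss of information: calling as.numeric directly on a factor returns its internal integer level codes rather than the original values, as explained in this FAQ.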
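
To see why the intermediate as.character step matters, here is a small sketch with made-up values (the vector f is hypothetical, not taken from the question's data):

```r
# Hypothetical factor, as you might get from a column that contains "."
f <- factor(c("10.5", ".", "3.2", "10.5"))

# Direct conversion returns the internal level codes, not the values
codes <- as.numeric(f)

# Going through character recovers the values; "." becomes NA.
# R normally warns "NAs introduced by coercion"; suppressed here for clarity.
values <- suppressWarnings(as.numeric(as.character(f)))

print(codes)
print(values)
```

The first result contains small integers that index the factor levels, while the second contains the actual numbers with NA in place of the ".".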

Overall, to be safe, try this:

longdata$CAL <- as.numeric(as.character(longdata$CAL))
xCAL <- mean(longdata$CAL, na.rm = TRUE)
sdCAL <- sd(longdata$CAL, na.rm = TRUE)
# etc
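
Since the question mentions several columns, the same steps can be applied to all of them at once. This is only a sketch under assumptions: the data frame below is made up, and PD is a hypothetical second column name standing in for whatever other columns you have.

```r
# Made-up data standing in for longdata; "." marks a missing value
longdata <- data.frame(
  CAL = c("2.1", ".", "3.4", "1.8"),
  PD  = c("4.0", "5.5", ".", "."),   # PD is a hypothetical column
  stringsAsFactors = FALSE
)

cols <- c("CAL", "PD")

# Convert each column; "." cannot be parsed as a number and becomes NA
longdata[cols] <- lapply(longdata[cols], function(x) {
  suppressWarnings(as.numeric(as.character(x)))
})

# One row of summary statistics per column, ignoring NA values
stats <- t(sapply(longdata[cols], function(x) {
  c(mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE),
    min  = min(x, na.rm = TRUE),
    max  = max(x, na.rm = TRUE))
}))

print(stats)
```

Keeping the column names in one vector (cols) means the conversion and the summary both stay in sync if you add more columns later.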

Do note that na.rm is an argument of each of these functions; it's not magic that works everywhere. If you look at the help pages ?mean, ?sd, ?min, etc., you'll see the na.rm argument documented. If you want to remove missing values in general, the na.omit() function works well.
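
A short illustration of the difference between na.rm and na.omit(), using made-up values (x and df are hypothetical):

```r
# Made-up vector with one missing value
x <- c(2.1, NA, 3.4, 1.8)

# Without na.rm, a single NA makes the result NA
m0 <- mean(x)

# With na.rm = TRUE the NA is dropped before averaging
m1 <- mean(x, na.rm = TRUE)

# na.omit() drops the NA up front, so it also works with
# functions that have no na.rm argument
m2 <- mean(na.omit(x))

# On a data frame, na.omit() drops every row that contains any NA
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
complete <- na.omit(df)

print(c(m0, m1, m2))
print(complete)
```

Be aware that on a data frame na.omit() removes whole rows, so a subject with one missing column is dropped from every column's calculation; per-column na.rm = TRUE avoids that.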

