Dealing with TRUE, FALSE, NA and NaN
You don't need to wrap anything in a function; the following works:
a = c(T,F,NA)
a %in% TRUE
[1] TRUE FALSE FALSE
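The reason this works is that %in% is built on match(), which reports "no match" as FALSE instead of propagating NA the way == does. A minimal sketch of the difference (the names eq_result and in_result are only for illustration):

```r
a <- c(TRUE, FALSE, NA)

# == propagates the missing value into the result
eq_result <- a == TRUE

# %in% never returns NA: the NA element simply fails to match TRUE
in_result <- a %in% TRUE

print(eq_result)
print(in_result)
```

So %in% is safe to use directly as a subsetting index, whereas the == result still carries an NA.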
Taking Ben Bolker's suggestion above, you could define your own function following the is.na() syntax:
is.true <- function(x) {
  !is.na(x) & x
}
a = c(T,F,F,NA,F,T,NA,F,T)
is.true(a)
[1] TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
This also works for subsetting data. (Note that cbind() coerces the logical a to 0/1, which is why the a column prints as 1 below.)
b = c(1:9)
df <- as.data.frame(cbind(a,b))
df[is.true(df$a),]
a b
1 1 1
6 1 6
9 1 9
It also helps avoid accidentally pulling in all-NA rows when the data contain NA, which is what a plain == comparison does:
df[df$a == TRUE,]
a b
1 1 1
NA NA NA
6 1 6
NA.1 NA NA
9 1 9
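As a related sketch (not from the answer above), base R's which() is another way to drop NA from a logical index, since it returns only the positions that are TRUE. The example also builds the frame with data.frame() instead of cbind(), which keeps a as a logical column; df2, rows and kept are illustrative names.

```r
a <- c(TRUE, FALSE, FALSE, NA, FALSE, TRUE, NA, FALSE, TRUE)
b <- 1:9

# data.frame() keeps a as logical instead of coercing it to 0/1
df2 <- data.frame(a = a, b = b)

# which() returns the positions that are TRUE and silently skips NA
rows <- which(df2$a)
kept <- df2[rows, ]

print(kept)
```

This selects rows 1, 6 and 9 only, with no NA rows mixed in.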
So you want TRUE to remain TRUE and FALSE to remain FALSE; the only real change is that NA needs to become FALSE. Just make that change directly:
a[ is.na(a) ] <- FALSE
Or you could rephrase to say it is only TRUE if it is TRUE and not missing:
a <- a & !is.na(a)
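A quick check that the two approaches give the same result, sketched on a small made-up vector (a1 and a2 are illustrative names):

```r
a <- c(TRUE, FALSE, NA, TRUE, NA)

# Approach 1: overwrite the missing entries in a copy
a1 <- a
a1[is.na(a1)] <- FALSE

# Approach 2: AND the vector with its "not missing" mask
a2 <- a & !is.na(a)

print(a1)
print(identical(a1, a2))
```

The second form works because NA & FALSE evaluates to FALSE rather than NA.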
To answer your questions in order:
1) The == operator does indeed not treat NA's as you would expect it to. A very useful function is this compareNA function from r-cookbook.com:
compareNA <- function(v1, v2) {
  # This function returns TRUE wherever elements are the same, including NA's,
  # and FALSE everywhere else.
  same <- (v1 == v2) | (is.na(v1) & is.na(v2))
  same[is.na(same)] <- FALSE
  return(same)
}
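For illustration, here is how compareNA behaves next to plain == on two small made-up vectors. The function is repeated so the snippet runs on its own; x, y, plain and na_safe are illustrative names.

```r
compareNA <- function(v1, v2) {
  same <- (v1 == v2) | (is.na(v1) & is.na(v2))
  same[is.na(same)] <- FALSE
  same
}

x <- c(1, NA, 3, NA)
y <- c(1, NA, 4, 2)

# Plain == yields NA wherever either side is missing
plain <- x == y

# compareNA treats NA vs NA as equal and NA vs a value as unequal
na_safe <- compareNA(x, y)

print(plain)
print(na_safe)
```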
2) NA stands for "Not Available" and is not the same as NaN ("Not a Number"). NA is R's placeholder for missing data of any type, not just numbers; NaN is normally generated by an undefined numerical operation (0/0, taking the log of -1, or similar). Note that is.na() returns TRUE for both, while is.nan() is TRUE only for NaN.
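A short sketch of how the two values are told apart in practice, using only base R (the variable names are illustrative):

```r
x <- c(1, NA, NaN)

# is.na() flags both missing values and NaN
na_flags <- is.na(x)

# is.nan() flags only NaN
nan_flags <- is.nan(x)

# NaN arises from undefined numeric operations
undefined <- 0 / 0
log_neg <- suppressWarnings(log(-1))  # log(-1) also warns "NaNs produced"

print(na_flags)
print(nan_flags)
```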
3) I'm not really sure what you mean by "logical things"--many different data types, including numeric vectors, can be used as input to logical operators. You might want to try reading the R logical operators page: http://stat.ethz.ch/R-manual/R-patched/library/base/html/Logic.html.
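As a small illustration of numeric input to logical operators: zero is treated as FALSE, any other number as TRUE, and NA stays NA. The vector n is made up for the example.

```r
n <- c(0, 1, 2, NA)

# Explicit coercion: 0 -> FALSE, non-zero -> TRUE, NA -> NA
as_logical <- as.logical(n)

# Logical operators apply the same coercion implicitly
negated <- !n
combined <- n & TRUE

print(as_logical)
print(negated)
print(combined)
```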
Hope this helps!