R - From Factor to Numeric or Integer error
The root of the problem is likely some stray non-numeric value in your imported CSV. This is not uncommon when the file came from Excel: it can be a percent symbol, a "comment" character, or any of a long list of things. I would open the CSV in your editor of choice and see what you can find.
Aside from that, you have a few options.
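read.csv takes an optional argument stringsAsFactors which you can set to FALSE (this is the default since R 4.0.0), so that text columns are read as character vectors rather than factors.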
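As a minimal sketch of that option, the snippet below reads a small inline CSV with stringsAsFactors = FALSE. The data, the column names (id, rate) and the variable names are made up for illustration and stand in for your real file.

```r
# Hypothetical inline data standing in for the imported CSV file
csv_text <- "id,rate\n1,2\n2,4%\n"

# Keep text columns as character vectors instead of factors
d <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(d$rate)   # a character vector, not a factor

# Converting a character column gives NA for anything that is not a number,
# never the internal factor codes (the coercion warning is silenced here)
rate_num <- suppressWarnings(as.numeric(d$rate))
```

With a character column the worst case is an NA where the stray value was, which is much easier to spot than silently wrong integer codes.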
A factor is stored as integer levels which map to values. When you convert directly with as.numeric you wind up with those integer levels rather than the initial values:
> x<-10:20
> as.numeric(factor(x))
[1] 1 2 3 4 5 6 7 8 9 10 11
Otherwise look at ?factor:
In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
However, I suspect this will give you NA values (with a coercion warning) because the input has something in it besides a number.
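To make the difference concrete, here is a small sketch comparing the three conversions on a factor that contains one bad value. The factor f and its contents are invented for the example; the coercion warnings are suppressed only to keep the output quiet.

```r
# Hypothetical factor containing one non-numeric level ("4%")
f <- factor(c("10", "20", "4%", "10"))

# Direct conversion: returns the internal level codes, not the values
codes <- as.numeric(f)

# The idiom recommended in ?factor: convert the levels once, then index by f
via_levels <- suppressWarnings(as.numeric(levels(f))[f])

# The slightly slower equivalent: go through character first
via_char <- suppressWarnings(as.numeric(as.character(f)))

identical(via_levels, via_char)   # the two routes give the same result
```

Both of the last two routes put an NA wherever the level is not a number, which is what lets you track down the offending entries.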
@Justin is correct. Here's a walk-through on how to find the offending values:
# A sample data set with a weird value ("4%") in it
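# (stringsAsFactors=TRUE reproduces the factor column; it was the default before R 4.0.0)
d <- read.table(text="A B\n1 2\n3 4%\n", header=TRUE, stringsAsFactors=TRUE)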
str(d)
#'data.frame': 2 obs. of 2 variables:
# $ A: int 1 3
# $ B: Factor w/ 2 levels "2","4%": 1 2
as.numeric(d$B) # WRONG, returns 1 2 (the internal factor codes)
# This correctly converts to numeric
x <- as.numeric(levels(d$B))[d$B] # 2 NA
# ...and this finds the offending value(s):
d$B[is.na(x)] # 4%
# and this finds the offending row numbers:
which(is.na(x)) # row 2
Note that if your data set has missing values encoded as something other than an empty cell or the string "NA", you have to specify that to read.table:
# Here "N/A" is used instead of "NA"...
read.table(text="A B\n1 2\n3 N/A\n", header=TRUE, na.strings="N/A")
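As a rough check of what na.strings buys you, the sketch below reads the same made-up "N/A" data with and without it and compares the resulting column types. The names txt, d1 and d2 are just for illustration, and stringsAsFactors=TRUE is passed explicitly only to mimic the pre-4.0.0 default.

```r
txt <- "A B\n1 2\n3 N/A\n"

# Without na.strings, "N/A" is ordinary text, so B cannot be read as a number
d1 <- read.table(text=txt, header=TRUE, stringsAsFactors=TRUE)
class(d1$B)

# With na.strings, "N/A" becomes a real missing value and B is read as numeric data
d2 <- read.table(text=txt, header=TRUE, na.strings="N/A")
class(d2$B)
is.na(d2$B)
```

Note that na.strings replaces the default of "NA" rather than adding to it, so pass both (na.strings=c("NA", "N/A")) if your file mixes the two.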