Most elegant way to load a CSV with a point as the thousands separator in R
Adapted from this post: Specify custom Date format for colClasses argument in read.table/read.csv
# some sample data
write.csv(data.frame(a = c("1.234,56", "1.234,56"),
                     b = c("1.234,56", "1.234,56")),
          "test.csv", row.names = FALSE, quote = TRUE)
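# define your own (virtual) class to use as a colClasses entry
setClass("myNum")
# define the conversion: drop the thousands points, then turn the decimal comma into a point
setAs("character", "myNum",
      function(from) as.numeric(gsub(",", ".", gsub(".", "", from, fixed = TRUE), fixed = TRUE)))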
# read data with custom colClasses
read_data <- read.csv("test.csv",
                      stringsAsFactors = FALSE,
                      colClasses = c("myNum", "myNum"))
# check that the column really is numeric
read_data[1, 1] * 2
#[1] 2469.12
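The same setAs() trick also works without writing a temporary file, and it mixes freely with ordinary column types. A minimal sketch, assuming every value uses "." only as a thousands separator and "," only as the decimal mark; the class name "euroNum" and the inline sample data are hypothetical:

```r
# Hypothetical class name; any unused name works. setClass() with no slots
# creates a virtual class, which is enough for read.csv's colClasses lookup.
setClass("euroNum")

# Assumes "." is only ever a thousands separator and "," the decimal mark.
setAs("character", "euroNum", function(from) {
  no_thousands <- gsub(".", "", from, fixed = TRUE)  # "1.234,56" -> "1234,56"
  as.numeric(gsub(",", ".", no_thousands, fixed = TRUE))  # -> 1234.56
})

# Hypothetical inline data; read.csv(text = ...) avoids a file on disk.
csv_text <- 'name,amount
"a","1.234,56"
"b","-7.000,5"
"c","12,3"'

# Ordinary and custom classes can be combined column by column.
amounts <- read.csv(text = csv_text, colClasses = c("character", "euroNum"))
str(amounts)
```

Because the conversion runs inside read.csv, the numeric column arrives already typed; the cost is the small S4 setup (setClass/setAs) that has to run before the read.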
Rather than trying to fix everything at load time, I would load the data into R as strings and then convert them to numeric.
After loading, you have a column of strings such as "4.123,98".
Then do something like:
number.string <- gsub(".", "", number.string, fixed = TRUE)   # drop the thousands separators
number.string <- gsub(",", ".", number.string, fixed = TRUE)  # decimal comma -> decimal point
number <- as.numeric(number.string)
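Applied to a whole data frame, the "load as text, convert afterwards" approach might look like the sketch below. It assumes the same formatting convention ("." for thousands, "," for decimals); the helper name parse_decimal_comma() and the sample data are hypothetical:

```r
# Hypothetical helper wrapping the three steps above for a character vector.
parse_decimal_comma <- function(x) {
  x <- gsub(".", "", x, fixed = TRUE)   # drop the thousands separators
  x <- gsub(",", ".", x, fixed = TRUE)  # decimal comma -> decimal point
  as.numeric(x)
}

# Hypothetical sample data.
csv_text <- 'a,b
"1.234,56","4.123,98"
"10,5","0,75"'

# colClasses = "character" is recycled, so every column stays a string.
dat <- read.csv(text = csv_text, colClasses = "character")

# Convert only the chosen columns, leaving any others untouched.
num_cols <- c("a", "b")
dat[num_cols] <- lapply(dat[num_cols], parse_decimal_comma)
str(dat)
```

Strings that do not fit the expected format become NA (as.numeric() emits a coercion warning), so checking is.na() after the conversion is a cheap way to spot stray values.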