how to skip reading certain columns in readr
There is an answer out there; I just didn't search hard enough: https://github.com/hadley/readr/issues/132
Apparently this was a documentation issue that has been corrected. This functionality may eventually get added but Hadley thought it was more useful to be able to just update one column type and not drop the others.
Update: The functionality has since been added.
The following code is from the readr documentation:
read_csv("iris.csv", col_types = cols_only(Species = col_factor(c("setosa", "versicolor", "virginica"))))
This reads only the Species column of the iris data set. To read only specific columns, you must also give each one a column specification, e.g. col_factor(), col_double(), etc.
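For anyone who wants to try this without an iris.csv on disk, here is a minimal sketch that first writes the built-in iris data frame to a temporary file. The temporary path and the variable names are only illustrative; the calls themselves (write_csv, read_csv, cols_only, col_factor) are from readr.

```r
library(readr)

# Write the built-in iris data to a temporary CSV so the example is self-contained.
path <- tempfile(fileext = ".csv")
write_csv(iris, path)

# cols_only() reads just the columns named here; all other columns are dropped.
species_only <- read_csv(
  path,
  col_types = cols_only(
    Species = col_factor(c("setosa", "versicolor", "virginica"))
  )
)

# Inspect which columns were actually read.
names(species_only)
```

Adding more named arguments to cols_only(), each with its own col_*() specification, keeps more columns.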
"According to the read_csv documentation, one way to accomplish this is to pass a named list for col_types and only name the columns you want to keep"
WRONG: read_csv('test.csv', col_types=list(colB='c', colC='c'))
No, the doc is misleading: you have to either specify that unnamed cols get dropped (class='_' / col_skip()), or else explicitly specify their class as NULL:
read_csv('test.csv', col_types=list('*'='_', colB='c', colC='c'))
read_csv('test.csv', col_types=list('colA'='_', colB='c', colC='c'))
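If you would rather spell the skipping out with readr's own helpers, something along these lines should also work. This is only a sketch: the three-column file is made up to mirror the colA/colB/colC names used above, and it relies on cols(), col_skip(), the .default argument and the compact string form of col_types as I understand them from the readr docs.

```r
library(readr)

# Hypothetical three-column file mirroring the colA/colB/colC example.
path <- tempfile(fileext = ".csv")
writeLines(c("colA,colB,colC", "1,x,y", "2,z,w"), path)

# Skip one column by name; colB and colC are read as character.
skip_named <- read_csv(
  path,
  col_types = cols(colA = col_skip(), colB = "c", colC = "c")
)

# Skip every column that is not listed explicitly.
skip_default <- read_csv(
  path,
  col_types = cols(
    .default = col_skip(),
    colB = col_character(),
    colC = col_character()
  )
)

# Compact string form: "_" skips a column, "c" reads it as character.
skip_compact <- read_csv(path, col_types = "_cc")
```

The .default form is handy when the file has many columns and you only care about a few; the compact string is shortest, but it depends on column order rather than on names.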