In R, how do I read a CSV file line by line and have the contents recognised as the correct data type?
Based on DWin's comment, you can try something like this:
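read.clump <- function(file, lines, clump){
    if(clump > 1){
        # Read the header line on its own so the column names can be reused.
        header <- read.csv(file, nrows = 1, header = FALSE, stringsAsFactors = FALSE)
        # An open connection (e.g. a textConnection) keeps its position, so the
        # header line is already consumed here. For a file name, use
        # skip = lines*(clump-1) + 1 instead, to skip the header as well.
        p <- read.csv(file, skip = lines*(clump-1),
                      nrows = lines, header = FALSE)
        # header is a one-row data frame; flatten it to a plain vector of names.
        names(p) <- unlist(header, use.names = FALSE)
    } else {
        p <- read.csv(file, skip = lines*(clump-1), nrows = lines)
    }
    return(p)
}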
You should probably add some error handling/checking to the function, too.
Then with
x = "letter1, letter2
a, b
c, d
e, f
g, h
i, j
k, l"
> read.clump(textConnection(x), lines = 2, clump = 1)
letter1 letter2
1 a b
2 c d
> read.clump(textConnection(x), lines = 2, clump = 2)
letter1 letter2
1 e f
2 g h
> read.clump(textConnection(x), lines = 3, clump = 1)
letter1 letter2
1 a b
2 c d
3 e f
> read.clump(textConnection(x), lines = 3, clump = 2)
letter1 letter2
1 g h
2 i j
3 k l
Now you just have to *apply (lapply, sapply, ...) over the clumps.
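One way the apply step could look, as a sketch only. It assumes the data sits in a file on disk, so every call re-opens the file and the header line has to be skipped explicitly (the `+1` variant mentioned in the comment above). The names `read_clump_path`, `n_rows` and `n_clumps` are made up for this illustration.

```r
# Hypothetical path-only variant of read.clump: each call starts again at the
# top of the file, so the header line is skipped explicitly.
read_clump_path <- function(path, lines, clump) {
  header <- read.csv(path, nrows = 1, header = FALSE, stringsAsFactors = FALSE)
  p <- read.csv(path, skip = lines * (clump - 1) + 1, nrows = lines,
                header = FALSE, stringsAsFactors = FALSE)
  names(p) <- unlist(header, use.names = FALSE)
  p
}

# Small demo file standing in for the real CSV.
path <- tempfile(fileext = ".csv")
writeLines(c("letter1,letter2", "a,b", "c,d", "e,f", "g,h", "i,j", "k,l"), path)

lines <- 2
# Data rows = total lines minus the header. readLines() is fine for a demo;
# for a really big file the row count would have to come from elsewhere.
n_rows <- length(readLines(path)) - 1
n_clumps <- ceiling(n_rows / lines)

# Read the clumps one after another; each list element is a small data frame.
clumps <- lapply(seq_len(n_clumps), function(i) read_clump_path(path, lines, i))

# Work on each clump separately, here just counting its rows.
sapply(clumps, nrow)
```

In real use you would probably process each clump inside the `lapply` call and keep only the result, so that the whole file never has to sit in memory at once.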
An alternate strategy that has been discussed here before to deal with very big (say, > 1e7ish cells) CSV files is:
- Read the CSV file into an SQLite database.
- Import the data from the database with read.csv.sql from the sqldf package.
The main advantages of this are that it is usually quicker and you can easily filter the contents to only include the columns or rows that you need.
See "How to import CSV into sqlite using RSqlite?" for more info.
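A rough sketch of the filtering idea: `read.csv.sql` takes an SQL statement in which the CSV is referred to as the table `file`, so rows and columns are selected before the data reaches R. The demo file, its columns (`id`, `value`) and the query are invented for illustration, and the example assumes the sqldf package is installed.

```r
# Assumes the sqldf package is installed; the demo file and its columns
# (id, value) are invented for this example.
library(sqldf)

path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:6, value = c(10, 20, 30, 40, 50, 60)),
          path, row.names = FALSE, quote = FALSE)

# Inside the SQL statement the CSV is referred to as the table "file".
# Only the rows matching the WHERE clause are returned to R.
big_values <- read.csv.sql(path,
                           sql = "select id, value from file where value > 30")

str(big_values)
```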
Just for fun (I'm waiting on a long running computation here :-) ), a version that allows you to use any of the read.* kind of functions, and that holds a solution to a tiny error in Greg's code:
read.clump <- function(file, lines, clump, readFunc=read.csv,
    skip=(lines*(clump-1))+ifelse(header && (clump>1) && !inherits(file, "connection"),1,0),
    nrows=lines, header=TRUE, ...){
    if(clump > 1){
        colnms <- NULL
        if(header){
            # Read the header line separately; as.character() keeps this
            # working whether the reader returns strings or factors.
            hdr <- readFunc(file, nrows=1, header=FALSE)
            colnms <- vapply(hdr, as.character, character(1), USE.NAMES=FALSE)
        }
        p <- readFunc(file, skip = skip,
                      nrows = nrows, header=FALSE, ...)
        if(!is.null(colnms)){
            colnames(p) <- colnms
        }
    } else {
        # Forward the extra arguments here as well.
        p <- readFunc(file, skip = skip, nrows = nrows, header=header, ...)
    }
    return(p)
}
Now you can pass the relevant function as parameter readFunc, and pass extra parameters too. Meta programming is fun.
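To illustrate the pattern in isolation, here is a stripped-down sketch. The wrapper `read_with` is hypothetical and is not the function above; it only shows how a reader function and extra arguments travel through `...`. Passing `colClasses` this way is one option for keeping column types identical from clump to clump, since each separate read otherwise guesses the types from the rows it happens to see.

```r
# Hypothetical minimal wrapper, just to show the pattern: the reader function
# is an ordinary argument, and `...` is handed on to it unchanged.
read_with <- function(file, readFunc = read.csv, ...) {
  readFunc(file, ...)
}

# Tab-separated demo file; the id column has leading zeros worth keeping.
path <- tempfile(fileext = ".tsv")
writeLines(c("id\tcount", "001\t5", "002\t7"), path)

# Swap in read.delim and force the column types through `...`.
d <- read_with(path, readFunc = read.delim,
               colClasses = c("character", "integer"))

sapply(d, class)
```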