In R how do I read a CSV file line by line and have the contents recognised as the correct data type?

Based on DWin's comment, you can try something like this:

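read.clump <- function(file, lines, clump){
    if(clump > 1){
        # Read the first line on its own to get the column names.
        header <- read.csv(file, nrows=1, header=FALSE)
        # This assumes file is an open connection (e.g. a textConnection), which
        # has already moved past the header line. For a file name, use
        # skip = lines*(clump-1) + 1 so that the header line is skipped as well.
        p = read.csv(file, skip = lines*(clump-1), nrows = lines, header=FALSE)
        names(p) = as.character(unlist(header))
    } else {
        # First clump: let read.csv() take the column names from the header.
        p = read.csv(file, skip = lines*(clump-1), nrows = lines)
    }
    return(p)
}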

You should probably add some error handling/checking to the function, too.
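As a starting point for that checking, here is a minimal sketch. The helper name check_clump_args is hypothetical (it is not part of the answer above). It rejects bad lines/clump values, and the tryCatch() part shows one way to handle a clump that starts past the end of the data, where read.csv() stops with an error.

```r
# Hypothetical helper (not from the original answer): reject bad
# `lines`/`clump` values before any reading is attempted.
check_clump_args <- function(lines, clump) {
  is_count <- function(v) {
    is.numeric(v) && length(v) == 1 && !is.na(v) && v >= 1 && v == round(v)
  }
  if (!is_count(lines)) stop("'lines' must be a single whole number >= 1")
  if (!is_count(clump)) stop("'clump' must be a single whole number >= 1")
  invisible(TRUE)
}

# Skipping past the end of the data makes read.csv() stop with an error;
# tryCatch() turns that into NULL, so a loop over clumps can stop on NULL.
past_end <- tryCatch(
  read.csv(text = c("letter1,letter2", "a,b"), skip = 5, header = FALSE),
  error = function(e) NULL
)
```

Returning NULL rather than an empty data frame is just one choice; it makes "no more clumps" easy to test with is.null().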

Then with

x = "letter1, letter2
a, b
c, d
e, f
g, h
i, j
k, l"


> read.clump(textConnection(x), lines = 2, clump = 1)
  letter1 letter2
1       a       b
2       c       d

> read.clump(textConnection(x), lines = 2, clump = 2)
  letter1  letter2
1       e        f
2       g        h

> read.clump(textConnection(x), lines = 3, clump = 1)
  letter1 letter2
1       a       b
2       c       d
3       e       f


> read.clump(textConnection(x), lines = 3, clump = 2)
  letter1  letter2
1       g        h
2       i        j
3       k        l

Now you just have to use one of the *apply functions to iterate over the clumps.
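Applying read.clump over clumps re-reads the input from the top for every clump. Below is a minimal sketch of a variant that reads successive chunks from one open connection, so each chunk continues where the previous one stopped. The names read_chunks and FUN are hypothetical, and the sketch assumes the first line is a comma-separated header.

```r
# Hypothetical helper: read a CSV from ONE open connection in chunks of
# `lines` rows, applying FUN to each chunk. Extra arguments go to read.csv().
read_chunks <- function(con, lines, FUN = identity, ...) {
  # Column names come from the first line; trimws() drops the spaces
  # that follow the commas in the example data.
  header <- trimws(as.character(unlist(
    read.csv(text = readLines(con, n = 1), header = FALSE,
             stringsAsFactors = FALSE)
  )))
  results <- list()
  repeat {
    txt <- readLines(con, n = lines)   # next `lines` rows; character(0) at the end
    if (length(txt) == 0) break
    # Column types are guessed separately for each chunk; pass colClasses
    # through ... if they must be identical across chunks.
    chunk <- read.csv(text = txt, header = FALSE, col.names = header, ...)
    results[[length(results) + 1]] <- FUN(chunk)
  }
  results
}

x <- "letter1, letter2\na, b\nc, d\ne, f\ng, h\ni, j\nk, l"
con <- textConnection(x)
chunks <- read_chunks(con, lines = 2, strip.white = TRUE)
close(con)
combined <- do.call(rbind, chunks)
```

Passing a summarising FUN instead of identity lets you keep only per-chunk results in memory rather than the whole file.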


An alternative strategy, discussed here before, for dealing with very big CSV files (say, more than about 1e7 cells) is:

  1. Read the CSV file into an SQLite database.
  2. Select the data you need from the database with SQL. read.csv.sql from the sqldf package does both steps in a single call.

The main advantages are that it is usually quicker and that you can easily filter the contents so that only the columns or rows you need are loaded into R.
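A minimal sketch of that filtering with read.csv.sql, assuming the sqldf package is installed. The file, its columns (id, value) and the query are made up for illustration; inside the SQL the table is referred to as file.

```r
# Requires the sqldf package: install.packages("sqldf")
library(sqldf)

# A small illustrative file, written without quoted fields to keep it simple.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, value = (1:10) * 1.5), path,
          row.names = FALSE, quote = FALSE)

# The file is loaded into a temporary SQLite database and the query runs
# there; only the selected rows and columns come back as a data frame.
small <- read.csv.sql(path, sql = "select id, value from file where id > 8")
```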

See the question "How to import CSV into SQLite using RSQLite?" for more info.


Just for fun (I'm waiting on a long-running computation here :-) ), here is a version that lets you use any of the read.* family of functions, and that fixes a small error in Greg's code: when file is a file name rather than an open connection, the header line also has to be skipped.

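# The default skip adds 1 when file is a file name rather than an open
# connection: a file name is reopened at line 1 by every readFunc() call,
# so the header line has not been consumed yet.
read.clump <- function(file, lines, clump, readFunc=read.csv,
    skip=(lines*(clump-1)) + if(header && clump > 1 && !inherits(file, "connection")) 1 else 0,
    nrows=lines, header=TRUE, ...){
    if(clump > 1){
        colnms <- NULL
        if(header){
            # Read only the header line to recover the column names.
            colnms <- as.character(unlist(readFunc(file, nrows=1, header=FALSE)))
        }
        p = readFunc(file, skip=skip, nrows=nrows, header=FALSE, ...)
        if(!is.null(colnms)){
            colnames(p) = colnms
        }
    } else {
        # Forward ... here too, so extra arguments also apply to the first clump.
        p = readFunc(file, skip=skip, nrows=nrows, header=header, ...)
    }
    return(p)
}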

Now you can pass the relevant function as the readFunc argument, and pass extra arguments to it through the ... argument. Metaprogramming is fun.
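The extra header skip comes down to how R treats a file name versus an open connection. This small sketch (with a made-up temporary file) uses readLines() to show it: a file name is reopened at line 1 on every call, while an open connection keeps its position between calls.

```r
path <- tempfile(fileext = ".csv")
writeLines(c("letter1,letter2", "a,b", "c,d"), path)

# File name: each call starts again at line 1, so reading the header
# first does not move later reads past it.
first_call  <- readLines(path, n = 1)
second_call <- readLines(path, n = 1)   # the header again

# Open connection: the position is kept, so after the header is read
# the next read starts at the first data row.
con <- file(path, open = "r")
hdr  <- readLines(con, n = 1)
row1 <- readLines(con, n = 1)           # "a,b"
close(con)
```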
