Read a text file in R line by line
You should take care with readLines(...)
and big files. Reading all lines at memory can be risky. Below is a example of how to read file and process just one line at time:
processFile = function(filepath) {
con = file(filepath, "r")
while ( TRUE ) {
line = readLines(con, n = 1)
if ( length(line) == 0 ) {
break
}
print(line)
}
close(con)
}
Understand the risk of reading a line at memory too. Big files without line breaks can fill your memory too.
Just use readLines
on your file:
R> res <- readLines(system.file("DESCRIPTION", package="MASS"))
R> length(res)
[1] 27
R> res
[1] "Package: MASS"
[2] "Priority: recommended"
[3] "Version: 7.3-18"
[4] "Date: 2012-05-28"
[5] "Revision: $Rev: 3167 $"
[6] "Depends: R (>= 2.14.0), grDevices, graphics, stats, utils"
[7] "Suggests: lattice, nlme, nnet, survival"
[8] "Authors@R: c(person(\"Brian\", \"Ripley\", role = c(\"aut\", \"cre\", \"cph\"),"
[9] " email = \"[email protected]\"), person(\"Kurt\", \"Hornik\", role"
[10] " = \"trl\", comment = \"partial port ca 1998\"), person(\"Albrecht\","
[11] " \"Gebhardt\", role = \"trl\", comment = \"partial port ca 1998\"),"
[12] " person(\"David\", \"Firth\", role = \"ctb\"))"
[13] "Description: Functions and datasets to support Venables and Ripley,"
[14] " 'Modern Applied Statistics with S' (4th edition, 2002)."
[15] "Title: Support Functions and Datasets for Venables and Ripley's MASS"
[16] "License: GPL-2 | GPL-3"
[17] "URL: http://www.stats.ox.ac.uk/pub/MASS4/"
[18] "LazyData: yes"
[19] "Packaged: 2012-05-28 08:47:38 UTC; ripley"
[20] "Author: Brian Ripley [aut, cre, cph], Kurt Hornik [trl] (partial port"
[21] " ca 1998), Albrecht Gebhardt [trl] (partial port ca 1998), David"
[22] " Firth [ctb]"
[23] "Maintainer: Brian Ripley <[email protected]>"
[24] "Repository: CRAN"
[25] "Date/Publication: 2012-05-28 08:53:03"
[26] "Built: R 2.15.1; x86_64-pc-mingw32; 2012-06-22 14:16:09 UTC; windows"
[27] "Archs: i386, x64"
R>
There is an entire manual devoted to this.
Here is the solution with a for
loop. Importantly, it takes the one call to readLines
out of the for loop so that it is not improperly called again and again. Here it is:
fileName <- "up_down.txt"
conn <- file(fileName,open="r")
linn <-readLines(conn)
for (i in 1:length(linn)){
print(linn[i])
}
close(conn)