Import fixed width data file with no line separator

Maybe not the best idea but this should work:

content <- scan('filepath', what='character', sep='~') # Warning: choose a sep that does not appear in the data, so the whole file is read as one string.
# Split content into 60-character lines:
lines <- regmatches(content,gregexpr('.{60}',content))[[1]]
x <- tempfile()
write(lines,x)
data <- read.fwf(x, widths = c(8,4,7,41))
unlink(x)

The idea is to read the whole file as one string, split it into 60-character chunks (one per record), write these chunks to a temporary file as separate lines, read the data back from that file with read.fwf, and then delete the temporary file.
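If you would rather not manage the temporary file yourself, the 60-character chunks can also be handed to read.fwf through a text connection. This is a sketch of a variation, not the code above: the two sample records are made up to mimic the layout (widths 8, 4, 7 and 41), and content stands in for what scan returns on the real file.

```r
# Two made-up 60-character records, glued together with no separator,
# standing in for what scan() returns on the real file.
rec1 <- sprintf("%8s%4s%7s%41s", "20141101", "77h", "3.210", "0    3")
rec2 <- sprintf("%8s%4s%7s%41s", "20141102", "76h", "3.090", "0    3")
content <- paste0(rec1, rec2)

# Cut the single string into 60-character records.
lines <- regmatches(content, gregexpr(".{60}", content))[[1]]

# read.fwf accepts a connection as well as a file name.
con <- textConnection(lines)
data <- read.fwf(con, widths = c(8, 4, 7, 41), stringsAsFactors = FALSE)
close(con)
```

Note that the regex only keeps complete 60-character chunks, so a truncated trailing record would be dropped silently; checking nchar(content) %% 60 == 0 beforehand is a cheap safeguard.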

Another approach uses regular expressions and the stringr package (still with content as returned by scan above):

library(stringr)
d <- data.frame( str_match_all( content, "(.{8})(.{4})(.{7})(.{41})")[[1]][,2:5], stringsAsFactors=FALSE)

which gives:

        V1   V2      V3                                        V4
1 20141101  77h   3.210                                   0    3 
2 20141102  76h   3.090                                   0    3

str_match_all returns a list, here with one element because content is a single string, so we extract that element with [[1]].

The result is a matrix with 5 columns: the first is the full match and the others are the capture groups. We therefore subset the matrix on columns 2 to 5 to keep only the 4 columns we need, and wrap it in data.frame to get a data frame at the end.

You can then name the columns with colnames(d) <- c('date','time','data_point','rest').
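The structure described above can be inspected on a small example. This is only a sketch: the two records are made up to match the 8/4/7/41 layout, and content stands in for the result of scan.

```r
library(stringr)

# Made-up sample: two 60-character records with no separator between them.
rec1 <- sprintf("%8s%4s%7s%41s", "20141101", "77h", "3.210", "0    3")
rec2 <- sprintf("%8s%4s%7s%41s", "20141102", "76h", "3.090", "0    3")
content <- paste0(rec1, rec2)

m <- str_match_all(content, "(.{8})(.{4})(.{7})(.{41})")
length(m)    # one list element, because content is a single string
dim(m[[1]])  # one row per record; column 1 is the full match, 2 to 5 the groups

# Keep only the capture groups and name the resulting columns.
d <- data.frame(m[[1]][, 2:5], stringsAsFactors = FALSE)
colnames(d) <- c('date', 'time', 'data_point', 'rest')
```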

If you wish to strip the surrounding whitespace, you can wrap the str_match_all result in trimws (thanks to @jaap for the reminder about this function), like this:

td <- data.frame( trimws( str_match_all( content, "(.{8})(.{4})(.{7})(.{41})")[[1]][,2:5] ), stringsAsFactors=FALSE)

Output:

        X1  X2    X3     X4
1 20141101 77h 3.210 0    3
2 20141102 76h 3.090 0    3

A different, and probably less elegant, solution with readLines, substr, trimws, separate (tidyr) and mutate_all (dplyr):

txt <- readLines('filepath')
dfx <- data.frame(V1 = sapply(seq(from=1, to=nchar(txt), by=60),
                              function(x) substr(txt, x, x+59)),
                  stringsAsFactors = FALSE) # keep V1 as character, also on R versions before 4.0

library(dplyr)
library(tidyr)
dfx %>% 
  separate(V1, c(paste0("V",LETTERS[1:5])), c(8,12,19,55)) %>% 
  mutate_all(trimws)

which gives:

        VA  VB    VC VD VE
1 20141101 77h 3.210  0  3
2 20141102 76h 3.090  0  3

To get different column names, just replace c(paste0("V",LETTERS[1:5])) with a vector of the column names you want.

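If you want to convert the columns to their appropriate classes instead of leaving them as character, you can use funs(ul = type.convert(trimws(.))) inside mutate_all. Note that funs() is deprecated in recent dplyr versions (0.8.0 and later); there, a formula such as ~ type.convert(trimws(.), as.is = TRUE) can be passed to mutate_all instead.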
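For readers without dplyr and tidyr, the same split and type conversion can be sketched in base R. This is an alternative, not the code above: the sample records are made up, and the field boundaries are derived from the cut positions c(8,12,19,55) assuming 60-character records.

```r
# Made-up sample: two 60-character records with no separator between them.
rec1 <- sprintf("%8s%4s%7s%41s", "20141101", "77h", "3.210", "0    3")
rec2 <- sprintf("%8s%4s%7s%41s", "20141102", "76h", "3.090", "0    3")
txt <- paste0(rec1, rec2)

# Cut the string into 60-character records.
starts <- seq(1, nchar(txt), by = 60)
records <- substring(txt, starts, starts + 59)

# Field boundaries equivalent to the cut positions c(8, 12, 19, 55).
last  <- c(8, 12, 19, 55, 60)
first <- c(1, head(last, -1) + 1)

# Extract each field across all records, trim it, and convert its type.
fields <- lapply(seq_along(first), function(i) {
  type.convert(trimws(substring(records, first[i], last[i])), as.is = TRUE)
})
names(fields) <- paste0("V", LETTERS[1:5])
dfx <- as.data.frame(fields, stringsAsFactors = FALSE)
```

With as.is = TRUE, columns that cannot be read as numbers (such as VB here) stay character instead of becoming factors.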
