Read arbitrary lines from file

No, you can't read arbitrary lines, but you can't do that in most other programming languages either. You have to read line after line from the beginning. Apart from the \read primitive, there is also the \readline primitive which reads a line verbatim. You can use that to save all lines in macros and use these afterward:

\newread\myread
\newcount\linecnt
\openin\myread=file.dat
\@whilesw\unless\ifeof\myread\fi{%
  \advance\linecnt by \@ne
  \readline\myread t\expandafter o\csname line-\number\linecnt\endcsname
}

TeX is able to read lines from a file as long as they have balanced braces. This is accomplished through \openin and \read.

\newread\myread
\openin\myread=file.dat
\read\myread to \command

Now \command will expand to the contents of the first line in file.dat and a subsequent \read\myread to \command will do the same with the following line. The end of the file can be tested with \ifeof\myread, but one should always keep in mind that TeX appends an empty line to each file. The symbolic name \myread can be reused after having closed the file with \closein\myread.

It must be mentioned the possibility of calling, via "shell-escape", external utilities that may be faster and more flexible.


Update

After nine years it's time to do an update. Here's a set of macros for outputting selected lines from a file with some tokens between each line. An application is given for a table.

The first argument is the file name, the second one is a comma separated list of ranges. Start and end of the range are separated by /; if an index is negative, it means starting from the end. Take into account that the range is always set “going up”. So if you specify 3/1 you get lines from 1 to 3 in this order.

However, if \readfromfile* is called, the lines are reversed. So \readfromfile*{1/3} will print the lines from 3 to 1 in this order. The reversal is done at the last moment, so be sure to specify the ranges also in reverse order.

A trailing optional argument specifies what to insert between lines, default nothing.

\begin{filecontents*}{\jobname.dat}
line 1
line 2
line 3
line 4
line 5
line 6
\end{filecontents*}
\begin{filecontents*}{\jobname.tab}
line 1 & line 1
line 2 & line 2
line 3 & line 3
line 4 & line 4
line 5 & line 5
line 6 & line 6
\end{filecontents*}


\documentclass[twocolumn]{article}
\usepackage{xparse}

\ExplSyntaxOn

\NewDocumentCommand{\readfromfile}{s m m +O{}}
 {% #1 = * if reversing
  % #2 = file name, #3 = list of lines, #4 = what to use between items
  \IfBooleanTF { #1 }
   { \bool_set_true:N \l__malmedal_rff_reverse_bool }
   { \bool_set_false:N \l__malmedal_rff_reverse_bool }
  \malmedal_rff_main:nnn { #2 } { #3 } { #4 }
 }

\bool_new:N \l__malmedal_rff_reverse_bool
\ior_new:N \g__malmedal_rff_file_ior
\seq_new:N \l__malmedal_rff_items_seq
\seq_new:N \l__malmedal_rff_items_out_seq
\clist_new:N \l__malmedal_rff_lines_clist

\prg_generate_conditional_variant:Nnn \tl_if_empty:n { e } { p,T,F,TF }

\cs_new_protected:Nn \malmedal_rff_main:nnn
 {
  % clear the variables
  \seq_clear:N \l__malmedal_rff_items_seq
  \seq_clear:N \l__malmedal_rff_items_out_seq
  \clist_clear:N \l__malmedal_rff_lines_clist
  % read the file line by line and stores them in a sequence
  \ior_open:Nn \g__malmedal_rff_file_ior { #1 }
  \ior_map_inline:Nn \g__malmedal_rff_file_ior
   {
    \seq_put_right:Nx \l__malmedal_rff_items_seq { \tl_trim_spaces:n { ##1 } }
   }
  \ior_close:N \g__malmedal_rff_file_ior
  % output the items we want
  \__malmedal_rff_output:nn { #2 } { #3 }
 }

\cs_new_protected:Nn \__malmedal_rff_output:nn
 {
  \clist_map_function:nN { #1 } \__malmedal_rff_additems:n
  \clist_map_inline:Nn \l__malmedal_rff_lines_clist
   {
    \seq_put_right:Nx \l__malmedal_rff_items_out_seq { \seq_item:Nn \l__malmedal_rff_items_seq { ##1 } }
   }
  \bool_if:NT \l__malmedal_rff_reverse_bool
   {
    \seq_reverse:N \l__malmedal_rff_items_out_seq
   }
  \seq_use:Nn \l__malmedal_rff_items_out_seq { #2 }
 }

\cs_new_protected:Nn \__malmedal_rff_additems:n
 {
  \seq_set_split:Nnn \l_tmpa_seq { / } { #1 }
  \int_compare:nTF { \seq_count:N \l_tmpa_seq = 1 }
   {
    \clist_put_right:Nn \l__malmedal_rff_lines_clist { #1 }
   }
   {
    \int_step_inline:nnn
     {
      \tl_if_empty:eTF { \seq_item:Nn \l_tmpa_seq { 1 } }
       { 1 }
       {
        \int_compare:nTF { \seq_item:Nn \l_tmpa_seq { 1 } < 0 }
         {
          \seq_count:N \l__malmedal_rff_items_seq + (\seq_item:Nn \l_tmpa_seq { 1 }) + 1
         }
         { \seq_item:Nn \l_tmpa_seq { 1 } }
       }
     }
     {
      \tl_if_empty:eTF { \seq_item:Nn \l_tmpa_seq { 2 } }
       { \seq_count:N \l__malmedal_rff_items_seq } % all the way up
       {
        \int_compare:nTF { \seq_item:Nn \l_tmpa_seq { 2 } < 0 }
         {
          \seq_count:N \l__malmedal_rff_items_seq + (\seq_item:Nn \l_tmpa_seq { 2 }) + 1
         }
         { \seq_item:Nn \l_tmpa_seq { 2 } }
       }
     }
     { \clist_put_right:Nn \l__malmedal_rff_lines_clist { ##1 } }
   }
 }

\ExplSyntaxOff

\begin{document}

All lines\par
\readfromfile{\jobname.dat}{/}[\par]

All lines\par
\readfromfile{\jobname.dat}{1/}[\par]

All lines\par
\readfromfile{\jobname.dat}{1/-1}[\par]

\bigskip
Lines 1 and 4 to end\par
\readfromfile{\jobname.dat}{1,4/}[\par]

\bigskip
Lines from 1 to penultimate\par
\readfromfile{\jobname.dat}{1/-2}[\par]

\bigskip
Lines from antepenultimate to last\par
\readfromfile{\jobname.dat}{-3/-1}[\par]

\newpage

Reverse\par
\readfromfile*{\jobname.dat}{/}[\par]

\bigskip
Last and penultimate lines in reverse\par
\readfromfile*{\jobname.dat}{1/2}[\par]

\bigskip
Table with select lines\par
\begin{tabular}{ll}
A & B \\
\hline
\readfromfile{\jobname.tab}{2/4}[\\]
\end{tabular}

\bigskip
Table with all lines in reverse\par
\begin{tabular}{ll}
A & B \\
\hline
\readfromfile*{\jobname.tab}{/}[\\]
\end{tabular}

\end{document}

enter image description here


What you are asking about here is "random access" to the lines of text file, which is not supported by any filesystem that I know of. Most programming languages have a seek function that can "jump" forward and backwards in a file by a given number of bytes. This works well in binary files or text files where every line is the same length since the byte offset of "jump to position x" can be calculated.

Unfortunately, most text files have a varying number of bytes per line so it is impossible to predict the offset at which a given line begins without reading the contents of the file. To get around this problem, you need to "index" the file in some way such that lines can be located in an arbitrary order. Two methods that I am aware of are:

  • Read the contents of the file, split it into lines and store the results in an array. The array index then provides random access to the file contents. This works well for small files.

  • For large files, you have to worry about running out of time or memory. A common trick to conserve memory is to scan the file and store the byte offset of each line in an array rather that the contents of the line its self. These offsets can then be fed to seek in order to locate a line in the file. Perl's Tie::File module uses this method along with some other tricks to represent large files as arrays.


Suppose we have the following input file:

foo
bar
baz
bim

With TeX you have two problems:

  1. You can only read the file one line at a time---no fancy tricks such as a seek method that lets you jump over content.

  2. Without the help of packages, no nice object that can be used as an array with operations like indexed lookup or maximum length.

Solution: Use LuaTeX

This is the easiest way to go since Lua tables make great array objects:

\documentclass{minimal}

\def\readfile#1#2{%
  % Read the contents of a file #1 to an array and store the array name in #2
  \directlua{%
    local input = io.open('#1', 'r')
    #2 = {}
    for line in input:lines() do
      table.insert(#2, line)
    end
    input:close()
  }%
}

\begin{document}   
% Read the contents of io.in to foo
\readfile{io.in}{foo}

% Loop over foo any way you want. The Lua shortcut for table.getn(foo) is #foo,
% but the # character is also special to TeX which causes some confusion.
\directlua{%
  texio.write_nl("Normal:")%
  for i = 1, table.getn(foo) do texio.write_nl(foo[i]) end}

\directlua{%
  texio.write_nl("Reversed:")%
  for i = table.getn(foo), 1, -1 do texio.write_nl(foo[i]) end}

\directlua{%
  texio.write_nl("Odd entries only:")%
  for i = 1, table.getn(foo), 2 do texio.write_nl(foo[i]) end}

\end{document}

The lualatex log output is:

Normal:
foo
bar
baz
bim
Reversed:
bim
baz
bar
foo
Odd entries only:
foo
baz

Lua FILE objects such as input also have a seek method that can be used to pull off advanced tricks.