How do I concatenate all the files in a given directory in order of date, where I want the newest file on top?
To concatenate files you use
cat file1 file2 file3 ...
To get a list of filenames sorted by modification time, newest first, you use
ls -t
Putting it all together,
cat $(ls -t) > outputfile
You might want to give some arguments to ls (e.g., *.html).
But if you have filenames with spaces in them, this will not work: My file.html will be assumed to be two filenames, My and file.html. You can make ls quote the filenames (the -Q option of GNU ls) and then use xargs, which understands the quoting, to pass the arguments to cat.
ls -tQ | xargs cat
As for your second question, filtering out parts of files isn't difficult, but it depends on what exactly you want to strip out. What are the “redundant headers”?
The easiest way of listing files in an order other than lexicographic is with zsh glob qualifiers. Without zsh, you can use ls, but parsing the output of ls is fraught with dangers. In zsh, the om qualifier sorts the matches by modification time, newest first:
cat *(om)
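If zsh is not available, here is a sketch that avoids parsing ls altogether. It assumes the GNU versions of find, sort, cut and xargs (the -printf, -z and -0 options are not in POSIX), and cat_newest_first is just a name made up for this example:

```shell
# cat_newest_first DIR: hypothetical helper, not a standard command.
# Assumes GNU find, sort, cut and xargs.
cat_newest_first() {
    # Emit "mtime<TAB>path" records, NUL-terminated so any filename is safe.
    find "${1:-.}" -maxdepth 1 -type f -printf '%T@\t%p\0' |
        sort -zrn |        # numeric sort on the timestamp, newest first
        cut -z -f2- |      # drop the timestamp, keep the path
        xargs -0 -r cat    # concatenate in that order
}
```

Call it as cat_newest_first /some/dir and redirect the output to a file outside that directory, otherwise the output file itself is picked up as an input.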
If you want to strip some lines, use sed or awk or perl. For example, to take the <head> from the first file and combine the <body> parts from the other files, assuming that the <body> and </body> tags are alone on a line in every file:
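{
# Second-newest file (the newest is concatenated.html itself):
# print everything before the </body> line, without printing that line.
sed -n -e '/<\/body>/ q' -e p *.html(om[2])
# Remaining files: one sed per file, so the line addresses restart each time.
for f in *.html(om[3,-1]); do
sed -e '1,/<body>/ d' -e '/<\/body>/,$ d' "$f"
done
echo '</body>'
echo '</html>'
} >concatenated.html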
Explanation:
- First, concatenated.html is created. It is therefore the youngest *.html file (assuming no file has a date in the future).
- Then copy from the second-youngest *.html file, but quit at the </body> line.
- Then copy from the other files, but skip everything down to the <body> line and everything from the </body> line on.
- Finally produce the last closing tags.
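For a shell without glob qualifiers, the same steps could be wrapped in a function that takes the files already in the desired order. This is only a sketch: merge_html is a hypothetical name, and it keeps the assumption that <body> and </body> sit alone on a line in every file:

```shell
# merge_html FIRST [OTHER...]: hypothetical helper, writes to standard output.
merge_html() {
    # First file: keep everything before its </body> line.
    sed -n -e '/<\/body>/q' -e p "$1"
    shift
    # Other files: keep only the lines strictly between <body> and </body>.
    for f in "$@"; do
        sed -e '1,/<body>/d' -e '/<\/body>/,$d' "$f"
    done
    # Close the document once, at the very end.
    printf '%s\n' '</body>' '</html>'
}
```

Any list of files in the right order works as arguments, and running one sed per file keeps each file's <body> section separate.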