Linux shell command to filter a text file by line length

Solution 1:

awk '{ if (length($0) < 16384) print }' yourfile >your_output_file.txt

would print lines shorter than 16384 characters (16 KiB), as in your own example.

Or if you fancy Perl:

perl -nle 'if (length($_) < 16384) { print }' yourfile >your_output_file.txt
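
If you want a quick sanity check before running this on a real file, the same commands work with a toy threshold (the file name sample.txt and the limit of 10 are just placeholders):

printf 'short\nthis line is definitely longer than ten characters\n' > sample.txt
awk '{ if (length($0) < 10) print }' sample.txt        # prints only "short"
perl -nle 'if (length($_) < 10) { print }' sample.txt  # same result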

Solution 2:

This is similar to Ansgar's answer, but slightly faster in my tests:

awk 'length($0) < 16384' infile >outfile

It relies on the implicit print of a true expression, but doesn't need to take the time to split the line as Ansgar's does.

Note that AWK gives you an if for free. The command above is equivalent to:

awk 'length($0) < 16384 {print}' infile >outfile

There's no explicit if (or its surrounding set of curly braces) as in some of the other answers.
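
If you want to check the timing for yourself, a rough comparison is easy to run (infile is a placeholder; results will vary with your data and awk implementation):

time awk 'length($0) < 16384' infile > /dev/null
time awk '{ if (length($0) < 16384) print }' infile > /dev/null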

Here is a way to do it in sed:

sed '/.\{16384\}/d' infile >outfile

or:

sed -r '/.{16384}/d' infile >outfile

Both forms delete any line that contains 16384 (or more) characters.

For completeness, here's how you'd use sed to keep only the lines at or above the threshold (16384 or more characters):

sed '/^.\{0,16383\}$/d' infile >outfile
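
Taken together, the two patterns let you split a file into its short and long lines with two complementary passes (infile and the output names are placeholders):

sed '/.\{16384\}/d' infile > lines_under_16384.txt        # keeps lines shorter than 16384 characters
sed '/^.\{0,16383\}$/d' infile > lines_16384_and_up.txt   # keeps lines of 16384 or more characters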

Solution 3:

Not really different from the answers already given, but shorter still. Setting the field separator to the empty string makes every character its own field, so NF is simply the line length (an extension supported by GNU awk, among others, but not required by POSIX):

awk -F '' 'NF < 16384' infile >outfile
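
A quick one-liner makes the trick visible; the echoed string is an arbitrary example, and this needs an awk that supports an empty FS, such as GNU awk:

echo 'hello' | awk -F '' '{ print NF }'    # prints 5, i.e. the number of characters in the line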

Solution 4:

You can use awk like so:

$ awk '{ if (length($0) < 16384) { print } }' /path/to/text/file

This prints the lines shorter than 16K characters (16 * 1024 = 16384).

You can also use grep:

$ grep -x ".\{0,16384\}" /path/to/text/file

This prints the lines of at most 16K (16384) characters; the -x option makes the pattern match the entire line, so longer lines are excluded.
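
As a quick check with a tiny limit of 5 characters (the input strings are arbitrary examples):

printf 'abc\nabcdefgh\n' | grep -x '.\{0,5\}'    # prints only "abc"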