How to delete line if longer than XY?
sed '/^.\{2048\}./d' input.txt > output.txt
Here's a solution that deletes lines that have 2049 or more characters:
sed '/.\{2049\}/d' <file.in >file.out
The regular expression .\{2049\} matches any line that contains a substring of 2049 characters (another way of saying "at least 2049 characters"). The d command deletes those lines from the input, so only the shorter lines reach the output.
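The equivalence between "contains a run of 2049 characters" and "is at least 2049 characters long" can be checked with a small Python sketch. The helper name is_too_long and the small limit used in the demo are illustrative assumptions, and Python's re module stands in here for sed's basic regular expressions, so this is a check of the idea rather than of sed itself.

```python
import re

def is_too_long(line: str, limit: int = 2048) -> bool:
    """Return True if the line has more than `limit` characters before its newline."""
    # .{limit+1} matches any run of limit+1 non-newline characters,
    # which mirrors sed's .\{2049\} when limit is 2048.
    pattern = re.compile(r".{%d}" % (limit + 1))
    return pattern.search(line) is not None

# The regex test agrees with a plain length comparison (small limit for the demo).
for text in ["", "abc", "abcd", "abcde"]:
    assert is_too_long(text, limit=3) == (len(text) > 3)
```

Since . does not match a newline by default, a trailing newline on the line does not count toward the limit.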
BSD sed (on e.g. macOS) can only handle repetition counts of up to 256 in the \{...\} operator (the value of RE_DUP_MAX; see getconf RE_DUP_MAX in the shell). On these systems, you may instead use awk:
awk 'length <= 2048' <file.in >file.out
Mimicking the sed solution literally with awk:
awk 'length >= 2049 { next } { print }' <file.in >file.out
Note that any awk implementation is only guaranteed to be able to handle records of lengths up to LINE_MAX bytes (see getconf LINE_MAX in the shell), but may support longer ones. On macOS, LINE_MAX is 2048.
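LINE_MAX is a limit in bytes, while a length test may count characters, depending on the implementation and locale. For non-ASCII text the two can differ. The following is a minimal Python sketch of filtering by byte length instead; the function names byte_length and keep_short_lines, the UTF-8 encoding, and the sample data are assumptions made for illustration.

```python
def byte_length(line: str, encoding: str = "utf-8") -> int:
    """Length of the line in bytes, excluding a trailing newline."""
    return len(line.rstrip("\n").encode(encoding))

def keep_short_lines(lines, max_bytes: int = 2048):
    """Yield only the lines whose byte length does not exceed max_bytes."""
    for line in lines:
        if byte_length(line) <= max_bytes:
            yield line

# "é" takes two bytes in UTF-8, so 1025 of them exceed a 2048-byte limit
# even though the line is only 1025 characters long.
sample = ["é" * 1024 + "\n", "é" * 1025 + "\n"]
kept = list(keep_short_lines(sample))
assert kept == [sample[0]]
```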
Something like this should work in Python.
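# Copy "orig" to "new", skipping lines longer than 2048 characters.
with open("orig") as src, open("new", "w") as dst:
    for line in src:
        # Don't count the trailing newline toward the length.
        if len(line.rstrip("\n")) <= 2048:
            dst.write(line)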