How can I fix lines broken in wrong places?
With awk:
awk -v ORS= '{print (NR == 1 ? "" : /^[[:lower:]]/ ? " " : RS) $0}
END {if (NR) print RS}'
That is, don't append the record separator to each line (ORS is empty). Instead, before every line except the first, prepend the record separator (a newline) if the line doesn't start with a lowercase letter, or a space if it does. The END block outputs the final newline if there was any input.
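To see the awk command above in action, here is a minimal sketch that wraps it in a shell function and feeds it made-up sample lines. The function name join_lowercase_continuations and the sample text are hypothetical, chosen only for illustration; the awk program itself is unchanged apart from comments.

```shell
#!/bin/sh
# join_lowercase_continuations is a hypothetical helper name wrapping the
# awk command above; it reads the files named as arguments, or stdin.
join_lowercase_continuations() {
  awk -v ORS= '
    # Before every line but the first, emit a space if the line starts
    # with a lowercase letter (a continuation), else a newline (RS).
    {print (NR == 1 ? "" : /^[[:lower:]]/ ? " " : RS) $0}
    # ORS is empty, so terminate the last output line explicitly.
    END {if (NR) print RS}' "$@"
}

# Made-up sample input: two sentences, each broken before a lowercase word.
printf '%s\n' 'This sentence was' 'broken in the wrong place.' \
  'A new sentence starts here' 'and ends here.' | join_lowercase_continuations
```

With this sample input it should print:
This sentence was broken in the wrong place.
A new sentence starts here and ends here.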
Try:
awk '$NF !~ /\.$/ { printf "%s ",$0 ; next ; } {print;}' file
where:
- $NF !~ /\.$/ matches lines whose last field does not end with a dot;
- { printf "%s ",$0 prints such a line followed by a trailing space and no line feed;
- next ; } skips to the next input line;
- {print;} prints every other line (one ending with a dot) with its line feed.
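As a quick check of the awk command above, here is a minimal sketch that wraps it in a shell function and runs it on made-up input. The function name join_until_dot and the sample lines are hypothetical. Note that the second sample line starts with an uppercase letter and is still merged, because only the trailing dot is tested.

```shell
#!/bin/sh
# join_until_dot is a hypothetical helper name wrapping the awk command
# above; it reads the files named as arguments, or stdin.
join_until_dot() {
  # Lines whose last field lacks a final dot are printed with a trailing
  # space instead of a newline; "next" then moves on to the following line.
  # Any other line is printed as is, which ends the joined sentence.
  awk '$NF !~ /\.$/ { printf "%s ", $0; next } { print }' "$@"
}

# Made-up sample input.
printf '%s\n' 'The first sentence is' 'Broken here.' 'Second one.' | join_until_dot
```

This should print:
The first sentence is Broken here.
Second one.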
I'm sure there is a sed option as well.
Note: this only keeps a line break after lines ending with a dot; it does not check whether the next line begins with an uppercase letter. See Stéphane Chazelas's answer for that condition.
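For the sed option mentioned above, one possible sketch (not part of the original answer) loops, appending the next line while the pattern space does not end with a dot. It assumes sentences end with a dot at the very end of the line with no trailing whitespace, whereas the awk version tests the last field, so trailing blanks are treated differently. The function name join_until_dot_sed is hypothetical. Unlike the awk version, it also ends the output with a newline when the last line has no dot.

```shell
#!/bin/sh
# join_until_dot_sed is a hypothetical helper name. Assumption: a sentence
# ends with a dot as the last character of the line.
join_until_dot_sed() {
  # :a        loop label
  # /\.$/!{   while the pattern space does not end with a dot:
  # $!N         append the next line (skipped on the last line),
  # s/\n/ /     replace the embedded newline with a space,
  # ta          and loop again only if a join actually happened.
  sed -e ':a' -e '/\.$/!{' -e '$!N' -e 's/\n/ /' -e 'ta' -e '}' "$@"
}

# Made-up sample input.
printf '%s\n' 'The first sentence is' 'Broken here.' 'Second one.' | join_until_dot_sed
```

On this input it should print the same two lines as the awk command: "The first sentence is Broken here." and "Second one.".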
In perl:
#!/usr/bin/perl -w
use strict;
my $input = join("", <>);
$input =~ s/\n([a-z])/ $1/g;
print $input;
Technically, you want to replace "newline followed by a lowercase letter" with "a space followed by that lowercase letter", which is what the core of the above Perl script does:
- Read the input into a string, $input.
- Update the $input variable with the result of the search-and-replace operation.
- Print the new value.