How do I remove blank lines from text in PHP?
// A newline is required as the replacement to keep non-blank lines separate
$string = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);
The above regular expression says:
/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/
1st Capturing group (^[\r\n]*|[\r\n]+)
  1st Alternative: ^[\r\n]*
    ^ assert position at start of the string
    [\r\n]* match a single character present in the list below
      Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
      \r matches a carriage return (ASCII 13)
      \n matches a line-feed (newline) character (ASCII 10)
  2nd Alternative: [\r\n]+
    [\r\n]+ match a single character present in the list below
      Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
      \r matches a carriage return (ASCII 13)
      \n matches a line-feed (newline) character (ASCII 10)
[\s\t]* match a single character present in the list below
  Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
  \s matches any whitespace character [\r\n\t\f ]
  \t matches a tab character (ASCII 9)
[\r\n]+ match a single character present in the list below
  Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
  \r matches a carriage return (ASCII 13)
  \n matches a line-feed (newline) character (ASCII 10)
Your ereg_replace() solution is wrong because the ereg/eregi functions are deprecated (and were removed entirely in PHP 7.0). Your preg_replace() won't even compile, but if you add delimiters and set multiline mode, it will work fine:
$str = preg_replace('/^[ \t]*[\r\n]+/m', '', $str);
The m modifier allows ^ to match the beginning of a logical line rather than just the beginning of the whole string. The start-of-line anchor is necessary because without it the regex would match the newline at the end of every line, not just the blank ones. You don't need the end-of-line anchor ($) because you're actively matching the newline characters, but it doesn't hurt.
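To make the role of the m modifier and the ^ anchor concrete, here is a small sketch that runs three variants of the pattern on the same input. The sample string and the variable names are made up for illustration.

```php
<?php
// Sample input: a leading blank line, an empty line, and a whitespace-only line.
$str = "\nfirst\n\n  \t\nsecond\n";

// With the m modifier, ^ matches at the start of every line.
$multiline = preg_replace('/^[ \t]*[\r\n]+/m', '', $str);

// Without m, ^ matches only at the very start of the string.
$noModifier = preg_replace('/^[ \t]*[\r\n]+/', '', $str);

// Without the anchor, the newline that ends every line is matched too.
$noAnchor = preg_replace('/[ \t]*[\r\n]+/', '', $str);

var_dump($multiline, $noModifier, $noAnchor);
```

Only the first variant leaves "first" and "second" on separate lines with nothing blank around them. The second strips just the leading blank line, and the third glues all the text onto a single line.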
The accepted answer gets the job done, but it's more complicated than it needs to be. The regex has to match either the beginning of the string (^[\r\n]*, multiline mode not set) or at least one newline ([\r\n]+), followed by at least one newline ([\r\n]+). So, in the special case of a string that starts with one or more blank lines, they'll be replaced with one blank line. I'm pretty sure that's not the desired outcome.
But most of the time it replaces two or more consecutive newlines, along with any horizontal whitespace (spaces or tabs) that lies between them, with one linefeed. That's the intent, anyway. The author seems to expect \s to match just the space character (\x20), when in fact it matches any whitespace character. That's a very common mistake. The actual list varies from one regex flavor to the next, but at minimum you can expect \s to match whatever [ \t\f\r\n] matches.
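Both points can be checked with a short sketch. The pattern is copied from the accepted answer; the sample string and variable names are assumptions for the sake of the example.

```php
<?php
// The accepted answer's pattern, written as in that answer (double quotes, no modifiers).
$pattern = "/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/";

// Two leading blank lines, then two blank lines between "foo" and "bar".
$str = "\n\nfoo\n\n\nbar\n";
$result = preg_replace($pattern, "\n", $str);

// \s is not limited to the space character.
$matchesNewline  = preg_match('/\s/', "\n");
$matchesTab      = preg_match('/\s/', "\t");
$matchesFormFeed = preg_match('/\s/', "\f");

var_dump($result, $matchesNewline, $matchesTab, $matchesFormFeed);
```

The blank lines between "foo" and "bar" collapse as intended, but the result still begins with a newline, so one leading blank line survives. All three preg_match() calls report a match.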
Actually, in PHP you have a better option:
$str = preg_replace('/^\h*\v+/m', '', $str);
\h matches any horizontal whitespace character, and \v matches vertical whitespace.
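As a quick check of the \h and \v version, the sketch below feeds it a string that mixes Unix and Windows line endings. The input is invented for illustration.

```php
<?php
// A whitespace-only line, an empty CRLF line, and two empty LF lines.
$str = "one\n \t \n\r\ntwo\n\n\nthree";

// \h* consumes spaces and tabs; \v+ consumes the line break(s) that follow.
$clean = preg_replace('/^\h*\v+/m', '', $str);

var_dump($clean);
```

Every blank or whitespace-only line is removed, and the line breaks that end "one" and "two" are left alone.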
The comment from Bythos from Jamie's link above worked for me:
/^\n+|^[\t\s]*\n+/m
I didn't want to strip all of the new lines, just the empty/whitespace ones. This does the trick!
Just explode the lines of the text to an array, remove empty lines using array_filter and implode the array again.
$tmp = explode("\n", $str);
$tmp = array_filter($tmp);
$str = implode("\n", $tmp);
Or in one line:
$str = implode("\n", array_filter(explode("\n", $str)));
I haven't benchmarked it, but this may be faster than preg_replace.
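One caveat with this approach: when array_filter() is called without a callback it drops every falsy element, so a line containing only "0" disappears, while a line containing only spaces is kept. A minimal sketch of a stricter variant follows; the $isNotBlank callback is a hypothetical helper, not part of the original answer.

```php
<?php
$str = "a\n\n0\n  \nb";

// Default array_filter() drops every falsy element, including the string "0".
$default = implode("\n", array_filter(explode("\n", $str)));

// Hypothetical stricter callback: keep a line unless it is empty after trimming.
$isNotBlank = function ($line) {
    return trim($line) !== '';
};
$strict = implode("\n", array_filter(explode("\n", $str), $isNotBlank));

var_dump($default, $strict);
```

With the callback, the "0" line is preserved and the whitespace-only line is removed, which is closer to what the regex answers above do.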