How to truncate a string in PHP to the sentence closest to a certain number of characters?

I tried several functions and regular expressions, but none of them works as I would have liked, so I create this one:

function sentenceTrim($string, $maxLength = 300) {
    $string = preg_replace('/\s+/', ' ', trim($string)); // Replace new lines (optional)

    if (mb_strlen($string) >= $maxLength) {
        $string = mb_substr($string, 0, $maxLength);

        $puncs  = array('. ', '! ', '? '); // Possible endings of sentence
        $maxPos = 0;

        foreach ($puncs as $punc) {
            $pos = mb_strrpos($string, $punc);

            if ($pos && $pos > $maxPos) {
                $maxPos = $pos;
            }
        }

        if ($maxPos) {
            return mb_substr($string, 0, $maxPos + 1);
        }

        return rtrim($string) . '…';
    } else {
        return $string;
    }           
}

It trims string to specified max length, finds the last occurrence of the end (. or ! or ?) of the last sentence from this string and trims again to this occurrence. It returns one or several full sentences as close to specified number of characters.

Please correct my english.


This is what I came up with... you should check if the sentence is longer than the len you are looking for.. among other things like what g13n said. It might be better if the sentence is too short/long to chopping it off and putting "...". Plus, you would have to check/convert whitespace since strrpos will only look for what is given.

$maxlen = 150;
$file = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer malesuada eleifend orci, eget dignissim ligula porttitor cursus. Praesent in blandit enim. Maecenas vitae eleifend est. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Maecenas pulvinar gravida tempor.";
if ( strlen($file) > $maxlen ){
    $file = substr($file,0,strrpos($file,". ",$maxlen-strlen($file))+1);
}

if you want to use the same function you have, you can try this:

function shortenString($string, $your_desired_width) {
  $parts = preg_split('/([\s\n\r]+)/', $string, null, PREG_SPLIT_DELIM_CAPTURE);
  $parts_count = count($parts);

  $length = 0;
  $last_part = 0;
  $last_taken = 0;
  foreach($parts as $part){
    $length += strlen($part);
    if ( $length > $your_desired_width ){
        break;
    }
    ++$last_part;
    if ( $part[strlen($part)-1] == '.' ){
        $last_taken = $last_part;
    }
  }
  return implode(array_slice($parts, 0, $last_taken));
}

You could just use a simple regular expression like /^([^.]*?).*/ and replace that with "$1". Like:

$output = preg_replace('/^([^.]+).*/', '$1.', $input);

That said, you'll have to be aware that not all languages have period (.) as the sentence delimiter.

HTH.