Trim a string based on the string length
StringUtils.abbreviate from the Apache Commons Lang library could be your friend:
StringUtils.abbreviate("abcdefg", 6) = "abc..."
StringUtils.abbreviate("abcdefg", 7) = "abcdefg"
StringUtils.abbreviate("abcdefg", 8) = "abcdefg"
StringUtils.abbreviate("abcdefg", 4) = "a..."
Commons Lang3 even allows you to set a custom String as the replacement marker. With this you can, for example, use a single-character ellipsis.
StringUtils.abbreviate("abcdefg", "\u2026", 6) = "abcde…"
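If you cannot pull in Commons Lang, the behaviour shown in the examples above can be approximated in plain Java. This is only a rough sketch, not the library's implementation: the class name AbbreviateSketch and its abbreviate helper are made up for illustration, and throwing IllegalArgumentException when maxWidth leaves no room for any text is a choice made for this sketch. It counts char units, so it does not protect surrogate pairs.

```java
public class AbbreviateSketch {

    // Hypothetical helper for illustration; NOT the Commons Lang source.
    static String abbreviate(String s, String marker, int maxWidth) {
        if (s == null || s.length() <= maxWidth) {
            return s; // null or already short enough: return unchanged
        }
        // Number of leading chars that still fit next to the marker
        int keep = maxWidth - marker.length();
        if (keep < 1) {
            throw new IllegalArgumentException("maxWidth too small for marker");
        }
        return s.substring(0, keep) + marker;
    }

    public static void main(String[] args) {
        System.out.println(abbreviate("abcdefg", "...", 6));    // abc...
        System.out.println(abbreviate("abcdefg", "...", 7));    // abcdefg
        System.out.println(abbreviate("abcdefg", "\u2026", 6)); // abcde…
    }
}
```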
As usual, nobody cares about UTF-16 surrogate pairs, not even the authors of org.apache.commons/commons-lang3. For background, see: What are the most common non-BMP Unicode characters in actual use?
The following sample shows the difference between the usual code and code that handles surrogate pairs correctly:
public static void main(String[] args) {
    // String containing FACE WITH TEARS OF JOY (U+1F602), stored as a surrogate pair
    String s = "abcdafghi\uD83D\uDE02cdefg";
    int maxWidth = 10;
    System.out.println(s);
    // Usual code: ignores surrogate pairs, so the cut lands inside the emoji
    System.out.println(s.substring(0, Math.min(s.length(), maxWidth)));
    // Correct code: if the cut would split a surrogate pair, back up by one char
    int end = Math.min(s.length(), maxWidth);
    if (end > 0 && end < s.length()
            && Character.isHighSurrogate(s.charAt(end - 1))
            && Character.isLowSurrogate(s.charAt(end))) {
        end--;
    }
    System.out.println(s.substring(0, end));
}
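The sample above still measures the limit in char units. If the limit is meant to count code points instead, one possible approach is to let String do the arithmetic with codePointCount and offsetByCodePoints. The class and method names below are hypothetical, and returning "" for a non-positive limit is an assumption of this sketch.

```java
public class CodePointTruncate {

    // Hypothetical helper: keep at most maxCodePoints Unicode code points.
    static String truncate(String s, int maxCodePoints) {
        if (s == null) {
            return null;
        }
        if (maxCodePoints <= 0) {
            return ""; // assumption of this sketch: non-positive limit gives ""
        }
        if (s.codePointCount(0, s.length()) <= maxCodePoints) {
            return s; // already short enough
        }
        // Index of the char that follows the first maxCodePoints code points;
        // this index never falls inside a surrogate pair.
        int end = s.offsetByCodePoints(0, maxCodePoints);
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String s = "abcdafghi\uD83D\uDE02cdefg";
        System.out.println(truncate(s, 10)); // abcdafghi😂
        System.out.println(truncate(s, 9));  // abcdafghi
    }
}
```

With this variant the emoji counts as one unit, so a limit of 10 keeps the nine letters plus the whole emoji (11 char values).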
There is an Apache Commons StringUtils function which does this.
s = StringUtils.left(s, 10);
If len characters are not available, or the String is null, the String will be returned without an exception. An empty String is returned if len is negative.
StringUtils.left(null, *) = null
StringUtils.left(*, -ve) = ""
StringUtils.left("", *) = ""
StringUtils.left("abc", 0) = ""
StringUtils.left("abc", 2) = "ab"
StringUtils.left("abc", 4) = "abc"
StringUtils.left JavaDocs
Courtesy: Steeve McCauley
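For readers without Commons Lang on the classpath, the documented behaviour of StringUtils.left listed above can be mirrored with a few lines of plain Java. This is a sketch under that assumption only; LeftSketch and its left method are hypothetical names, not library code.

```java
public class LeftSketch {

    // Hypothetical plain-Java helper mirroring the behaviour listed above;
    // it is not the Commons Lang source.
    static String left(String s, int len) {
        if (s == null) {
            return null; // null in, null out
        }
        if (len < 0) {
            return "";   // negative length yields an empty String
        }
        // Math.min guards against len being larger than the String
        return s.substring(0, Math.min(s.length(), len));
    }

    public static void main(String[] args) {
        System.out.println(left("abc", 2)); // ab
        System.out.println(left("abc", 4)); // abc
    }
}
```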
s = s.substring(0, Math.min(s.length(), 10));
Using Math.min like this avoids an exception in the case where the string is already shorter than 10 characters.
Notes:
The above does simple trimming. If you actually want to replace the last characters with three dots when the string is too long, use Apache Commons StringUtils.abbreviate; see @H6's solution. If you want to use the Unicode horizontal ellipsis character, see @Basil's solution.
For typical implementations of String, s.substring(0, s.length()) will return s rather than allocating a new String.
This may behave incorrectly (see note 1) if your String contains Unicode code points outside of the BMP, e.g. emojis. For a (more complicated) solution that works correctly for all Unicode code points, see @sibnick's solution.
1 - A Unicode code point that is not on plane 0 (the BMP) is represented as a "surrogate pair" (i.e. two char values) in the String. By ignoring this, we might trim the string to fewer than 10 code points, or (worse) truncate it in the middle of a surrogate pair. On the other hand, String.length() is not a good measure of Unicode text length, so trimming based on that property may be the wrong thing to do.
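To illustrate that last point: if the limit should count what a reader perceives as characters (for example a letter plus a combining accent), one option is java.text.BreakIterator. The sketch below is only an illustration with hypothetical names (GraphemeTruncate, truncate). How well BreakIterator handles emoji sequences depends on the JDK version, so treat that part as something to verify in your environment.

```java
import java.text.BreakIterator;

public class GraphemeTruncate {

    // Hypothetical helper: keep at most maxChars user-perceived characters,
    // as determined by BreakIterator's character instance.
    static String truncate(String s, int maxChars) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int end = 0; // char index of the last boundary we accepted
        for (int i = 0; i < maxChars; i++) {
            int next = it.next();
            if (next == BreakIterator.DONE) {
                break; // ran out of text before reaching the limit
            }
            end = next;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        // "e" followed by COMBINING ACUTE ACCENT is two chars but one perceived character
        String s = "e\u0301bc";
        System.out.println(truncate(s, 2).length()); // 3
    }
}
```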