Best way to split string into lines with maximum length, without breaking words
Even when this post is 3 years old I wanted to give a better solution using Regex
to accomplish the same:
If you want the string to be splitted and then use the text to be displayed you can use this:
public string SplitToLines(string stringToSplit, int maximumLineLength)
{
return Regex.Replace(stringToSplit, @"(.{1," + maximumLineLength +@"})(?:\s|$)", "$1\n");
}
If on the other hand you need a collection you can use this:
public MatchCollection SplitToLines(string stringToSplit, int maximumLineLength)
{
return Regex.Matches(stringToSplit, @"(.{1," + maximumLineLength +@"})(?:\s|$)");
}
NOTES
Remember to import regex (using System.Text.RegularExpressions;
)
You can use string interpolation on the match:$@"(.{{1,{maximumLineLength}}})(?:\s|$)"
The MatchCollection
works almost like an Array
Matching example with explanation here
How about this as a solution:
IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
var words = stringToSplit.Split(' ').Concat(new [] { "" });
return
words
.Skip(1)
.Aggregate(
words.Take(1).ToList(),
(a, w) =>
{
var last = a.Last();
while (last.Length > maximumLineLength)
{
a[a.Count() - 1] = last.Substring(0, maximumLineLength);
last = last.Substring(maximumLineLength);
a.Add(last);
}
var test = last + " " + w;
if (test.Length > maximumLineLength)
{
a.Add(w);
}
else
{
a[a.Count() - 1] = test;
}
return a;
});
}
I reworked this as prefer this:
IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
var words = stringToSplit.Split(' ');
var line = words.First();
foreach (var word in words.Skip(1))
{
var test = $"{line} {word}";
if (test.Length > maximumLineLength)
{
yield return line;
line = word;
}
else
{
line = test;
}
}
yield return line;
}
I don't think your solution is too bad. I do, however, think you should break up your ternary into an if else because you are testing the same condition twice. Your code might also have a bug. Based on your description, it seems you want lines <= maxLineLength, but your code counts the space after the last word and uses it in the <= comparison resulting in effectively < behavior for the trimmed string.
Here is my solution.
private static IEnumerable<string> SplitToLines(string stringToSplit, int maxLineLength)
{
string[] words = stringToSplit.Split(' ');
StringBuilder line = new StringBuilder();
foreach (string word in words)
{
if (word.Length + line.Length <= maxLineLength)
{
line.Append(word + " ");
}
else
{
if (line.Length > 0)
{
yield return line.ToString().Trim();
line.Clear();
}
string overflow = word;
while (overflow.Length > maxLineLength)
{
yield return overflow.Substring(0, maxLineLength);
overflow = overflow.Substring(maxLineLength);
}
line.Append(overflow + " ");
}
}
yield return line.ToString().Trim();
}
It is a bit longer than your solution, but it should be more straightforward. It also uses a StringBuilder so it is much faster for large strings. I performed a benchmarking test for 20,000 words ranging from 1 to 11 characters each split into lines of 10 character width. My method completed in 14ms compared to 1373ms for your method.