Split a string that has white spaces, unless they are enclosed within "quotes"?
A custom parser might be more suitable for this.
This is something I wrote once when I had a specific (and very strange) parsing requirement that involved parentheses and spaces, but it is generic enough that it should work with virtually any delimiter and text qualifier.
// Requires: using System; using System.Collections.Generic; using System.Text;
public static IEnumerable<String> ParseText(String line, Char delimiter, Char textQualifier)
{
    if (line == null)
        yield break;

    Boolean inString = false;
    StringBuilder token = new StringBuilder();

    for (int i = 0; i < line.Length; i++)
    {
        Char currentChar = line[i];
        // '\0' stands in for "no neighbour" at either end of the line.
        Char prevChar = i > 0 ? line[i - 1] : '\0';
        Char nextChar = i + 1 < line.Length ? line[i + 1] : '\0';

        // An opening qualifier only counts at the start of a token.
        if (currentChar == textQualifier && (prevChar == '\0' || prevChar == delimiter) && !inString)
        {
            inString = true;
            continue;
        }

        // A closing qualifier only counts at the end of a token.
        if (currentChar == textQualifier && (nextChar == '\0' || nextChar == delimiter) && inString)
        {
            inString = false;
            continue;
        }

        // A delimiter outside a quoted run ends the current token.
        if (currentChar == delimiter && !inString)
        {
            yield return token.ToString();
            token.Clear();
            continue;
        }

        token.Append(currentChar);
    }

    // Emit whatever is left after the last delimiter.
    yield return token.ToString();
}
The usage would be:
var parsedText = ParseText(streamR, ' ', '"');
string input = "one \"two two\" three \"four four\" five six";
var parts = Regex.Matches(input, @"[\""].+?[\""]|[^ ]+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();
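Note that the pattern above keeps the surrounding quotes in each matched value (for example `"two two"` rather than `two two`). If you want them stripped, one option is to use capture groups. The following is only a sketch: the class name `RegexQuoteSplitter` and the method `Split` are made up for illustration, and it assumes quotes are balanced and never escaped.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexQuoteSplitter
{
    // Group 1 captures the text inside a pair of quotes;
    // group 2 captures a run of non-space characters (a bare word).
    private static readonly Regex TokenPattern = new Regex("\"([^\"]*)\"|([^ ]+)");

    public static List<string> Split(string input)
    {
        return TokenPattern.Matches(input)
            .Cast<Match>()
            // Pick whichever group took part in the match, so quotes are dropped.
            .Select(m => m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value)
            .ToList();
    }
}
```

Calling `RegexQuoteSplitter.Split("one \"two two\" three \"four four\" five six")` should give the six items `one`, `two two`, `three`, `four four`, `five`, `six`.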
You can use the TextFieldParser class that is part of the Microsoft.VisualBasic.FileIO namespace. (You'll need to add a reference to Microsoft.VisualBasic to your project.)
string inputString = "This is \"a test\" of the parser.";
using (MemoryStream ms = new MemoryStream(Encoding.ASCII.GetBytes(inputString)))
{
    using (Microsoft.VisualBasic.FileIO.TextFieldParser tfp = new Microsoft.VisualBasic.FileIO.TextFieldParser(ms))
    {
        // Split on spaces, but treat "..." as a single field.
        tfp.Delimiters = new string[] { " " };
        tfp.HasFieldsEnclosedInQuotes = true;
        // ReadFields parses the current line only.
        string[] output = tfp.ReadFields();
        for (int i = 0; i < output.Length; i++)
        {
            Console.WriteLine("{0}:{1}", i, output[i]);
        }
    }
}
Which generates the output:
0:This
1:is
2:a test
3:of
4:the
5:parser.
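One thing to watch in the example above: going through Encoding.ASCII.GetBytes replaces any non-ASCII character with `?`. TextFieldParser also has a constructor that takes a TextReader, so a StringReader lets you skip the byte conversion. This is a sketch rather than a drop-in replacement; the wrapper class `TextFieldSplitter` and its `Split` method are names I made up, and it only handles a single line of input.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TextFieldSplitter
{
    // Splits one line on spaces, keeping "quoted text" together as a single field.
    public static string[] Split(string input)
    {
        using (var reader = new StringReader(input))
        using (var parser = new TextFieldParser(reader))
        {
            parser.SetDelimiters(" ");
            parser.HasFieldsEnclosedInQuotes = true;
            // ReadFields returns null when there is nothing left to read.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

Usage would be `string[] fields = TextFieldSplitter.Split("This is \"a test\" of the parser.");`.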
You can even do that without Regex: a LINQ expression with String.Split can do the job.
You can first split your string on the " character, then split only the elements with an even index in the resulting array on the space character.
var result = myString.Split('"')
    .Select((element, index) => index % 2 == 0  // If even index
        ? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)  // Split the item
        : new string[] { element })  // Keep the entire item
    .SelectMany(element => element).ToList();
For the string:
This is a test for "Splitting a string" that has white spaces, unless they are "enclosed within quotes"
It gives the result:
This
is
a
test
for
Splitting a string
that
has
white
spaces,
unless
they
are
enclosed within quotes
UPDATE
string myString = "WordOne \"Word Two\"";
var result = myString.Split('"')
    .Select((element, index) => index % 2 == 0  // If even index
        ? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)  // Split the item
        : new string[] { element })  // Keep the entire item
    .SelectMany(element => element).ToList();
Console.WriteLine(result[0]);
Console.WriteLine(result[1]);
Console.ReadKey();
UPDATE 2
How do you define a quoted portion of the string?
We will assume that the string before the first " is non-quoted.
Then the string between the first " and the second " is quoted, the string between the second " and the third " is non-quoted, the string between the third and the fourth is quoted, and so on.
The general rule is: each string between the (2*n-1)th (odd number) " and the (2*n)th (even number) " is quoted. (1)
What is the relation with String.Split?
String.Split with the default StringSplitOptions (defined as StringSplitOptions.None) starts with an array of one string and then adds a new string to the array for each splitting character found.
So the string before the first " is at index 0 in the split array, the string between the first and second " is at index 1, the string between the second and third " is at index 2, and so on.
The general rule is: the string between the nth and (n+1)th " is at index n in the array. (2)
Given (1) and (2), the quoted string between the (2*n-1)th and (2*n)th " sits at index 2*n-1, so we can conclude that quoted portions are at odd indexes in the split array.
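To see the odd-index rule in action, here is a small sketch that tags each piece returned by Split('"') as quoted or not. The class `QuoteIndexDemo` and the method `Classify` are hypothetical names used only for this illustration, and balanced quotes are assumed.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class QuoteIndexDemo
{
    // Pairs each piece produced by Split('"') with a flag:
    // true when the piece sits at an odd index, i.e. it was inside quotes.
    public static List<KeyValuePair<string, bool>> Classify(string input)
    {
        return input.Split('"')
            .Select((piece, index) => new KeyValuePair<string, bool>(piece, index % 2 == 1))
            .ToList();
    }
}
```

For the input `a "b c" d`, the pieces are `a ` at index 0 (non-quoted), `b c` at index 1 (quoted), and ` d` at index 2 (non-quoted).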