How to extract decimal number from string in C#
Small improvement to @Michael's solution:
// NOTES: about the LINQ:
// .Where() == filters the IEnumerable (which the array is)
// (c=>...) is the lambda for dealing with each element of the array
// where c is an array element.
// .Trim() == trims all blank spaces at the start and end of the string
var doubleArray = Regex.Split(sentence, @"[^0-9\.]+")
.Where(c => c != "." && c.Trim() != "");
Returns:
10.4
20.5
40
1
The original solution was returning
[empty line here]
10.4
20.5
40
1
.
The decimal/float number extraction regex can be different depending on whether and what thousand separators are used, what symbol denotes a decimal separator, whether one wants to also match an exponent, whether or not to match a positive or negative sign, whether or not to match numbers that may have leading 0
omitted, whether or not extract a number that ends with a decimal separator.
A generic regex to match the most common decimal number types is provided in Matching Floating Point Numbers with a Regular Expression:
[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?
I only changed the capturing group to a non-capturing one (added ?:
after (
). It matches
If you need to make it even more generic, if the decimal separator can be either a dot or a comma, replace \.
with a character class (or a bracket expression) [.,]
:
[-+]?[0-9]*[.,]?[0-9]+(?:[eE][-+]?[0-9]+)?
^^^^
Note the expressions above match both integer and floats. To match only float/decimal numbers make sure the fractional pattern part is obligatory by removing the second ?
after \.
(demo):
[-+]?[0-9]*\.[0-9]+(?:[eE][-+]?[0-9]+)?
^
Now, 34
is not matched: is matched.
If you do not want to match float numbers without leading zeros (like .5
) make the first digit matching pattern obligatory (by adding +
quantifier, to match 1 or more occurrences of digits):
[-+]?[0-9]+\.[0-9]+(?:[eE][-+]?[0-9]+)?
^
See this demo. Now, it matches much fewer samples:
Now, what if you do not want to match <digits>.<digits>
inside <digits>.<digits>.<digits>.<digits>
? How to match them as whole words? Use lookarounds:
[-+]?(?<!\d\.)\b[0-9]+\.[0-9]+(?:[eE][-+]?[0-9]+)?\b(?!\.\d)
And a demo here:
Now, what about those floats that have thousand separators, like 12 123 456.23
or 34,345,767.678
? You may add (?:[,\s][0-9]+)*
after the first [0-9]+
to match zero or more sequences of a comma or whitespace followed with 1+ digits:
[-+]?(?<![0-9]\.)\b[0-9]+(?:[,\s][0-9]+)*\.[0-9]+(?:[eE][-+]?[0-9]+)?\b(?!\.[0-9])
See the regex demo:
Swap a comma with \.
if you need to use a comma as a decimal separator and a period as as thousand separator.
Now, how to use these patterns in C#?
var results = Regex.Matches(input, @"<PATTERN_HERE>")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
try
Regex.Split (sentence, @"[^0-9\.]+")