Test for numeric elements in a character string

I recently encountered a similar problem where I was trying to write a function to format values passed as a character string from another function. The formatted values would ultimately end up in a table and I wanted to create logic to identify NA, character strings, and character representations of numbers so that I could apply sprintf() on them before generating the table.

Although more complicated to read, I do like the robustness of the grepl() approach. I think this gets all of the examples brought up in the comments.

x <- c("0",37,"42","-5","-2.3","1.36e4","4L","La","ti","da",NA)

# Anchored with ^ and $ so the whole string must match, not just a substring
y <- grepl("^([-]?[0-9]+[.]?[0-9]*|[-]?[0-9]+[L]?|[-]?[0-9]+[.]?[0-9]*[eE][0-9]+)$",x)

This evaluates to (formatted to help with visualization):

x
[1] "0"  "37"   "42"  "-5"   "-2.3"   "1.36e4" "4L" "La"     "ti"     "da"     NA 

y
[1] TRUE  TRUE   TRUE  TRUE   TRUE     TRUE    TRUE FALSE   FALSE    FALSE    FALSE

The regular expression is TRUE for:

  • positive or negative numbers with no more than one decimal OR
  • positive or negative integers (e.g., 4L) OR
  • positive or negative numbers in scientific notation

Additional terms could be added to handle decimals without a leading digit (e.g., .5) or numbers with a decimal point but no digits after it (e.g., 5.) if the dataset contained numbers in poor form.
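As a rough sketch of what those additional terms might look like, the pattern below accepts ".5", "5." and signed exponents such as "1e-4". It is anchored with ^ and $ so that the whole string has to match. The sample vector and the names `pattern` and `is_numeric_like` are made up for illustration and are not from the original answer.

```r
# Hypothetical sample input covering the "poor form" cases
x <- c("0", "-2.3", "1.36e4", "4L", ".5", "5.", "1e-4", "1.2.3", "La", NA)

pattern <- paste0(
  "^[-]?(",
  "[0-9]+L",                         # integer literal such as 4L
  "|",
  "([0-9]+[.]?[0-9]*|[.][0-9]+)",    # 12, 12.3, 5. or .5
  "([eE][-+]?[0-9]+)?",              # optional exponent, sign allowed
  ")$"
)

# grepl() returns FALSE for NA input, so NA is treated as non-numeric
is_numeric_like <- grepl(pattern, x)
is_numeric_like
```

Because of the anchors, "1.2.3" is rejected even though it contains valid numbers as substrings.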


There may be more complicated cases elsewhere in your data that would break this, but my first thought is:

> !is.na(as.numeric(x))
[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE
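For a self-contained version of the same idea, here is a minimal sketch. The input vector is hypothetical (the `x` used above comes from the question), and `suppressWarnings()` is added only to silence the "NAs introduced by coercion" warning that as.numeric() emits for non-numeric strings.

```r
# Hypothetical input; not the x from the original question
x <- c("1", "2.5", "-3", "1e4", "abc", NA)

# as.numeric() yields NA for anything it cannot parse, so !is.na() flags
# the parseable elements. A genuine NA in the input also comes back FALSE.
is_num <- !is.na(suppressWarnings(as.numeric(x)))
is_num
```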

As noted below by Josh O'Brien, this won't pick up things like 7L, which the R interpreter would parse as the integer 7. If you needed to include those as "plausibly numeric", one route would be to pick them out with a regex first,

> x <- c("1.2","1e4","1.2.3","5L")
> x
[1] "1.2"   "1e4"   "1.2.3" "5L"   
> grepl("^[[:digit:]]+L",x)
[1] FALSE FALSE FALSE  TRUE

...and then strip the "L" from just those elements using gsub and indexing.
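Put together, the two steps might look like the sketch below. The names `is_int_literal`, `x_clean` and `plausibly_numeric` are illustrative, and the regex here adds a trailing `$` so that only a final "L" counts.

```r
x <- c("1.2", "1e4", "1.2.3", "5L")

# Step 1: find elements that look like integer literals (digits followed by L)
is_int_literal <- grepl("^[[:digit:]]+L$", x)

# Step 2: strip the trailing "L" from just those elements
x_clean <- x
x_clean[is_int_literal] <- gsub("L$", "", x_clean[is_int_literal])

# Step 3: anything as.numeric() can parse is treated as plausibly numeric
plausibly_numeric <- !is.na(suppressWarnings(as.numeric(x_clean)))
plausibly_numeric
```

Here "5L" is accepted after stripping, while "1.2.3" is still rejected because as.numeric() cannot parse it.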


Avoid re-inventing the wheel with check.numeric() from package varhandle.

The function accepts the following arguments:

v The character vector or factor vector. (Mandatory)

na.rm logical. Should the function ignore NA? Default value is FALSE since NA can be converted to numeric. (Optional)

only.integer logical. Only check for integers and do not accept floating point. Default value is FALSE. (Optional)

exceptions A character vector containing the strings that should be considered as valid to be converted to numeric. (Optional)

ignore.whitespace logical. Ignore leading and trailing whitespace characters before assessing if the vector can be converted to numeric. Default value is TRUE. (Optional)
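A minimal usage sketch, assuming the varhandle package is installed and that check.numeric() takes the arguments listed above. The input vector `v` is hypothetical, and the requireNamespace() guard is only there so the snippet does not fail when the package is missing.

```r
# Hypothetical input vector
v <- c("0", "-2.3", "1.36e4", "La", NA)

if (requireNamespace("varhandle", quietly = TRUE)) {
  # One logical per element: can it be converted to numeric?
  res <- varhandle::check.numeric(v, na.rm = FALSE, only.integer = FALSE)
  print(res)
} else {
  res <- NULL
  message("Package 'varhandle' is not installed; try install.packages('varhandle').")
}
```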

Tags:

Regex

R