Substitute the ^ (power) symbol with C's pow syntax in a mathematical expression
One of the most fantastic things about R is that you can easily manipulate R expressions with R. Here, we recursively traverse your expression and replace all instances of ^ with pow:
f <- function(x) {
  if(is.call(x)) {
    if(identical(x[[1L]], as.name("^"))) x[[1L]] <- as.name("pow")
    if(length(x) > 1L) x[2L:length(x)] <- lapply(x[2L:length(x)], f)
  }
  x
}
f(quote(((2-x+3)^2+(x-5+7)^10)^0.5))
## pow((pow((2 - x + 3), 2) + pow((x - 5 + 7), 10)), 0.5)
This should be more robust than the regex since you are relying on the natural interpretation of the R language rather than on text patterns that may or may not be comprehensive.
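As a quick check of that claim, the sketch below repeats f (so the block runs on its own) and applies it to inputs where the base of the power is not parenthesized. The inputs are illustrative and not taken from the question.

```r
# Same function as above, repeated so this block is self-contained
f <- function(x) {
  if (is.call(x)) {
    if (identical(x[[1L]], as.name("^"))) x[[1L]] <- as.name("pow")
    if (length(x) > 1L) x[2L:length(x)] <- lapply(x[2L:length(x)], f)
  }
  x
}

# Bare symbols and a non-numeric exponent (illustrative inputs)
r1 <- deparse(f(quote(x^2 + y^z)))
r1
## [1] "pow(x, 2) + pow(y, z)"

# Operator precedence comes from the R parser: -x^2 is -(x^2)
r2 <- deparse(f(quote(-x^2)))
r2
## [1] "-pow(x, 2)"
```

Because the parser has already decided what the base and the exponent of each ^ are, the function never has to guess where an operand starts or ends.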
Details: Calls in R are stored in list-like structures with the function / operator at the head of the list, and the arguments in the following elements. For example, consider:
exp <- quote(x ^ 2)
exp
## x^2
is.call(exp)
## [1] TRUE
We can examine the underlying structure of the call with as.list:
str(as.list(exp))
## List of 3
## $ : symbol ^
## $ : symbol x
## $ : num 2
As you can see, the first element is the function/operator, and subsequent elements are the arguments to the function.
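Since a call behaves like a list, its elements can also be assigned to. A minimal sketch of that idea, using a made-up replacement argument (n + 1) purely for illustration:

```r
e <- quote(x ^ 2)

# Replace the head of the call (the function/operator)
e[[1L]] <- as.name("pow")
d1 <- deparse(e)
d1
## [1] "pow(x, 2)"

# Arguments can be replaced the same way (n + 1 is a hypothetical example)
e[[3L]] <- quote(n + 1)
d2 <- deparse(e)
d2
## [1] "pow(x, n + 1)"
```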
So, in our recursive function, we:
- Check if an object is a call
  - If yes: check if it is a call to the ^ function/operator by looking at the first element in the call with identical(x[[1L]], as.name("^"))
    - If yes: replace the first element with as.name("pow")
    - Then, irrespective of whether this was a call to ^ or anything else:
      - if the call has additional elements, cycle through them and apply this function (i.e. recurse) to each element, replacing the result back into the original call (x[2L:length(x)] <- lapply(x[2L:length(x)], f))
  - If no: just return the object unchanged
Note that calls often contain the names of functions as the first element. You can create those names with as.name. Names are also referred to as "symbols" in R (hence the output of str).
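A short sketch of the name/symbol equivalence, using only base R functions:

```r
s <- as.name("pow")

is.name(s)                  # TRUE
is.symbol(s)                # TRUE: "name" and "symbol" are the same kind of object
class(s)                    # "name"
typeof(s)                   # "symbol"

# A name built from a string is the same object you get by quoting it
same <- identical(s, quote(pow))
same                        # TRUE

# Operators are names too
op <- as.character(as.name("^"))
op                          # "^"
```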
DISCLAIMER: The answer was written with the OP's original regex in mind, when the question read as "process the ^ preceded by balanced (nested) parentheses". Please do not use this solution for generic math expression parsing; use it only for educational purposes and only when you really need to process some text in a balanced-parentheses context.
Since a PCRE regex can match nested parentheses, it is possible to achieve this in R with a mere regex in a while loop that checks whether the modified string still contains a matching ^ with x <- grepl("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", v, perl=TRUE). Once no such ^ is left, there is nothing else to substitute.
The regex pattern is
(\(((?:[^()]++|(?1))*)\))\^(\d*\.?\d+)
See the regex demo
Details:
- (\(((?:[^()]++|(?1))*)\)) - Group 1: a (...) substring with balanced parentheses, capturing what is inside the outer parentheses into Group 2 (with the ((?:[^()]++|(?1))*) subpattern) (explanation can be found at How can I match nested brackets using regex?). In short, \( matches an outer (, then (?:[^()]++|(?1))* matches zero or more sequences of 1+ chars other than ( and ) or the whole Group 1 subpattern ((?1) is a subroutine call), and then \) matches a )
- \^ - a ^ caret
- (\d*\.?\d+) - Group 3: an int/float number (.5, 1.5, 345)
The replacement pattern contains a literal pow(), and \\2 and \\3 are backreferences to the substrings captured with Groups 2 and 3.
R code:
v <- "((2-x+3)^2+(x-5+7)^10)^0.5"
x <- grepl("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", v, perl=TRUE)
while(x) {
  v <- sub("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", "pow(\\2, \\3)", v, perl=TRUE)
  x <- grepl("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", v, perl=TRUE)
}
v
## => [1] "pow(pow(2-x+3, 2)+pow(x-5+7, 10), 0.5)"
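The loop above can be wrapped in a small helper, which also makes the limitation from the disclaimer easy to see. This is only a sketch: the names pat and caret_to_pow_regex are made up for illustration, and the second input is not from the question.

```r
# The pattern from above, stored once (pat is a hypothetical name)
pat <- "(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)"

# Hypothetical helper: substitute until the pattern no longer matches
caret_to_pow_regex <- function(v) {
  while (grepl(pat, v, perl = TRUE)) {
    v <- sub(pat, "pow(\\2, \\3)", v, perl = TRUE)
  }
  v
}

out1 <- caret_to_pow_regex("((2-x+3)^2+(x-5+7)^10)^0.5")
out1
## [1] "pow(pow(2-x+3, 2)+pow(x-5+7, 10), 0.5)"

# A base without parentheses is not matched, so the string is returned as is
out2 <- caret_to_pow_regex("x^2")
out2
## [1] "x^2"
```

The second call shows why this approach only fits the "parenthesized base, numeric exponent" case: the pattern must start at a literal (.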
And to also support parenthesized exponents such as ^(x-3), you may use
v <- sub("(\\(((?:[^()]++|(?1))*)\\))\\^(?|()(\\d*\\.?\\d+)|(\\(((?:[^()]++|(?3))*)\\)))", "pow(\\2, \\4)", v, perl=TRUE);
and to check if there are any more values to replace:
x <- grepl("(\\(((?:[^()]++|(?1))*)\\))\\^(?|()(\\d*\\.?\\d+)|(\\(((?:[^()]++|(?3))*)\\)))", v, perl=TRUE)
Here is a solution that follows the parse tree recursively and replaces ^:
#parse the expression
#alternatively you could create it with
#expression(((2-x+3)^2+(x-5+7)^10)^0.5)
e <- parse(text = "((2-x+3)^2+(x-5+7)^10)^0.5")
#a recursive function
fun <- function(e) {
  #check if you are at the end of the tree's branch
  if (is.name(e) || is.atomic(e)) {
    #replace ^
    if (e == quote(`^`)) return(quote(pow))
    return(e)
  }
  #follow the tree with recursion
  for (i in seq_along(e)) e[[i]] <- fun(e[[i]])
  return(e)
}
#deparse to get a character string
deparse(fun(e)[[1]])
#[1] "pow((pow((2 - x + 3), 2) + pow((x - 5 + 7), 10)), 0.5)"
This would be much easier if rapply worked with expressions/calls.
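To see what the recursion above walks over, it can help to inspect what parse returns. A small sketch with an illustrative input (not from the question):

```r
e <- parse(text = "(x - 1)^2")

class(e)      # "expression": a vector of parsed top-level expressions
length(e)     # 1, which is why the code above uses [[1]] before deparse

cl <- e[[1]]
class(cl)     # "call": the call to ^

# Parentheses are kept in the tree as a call to the `(` function,
# which is why they reappear around the arguments in the deparsed output
paren <- cl[[2]]
is_paren <- identical(paren[[1]], as.name("("))
is_paren      # TRUE
```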
Edit:
OP has asked regarding performance. It is very unlikely that performance is an issue for this task, but the regex solution is not faster.
library(microbenchmark)
microbenchmark(regex = {
v <- "((2-x+3)^2+(x-5+7)^10)^0.5"
x <- grepl("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", v, perl=TRUE)
while(x) {
v <- sub("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", "pow(\\2, \\3)", v, perl=TRUE);
x <- grepl("(\\(((?:[^()]++|(?1))*)\\))\\^(\\d*\\.?\\d+)", v, perl=TRUE)
}
},
BrodieG = {
deparse(f(parse(text = "((2-x+3)^2+(x-5+7)^10)^0.5")[[1]]))
},
Roland = {
deparse(fun(parse(text = "((2-x+3)^2+(x-5+7)^10)^0.5"))[[1]])
})
#Unit: microseconds
# expr min lq mean median uq max neval cld
# regex 321.629 323.934 335.6261 335.329 337.634 384.623 100 c
# BrodieG 238.405 246.087 255.5927 252.105 257.227 355.943 100 b
# Roland 211.518 225.089 231.7061 228.802 235.204 385.904 100 a
I haven't included the solution provided by @digEmAll, because it seems obvious that a solution with that many data.frame operations will be relatively slow.
Edit2:
Here is a version that also handles sqrt.
fun <- function(e) {
  #check if you are at the end of the tree's branch
  if (is.name(e) || is.atomic(e)) {
    #replace ^
    if (e == quote(`^`)) return(quote(pow))
    return(e)
  }
  if (e[[1]] == quote(sqrt)) {
    #replace sqrt
    e[[1]] <- quote(pow)
    #add the second argument
    e[[3]] <- quote(0.5)
  }
  #follow the tree with recursion
  for (i in seq_along(e)) e[[i]] <- fun(e[[i]])
  return(e)
}
e <- parse(text = "sqrt((2-x+3)^2+(x-5+7)^10)")
deparse(fun(e)[[1]])
#[1] "pow(pow((2 - x + 3), 2) + pow((x - 5 + 7), 10), 0.5)"
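For comparison, the same sqrt handling can be sketched in the style of the first answer, testing the head of each call with identical() instead of ==. The function name to_pow and the example input are made up for illustration; this is one possible variant, not a replacement for the code above.

```r
# Hypothetical variant: rewrite ^ and one-argument sqrt() calls to pow()
to_pow <- function(x) {
  if (is.call(x)) {
    if (identical(x[[1L]], as.name("^"))) {
      x[[1L]] <- as.name("pow")
    } else if (identical(x[[1L]], as.name("sqrt")) && length(x) == 2L) {
      # sqrt(a) becomes pow(a, 0.5)
      x <- call("pow", x[[2L]], 0.5)
    }
    # recurse into the arguments, as in the first answer
    if (length(x) > 1L) x[2L:length(x)] <- lapply(x[2L:length(x)], to_pow)
  }
  x
}

res <- deparse(to_pow(quote(sqrt(x^2 + 1))))
res
## [1] "pow(pow(x, 2) + 1, 0.5)"
```

Using identical() against a symbol means only actual calls to ^ or sqrt are rewritten; a character constant such as "^" inside the expression is left alone.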