Explicitly calling return in a function or not
If everyone agrees that return is not necessary at the end of a function's body, and that not using return is marginally faster (according to @Alan's test, 4.3 microseconds versus 5.1), should we all stop using return at the end of a function? I certainly won't, and I'd like to explain why. I hope to hear whether other people share my opinion. And I apologize if this is not a straight answer to the OP, but more of a long subjective comment.
My main problem with not using return is that, as Paul pointed out, there are other places in a function's body where you may need it. And if you are forced to use return somewhere in the middle of your function, why not make all return statements explicit? I hate being inconsistent. I also think the code reads better: one can scan the function and easily see all exit points and values.
Paul used this example:
foo = function() {
  if(a) {
    return(a)
  } else {
    return(b)
  }
}
Unfortunately, one could point out that it can easily be rewritten as:
foo = function() {
  if(a) {
    output <- a
  } else {
    output <- b
  }
  output
}
The latter version even conforms to some coding standards that advocate one return statement per function. I think a better example could have been:
bar <- function() {
  # pseudocode: a, b, c, d, e and do_stuff are placeholders
  while (a) {
    do_stuff
    for (b) {
      do_stuff
      if (c) return(1)
      for (d) {
        do_stuff
        if (e) return(2)
      }
    }
  }
  return(3)
}
This would be much harder to rewrite using a single return statement: it would need multiple breaks and an intricate system of boolean variables for propagating them. All this to say that the single-return rule does not play well with R. So if you are going to need return in some places of your function's body, why not be consistent and use it everywhere?
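To make the contrast concrete, here is a small runnable sketch (the function names and the matrix are hypothetical, not from Paul's post): finding the first negative entry of a matrix with an early return(), and the same search forced into a single exit with a flag and two breaks. With the three or four nesting levels of bar, the flags would multiply.

```r
# Early-exit version: return() leaves both loops at once
first_negative <- function(m) {
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      if (m[i, j] < 0) return(c(row = i, col = j))
    }
  }
  return(NULL)
}

# Single-exit version: a flag and two breaks carry the same information
first_negative_single <- function(m) {
  result <- NULL
  found <- FALSE
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      if (m[i, j] < 0) {
        result <- c(row = i, col = j)
        found <- TRUE
        break          # leaves only the inner loop
      }
    }
    if (found) break   # the flag is needed to leave the outer loop as well
  }
  return(result)
}

m <- matrix(c(1, 2, -3, 4, 5, -6), nrow = 2)
first_negative(m)         # row 1, col 2
first_negative_single(m)  # same result
```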
I don't think the speed argument is a valid one. A 0.8 microsecond difference is nothing when you start looking at functions that actually do something. The last argument I can see is that it means less typing, but hey, I'm not lazy.
The question was: why is not (explicitly) calling return faster or better, and thus preferable?
There is no statement in the R documentation making such an assumption.
The help page ?'function' gives the usage as:
function( arglist ) expr
return(value)
Is it faster without calling return?
Both function() and return() are primitive functions, and function() itself returns the last evaluated value even without a return() call.
Calling return() (that is, .Primitive('return')) with that last value as its argument does the same job but needs one more call, so this (often) unnecessary .Primitive('return') call can draw additional resources.
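A quick way to check these claims in a console is sketched below; implicit() and explicit() are hypothetical names. body() shows that the explicit version wraps the last value in one additional return() call, while both produce the same result.

```r
# Both `function` and `return` are primitives in base R
is.primitive(`function`)  # TRUE
is.primitive(`return`)    # TRUE

# Two functions that differ only in the explicit return() call
implicit <- function(x) x + 1
explicit <- function(x) return(x + 1)

# body() shows the extra call that explicit() has to evaluate
body(implicit)  # x + 1
body(explicit)  # return(x + 1)

identical(implicit(1), explicit(1))  # TRUE: same value either way
```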
A simple measurement, however, shows that the resulting difference is very small and thus cannot be the reason for not using an explicit return. The following plot was created from data collected this way:
# Note: rep() evaluates its first argument only once and copies the result
# `repeats` times, so each measurement below times a single function call
# plus the copying, not `repeats` calls.
bench_nor2 <- function(x, repeats) {
  system.time(rep(
    # without explicit return
    (function(x) vector(length = x, mode = "numeric"))(x),
    repeats))
}
bench_ret2 <- function(x, repeats) {
  system.time(rep(
    # with explicit return
    (function(x) return(vector(length = x, mode = "numeric")))(x),
    repeats))
}
maxlen <- 1000
reps <- 10000
along <- seq(from = 1, to = maxlen, by = 5)
ret <- sapply(along, FUN = bench_ret2, repeats = reps)
nor <- sapply(along, FUN = bench_nor2, repeats = reps)
res <- data.frame(N = along, ELAPSED_RET = ret["elapsed", ], ELAPSED_NOR = nor["elapsed", ])
# res object is then visualized
# R version 2.15
The picture above may differ slightly on your platform. Based on the measured data, the size of the returned object does not cause any difference, and the number of repeats (even when scaled up) makes only a very small difference, one that in the real world, with real data and a real algorithm, would not be noticeable or make your script run faster.
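One caveat about bench_nor2() and bench_ret2(): rep() evaluates its first argument once and then copies the result, so each timing covers a single call rather than `repeats` calls. A sketch of a variant that really repeats the call uses a plain for loop instead; f_nor, f_ret and bench are hypothetical names, and the grid is coarser (by = 50) to keep the run short. Timings will vary by machine and R version.

```r
# Define the two variants once, so closure creation is not part of the timing
f_nor <- function(x) vector(length = x, mode = "numeric")          # implicit
f_ret <- function(x) return(vector(length = x, mode = "numeric"))  # explicit

# Elapsed seconds for `repeats` real calls of f(x)
bench <- function(f, x, repeats) {
  system.time(for (i in seq_len(repeats)) f(x))[["elapsed"]]
}

along <- seq(from = 1, to = 1000, by = 50)
reps <- 10000
res <- data.frame(
  N = along,
  ELAPSED_RET = sapply(along, function(n) bench(f_ret, n, reps)),
  ELAPSED_NOR = sapply(along, function(n) bench(f_nor, n, reps))
)
head(res)
```

The loop overhead is included in both columns equally, so only the difference between them is meaningful.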
Is it better without calling return?
return is a good tool for clearly marking the "leaves" of code where the routine should end, jump out of the function and return a value.
# here without calling .Primitive('return')
> (function() {10;20;30;40})()
[1] 40
# here with .Primitive('return')
> (function() {10;20;30;40;return(40)})()
[1] 40
# here return terminates flow
> (function() {10;20;return();30;40})()
NULL
> (function() {10;20;return(25);30;40})()
[1] 25
Which style to use depends on the programmer's strategy and style; one can omit return() entirely, as it is not required.
R core programmers use both approaches, i.e. with and without an explicit return(), as can be seen in the sources of the 'base' functions.
Often return() is used with no argument, returning NULL, to stop the function conditionally.
It is not clear whether one is better than the other, since a typical user or analyst using R cannot see any real difference.
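A minimal sketch of that pattern, with a hypothetical safe_log() that gives up early when its input is unusable:

```r
# return() with no argument stops the function and yields NULL
safe_log <- function(x) {
  if (!is.numeric(x) || any(x <= 0)) return()
  log(x)  # normal path: last value is returned implicitly
}

safe_log("a")           # NULL
safe_log(c(1, exp(1)))  # 0 1
```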
My opinion is that the question should be: is there any danger in using an explicit return, coming from the R implementation?
Or, perhaps better, a user writing function code should always ask: what is the effect of not using an explicit return (or of placing the object to be returned as the last leaf of a code branch) in the function code?
This is an interesting discussion. I think that @flodel's example is excellent. However, I think it illustrates my point (and @koshke mentions this in a comment) that return makes sense when you use an imperative rather than a functional coding style.
Not to belabour the point, but I would have rewritten foo like this:
foo = function() ifelse(a,a,b)
A functional style avoids state changes, like storing the value of output. In this style, return is out of place; foo looks more like a mathematical function.
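For the scalar case there is also a functional rewrite that avoids ifelse(): in R, if/else is itself an expression with a value, so it can simply be the last expression of the function. In this sketch a and b are passed as arguments (an assumption; in the original they are free variables), and unlike ifelse() it only handles a length-one a.

```r
# if/else evaluates to the chosen branch, so no return() or temporary is needed
foo_if <- function(a, b) if (a) a else b

foo_if(3, 5)  # 3
foo_if(0, 5)  # 5
```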
I agree with @flodel: using an intricate system of boolean variables in bar would be less clear, and pointless when you have return. What makes bar so amenable to return statements is that it is written in an imperative style. Indeed, the boolean variables represent the "state" changes avoided in a functional style.
It is really difficult to rewrite bar in functional style, because it is just pseudocode, but the idea is something like this:
e_func <- function() do_stuff
d_func <- function() ifelse(any(sapply(seq(d), e_func)), 2, 3)
b_func <- function() {
  do_stuff
  ifelse(c, 1, sapply(seq(b), d_func))
}
bar <- function() {
  do_stuff
  sapply(seq(a), b_func) # Not exactly correct, but illustrates the idea.
}
The while loop would be the most difficult to rewrite, because it is controlled by state changes to a.
The speed loss caused by a call to return is negligible, but the efficiency gained by avoiding return and rewriting in a functional style is often enormous. Telling new users to stop using return probably won't help, but guiding them to a functional style will pay off.
@Paul return is necessary in imperative style because you often want to exit the function at different points in a loop. A functional style doesn't use loops, and therefore doesn't need return. In a purely functional style, the final call is almost always the desired return value.
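As an illustration, here is a sketch of the same search written both ways; the function names and data are hypothetical. The functional version delegates to base R's Position() (which hides the loop internally), so its value is simply the function's result.

```r
# Imperative: exit the loop with return() as soon as a match is found
first_above_loop <- function(x, limit) {
  for (i in seq_along(x)) {
    if (x[i] > limit) return(i)
  }
  return(NA_integer_)
}

# Functional: Position() returns the index of the first match, or NA
first_above_fun <- function(x, limit) Position(function(v) v > limit, x)

x <- c(3, 8, 1, 12)
first_above_loop(x, 5)  # 2
first_above_fun(x, 5)   # 2
first_above_fun(x, 20)  # NA
```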
In Python, a function needs a return statement to return a value (without one it returns None). However, if you program your function in a functional style, you will likely have only one return statement: at the end of your function.
Using an example from another StackOverflow post, let us say we wanted a function that returned TRUE if all the values in a given x had an odd length. We could use two styles:
# Procedural / Imperative
allOdd = function(x) {
  for (i in x) if (length(i) %% 2 == 0) return(FALSE)
  return(TRUE)
}
# Functional
allOdd = function(x)
  all(lengths(x) %% 2 == 1)  # lengths(): length of each element of x
In a functional style, the value to be returned naturally falls at the end of the function. Again, it looks more like a mathematical function.
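A quick check that the two styles agree, sketched with hypothetical names and test lists. It assumes R >= 3.2.0 for lengths(); the functional version has to use lengths(x) (one length per element) rather than length(x) (the length of x itself) to match the loop.

```r
# Imperative: stop at the first element with an even length
allOdd_loop <- function(x) {
  for (i in x) if (length(i) %% 2 == 0) return(FALSE)
  return(TRUE)
}

# Functional: compute all element lengths, then test them at once
allOdd_fun <- function(x) all(lengths(x) %% 2 == 1)

l1 <- list(1, 1:3, letters[1:5])  # lengths 1, 3, 5
l2 <- list(1, 1:2)                # lengths 1, 2
c(allOdd_loop(l1), allOdd_fun(l1))  # TRUE TRUE
c(allOdd_loop(l2), allOdd_fun(l2))  # FALSE FALSE
```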
@GSee The warnings outlined in ?ifelse are definitely interesting, but I don't think they are trying to dissuade use of the function. In fact, ifelse has the advantage of being vectorized. For example, consider a slightly modified version of foo:
foo = function(a) { # Note that it now has an argument
  if(a) {
    return(a)
  } else {
    return(b)
  }
}
This function works fine when length(a) is 1. But if you rewrite foo with ifelse:
foo = function (a) ifelse(a,a,b)
Now foo works for a of any length. In fact, it would even work when a is a matrix. Returning a value with the same shape as test (the first argument of ifelse) is a feature that helps with vectorization, not a problem.
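A small sketch of that difference; b = -1 is a hypothetical value for the free variable in foo, and foo_scalar/foo_vec are illustrative names for the two versions.

```r
b <- -1  # hypothetical value for the free variable b that foo uses

foo_scalar <- function(a) if (a) a else b  # works only for length-one a
foo_vec    <- function(a) ifelse(a, a, b)  # vectorized over a

foo_vec(c(2, 0, 5))  # 2 -1 5
m <- matrix(c(1, 0, 3, 0), nrow = 2)
foo_vec(m)           # a 2 x 2 matrix: same shape as the test
# foo_scalar(c(2, 0, 5)) is an error in R >= 4.2.0 ("the condition has
# length > 1"); older versions only warned and used the first element.
```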
It seems that without return() it's faster...
library(rbenchmark)
x <- 1
foo <- function(value) {
return(value)
}
fuu <- function(value) {
value
}
benchmark(foo(x),fuu(x),replications=1e7)
    test replications elapsed relative user.self sys.self user.child sys.child
1 foo(x)     10000000   51.36 1.185322     51.11     0.11          0         0
2 fuu(x)     10000000   43.33 1.000000     42.97     0.05          0         0
EDIT:
I ran another benchmark (benchmark(fuu(x),foo(x),replications=1e7)) and the result was reversed... I'll try on a server.
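Since the ranking flipped when the order changed, one way to reduce order effects is to interleave the two functions over several rounds and compare medians. A base-R sketch (time_calls() and compare_orders() are hypothetical helpers; rbenchmark is not needed):

```r
foo <- function(value) return(value)
fuu <- function(value) value

# Elapsed seconds for n calls of f(x)
time_calls <- function(f, x, n) {
  system.time(for (i in seq_len(n)) f(x))[["elapsed"]]
}

# Run both orders in every round so neither function always goes second,
# then summarize each function by its median time
compare_orders <- function(x = 1, n = 1e6, rounds = 3) {
  t_foo <- numeric(0)
  t_fuu <- numeric(0)
  for (r in seq_len(rounds)) {
    t_foo <- c(t_foo, time_calls(foo, x, n))
    t_fuu <- c(t_fuu, time_calls(fuu, x, n))
    t_fuu <- c(t_fuu, time_calls(fuu, x, n))
    t_foo <- c(t_foo, time_calls(foo, x, n))
  }
  c(foo = median(t_foo), fuu = median(t_fuu))
}

compare_orders()
```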