What are the benefits of defining and calling a function inside another function in R?

Benefits of defining f2 inside f1:

  • f2 is only visible within f1, which is useful if f2 is only meant to be used there. Within a package namespace the benefit is debatable, since you could define f2 outside f1 and simply not export it.
  • f2 has access to the variables within f1, which can be a good or a bad thing:
    • good, because you don't have to pass variables through the function interface, and you can use <<- to implement things like memoization.
    • bad, for the same reasons: f2's inputs no longer show up in its argument list, and <<- can change f1's variables as a side effect.
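
The memoization idea mentioned above can be sketched roughly as follows. This is only an illustration: make_memo, slow_square and memo_square are hypothetical names, and the cache is assumed to be keyed on a single scalar argument.

```r
# make_memo() is a hypothetical helper: the cache lives in its environment
# and the inner function updates it with <<-.
make_memo <- function(f) {
  cache <- list()
  function(x) {
    key <- as.character(x)
    if (is.null(cache[[key]])) {
      cache[[key]] <<- f(x)   # store the result in the enclosing environment
    }
    cache[[key]]
  }
}

calls <- 0
slow_square <- function(x) {
  calls <<- calls + 1         # count how often the real work is done
  x^2
}

memo_square <- make_memo(slow_square)
memo_square(4)   # computes 16 and stores it
memo_square(4)   # served from the cache
calls            # 1: slow_square only ran once
```

The inner function never receives the cache as an argument, which is both the convenience and the hidden state that the two sub-points describe.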

Disadvantages:

  • f2 is redefined every time you call f1, which adds some overhead (small, but measurable)

Data size should not matter since R won't copy the data unless it is being modified under either scenario. As noted in disadvantages, defining f2 outside of f1 should be a little faster, especially if you are repeating an otherwise relatively low overhead operation many times. Here is an example:

> fun1 <- function(x) {
+   fun2 <- function(x) x
+   fun2(x)
+ }
> fun2a <- function(x) x
> fun3 <- function(x) fun2a(x)
> 
> library(microbenchmark)
> microbenchmark(
+   fun1(TRUE), fun3(TRUE)
+ )
Unit: nanoseconds
       expr min    lq median    uq   max neval
 fun1(TRUE) 656 674.5  728.5 859.5 17394   100
 fun3(TRUE) 406 434.5  480.5 563.5  1855   100

In this case we save about 250ns at the median (edit: the cost of redefining the function is actually about 200ns; believe it or not, the extra set of {} that fun1 has costs another 50ns). Not much, but it can add up if the interior function is more complex or you call the function many, many times.
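
The point that data size should not matter can be checked with base R's tracemem(), which reports when an object gets duplicated. A minimal sketch, assuming an R build with memory profiling enabled (hence the capabilities("profmem") guard); read_nested and modify_nested are hypothetical names:

```r
# The inner function only reads v from the enclosing function: no copy needed.
read_nested <- function(v) {
  inner <- function() sum(v)
  inner()
}

# The inner function modifies v, so R makes a local copy first;
# the caller's data is left untouched.
modify_nested <- function(v) {
  inner <- function() {
    v[1] <- 0
    sum(v)
  }
  inner()
}

x <- runif(1e6)
if (capabilities("profmem")) {
  tracemem(x)
  read_nested(x)     # should print no tracemem message
  modify_nested(x)   # should print a tracemem message reporting the copy
  untracemem(x)
}
```

The same holds if inner is defined outside: the copy is triggered by the modification, not by where the function lives.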


You would typically use approach 2. Some exceptions are

  1. Function closures:

    f = function() {
        counter = 1
        # g updates the counter stored in f's environment
        g = function() {
            counter <<- counter + 1
            return(counter)
        }
        g  # return the closure explicitly
    }
    counter = f()
    counter()  # 2
    counter()  # 3
    

    Function closures enable us to remember state between calls.

  2. Sometimes it's handy to define a function locally because it is only used in one place. For example, when using optim, we often tweak an existing function:

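    pdf = function(x, mu) dnorm(x, mu, log=TRUE)
    f = function(d, lower, initial=lower) {
      # negative log-likelihood; ll sees d and lower from f's environment
      ll = function(mu) {
        # optim minimises, so values below the bound are penalised with Inf
        if(mu < lower) return(Inf)
        -sum(pdf(d, mu))
      }
      # optim needs a finite value at the starting point, so start at the bound
      optim(initial, ll)
    }

    d = rnorm(100, mean=2)  # simulated data, for illustration only
    f(d, 1.5)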
    

    The ll function uses the data set d and the lower bound without either being passed as an argument. This is convenient, since this may be the only place we use or need ll.
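
For comparison, here is a rough sketch of the non-nested alternative, where the objective is defined at top level and its extra inputs are passed through optim's `...` argument. The names neg_loglik, fit and lower_bound are hypothetical. The bound is deliberately not called lower, because optim has its own lower argument and would capture it. Returning Inf outside the bound and starting at the bound are assumptions made here so that the minimisation is well defined.

```r
# Top-level objective: everything it needs arrives through its arguments.
neg_loglik <- function(mu, d, lower_bound) {
  if (mu < lower_bound) return(Inf)   # optim minimises, so penalise with Inf
  -sum(dnorm(d, mu, log = TRUE))
}

fit <- function(d, lower_bound, initial = lower_bound) {
  # d and lower_bound are forwarded to neg_loglik via optim's `...`
  optim(initial, neg_loglik, d = d, lower_bound = lower_bound)
}

set.seed(1)
d <- rnorm(100, mean = 2)   # simulated data, for illustration only
res <- fit(d, 1.5)
res$par                     # should be close to mean(d)
```

As with any one-parameter problem, optim's default Nelder-Mead method warns that it is unreliable in one dimension. The nested version avoids both the extra arguments and the name clash, which is the convenience described above.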