Block bootstrap from subject list

You can also use the tsboot function in the boot package with fixed block resampling scheme.

require(plm)
require(boot)
data(Grunfeld)

### each firm is of length 20
table(Grunfeld$firm)
##  1  2  3  4  5  6  7  8  9 10 
## 20 20 20 20 20 20 20 20 20 20


blockboot <- function(data) 
{
 coefficients(lm(value ~ inv + capital, data = data))

}

### fixed length (every 20 obs, so for each different firm) block bootstrap
set.seed(321)
boot.1 <- tsboot(Grunfeld, blockboot, R = 99, l = 20, sim = "fixed")

boot.1    
## Bootstrap Statistics :
##      original     bias    std. error
## t1* 410.81557 -25.785972    174.3766
## t2*   5.75981   0.451810      2.0261
## t3*  -0.61527   0.065322      0.6330

dim(boot.1$t)
## [1] 99  3

head(boot.1$t)
##        [,1]   [,2]      [,3]
## [1,] 522.11 7.2342 -1.453204
## [2,] 626.88 4.6283  0.031324
## [3,] 479.74 3.2531  0.637298
## [4,] 557.79 4.5284  0.161462
## [5,] 568.72 5.4613 -0.875126
## [6,] 379.04 7.0707 -1.092860

How about something like this:

myfit <- function(x, i) {
   mydata <- do.call("rbind", lapply(i, function(n) subset(Grunfeld, firm==x[n])))
   coefficients(lm(value ~ inv + capital, data = mydata))
}

firms <- unique(Grunfeld$firm)

b0 <- boot(firms, myfit, 999)

Here is a method that should typically be faster than the accepted answer, returns the same results and does not rely on additional packages (except boot). The key here is to use which and integer indexing to construct each data.frame replicate rather than split/subset and do.call/rbind.

# get function for boot
myIndex <- function(x, i) {
  # select the observations to subset. Likely repeated observations
  blockObs <- unlist(lapply(i, function(n) which(x[n] == Grunfeld$firm)))
  # run regression for given replicate, return estimated coefficients
  coefficients(lm(value~ inv + capital, data=Grunfeld[blockObs,]))
}

now, bootstrap

# get result
library(boot)
set.seed(1234)
b1 <- boot(firms, myIndex, 200)

Run the accepted answer

set.seed(1234)
b0 <- boot(firms, myfit, 200)

Let's eyeball a comparison

using indexing

b1

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = firms, statistic = myIndex, R = 200)


Bootstrap Statistics :
       original      bias    std. error
t1* 410.8155650 -6.64885086 197.3147581
t2*   5.7598070  0.37922066   2.4966872
t3*  -0.6152727 -0.04468225   0.8351341

Original version

b0

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = firms, statistic = myfit, R = 200)


Bootstrap Statistics :
       original      bias    std. error
t1* 410.8155650 -6.64885086 197.3147581
t2*   5.7598070  0.37922066   2.4966872
t3*  -0.6152727 -0.04468225   0.8351341

These look pretty close. Now, a bit more checking

identical(b0$t, b1$t)
[1] TRUE

and

identical(summary(b0), summary(b1))
[1] TRUE

Finally, we'll do a quick benchmark

library(microbenchmark)
microbenchmark(index={b1 <- boot(firms, myIndex, 200)}, 
              rbind={b0 <- boot(firms, myfit, 200)})

On my computer, this returns

Unit: milliseconds
  expr      min       lq     mean   median       uq      max neval
 index 292.5770 296.3426 303.5444 298.4836 301.1119 395.1866   100
 rbind 712.1616 720.0428 729.6644 724.0777 731.0697 833.5759   100

So, direct indexing is more than 2 times faster at every level of the distribution.

note on missing fixed effects
As with most of the answers, the issue of missing "fixed effects" may emerge. Commonly, fixed effects are used as controls and the researcher is interested in one or a couple of variables that will be included with every selected observation. In this dominant case, there is no (or very little) harm in restricting the returned result of the myIndex or myfit function to only include the variables of interest in the returned vector.

Block bootstrap from subject list

Tags:

R

Plyr

Regression

Statistics Bootstrap

Related

Recent Posts