Rolling window REVISITED - Adding window rolling quantity as a parameter - Walk Forward Analysis
So, giving my two cents (with all the help of @Ben.T), here is the code for a basic Walk Forward Analysis tool that gives you a view of how your model or models will perform in a more generalized manner.
Non-anchored WFA
def walkForwardAnal(myArr, windowSize, rollQty):
    from numpy.lib.stride_tricks import as_strided
    ArrRows, ArrCols = myArr.shape
    ArrItems = myArr.itemsize
    sliceQtyAndShape = (int((ArrRows - windowSize) / rollQty + 1), windowSize, ArrCols)
    print('The final view shape is {}'.format(sliceQtyAndShape))
    ArrStrides = (rollQty * ArrCols * ArrItems, ArrCols * ArrItems, ArrItems)
    print('The final strides are {}'.format(ArrStrides))
    sliceList = list(as_strided(myArr, shape=sliceQtyAndShape, strides=ArrStrides, writeable=False))
    return sliceList
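One caveat with the function above: strides built from `itemsize` assume a C-contiguous 2-D array. A minimal sketch of a variant that reuses the array's own `strides`, so it should also work on Fortran-ordered or sliced input, could look like this. The name `walk_forward_windows` and the small demo array are hypothetical, for illustration only:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def walk_forward_windows(arr, window_size, roll_qty):
    """Hypothetical variant of walkForwardAnal that reuses arr.strides."""
    arr = np.asarray(arr)
    if arr.ndim != 2:
        raise ValueError("expected a 2-D array of shape (rows, cols)")
    n_rows, n_cols = arr.shape
    if window_size > n_rows:
        return []  # not enough rows for even one window
    n_windows = (n_rows - window_size) // roll_qty + 1
    # Take the byte steps from the array itself instead of assuming C order.
    row_stride, col_stride = arr.strides
    view = as_strided(
        arr,
        shape=(n_windows, window_size, n_cols),
        strides=(roll_qty * row_stride, row_stride, col_stride),
        writeable=False,
    )
    return list(view)


# Demo on a Fortran-ordered array, where itemsize-based strides would be wrong.
demo = np.asfortranarray(np.arange(30).reshape((10, 3)))
windows = walk_forward_windows(demo, window_size=5, roll_qty=2)
print(len(windows))   # 3
print(windows[1][0])  # [6 7 8]
```

Each window is still a read-only view, so nothing is copied until you call `.copy()` yourself.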
wSizeTr = 400
wSizeTe = 100
wSizeTot = wSizeTr + wSizeTe
rQty = 200
sliceListX = wf.walkForwardAnal(X, wSizeTot, rQty)
sliceListY = wf.walkForwardAnal(y, wSizeTot, rQty)
for sliceArrX, sliceArrY in zip(sliceListX, sliceListY):
    ## Consider making a .copy() of each array, so that we don't modify the original one.
    # XArr = sliceArrX.copy() and hence, changing Xtrain, Xtest = XArr[...]
    # YArr = sliceArrY.copy() and hence, changing Ytrain, Ytest = YArr[...]
    Xtrain = sliceArrX[:-wSizeTe,:]
    Xtest = sliceArrX[-wSizeTe:,:]
    Ytrain = sliceArrY[:-wSizeTe,:]
    Ytest = sliceArrY[-wSizeTe:,:]
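To see which rows end up in each train and test block, it can help to print the window bounds before running any model. The following is just a sketch: `wfa_bounds` is a hypothetical helper, and the 1000-row series length is an assumption, since the real size of X is not given here.

```python
def wfa_bounds(n_rows, train_size, test_size, roll_qty):
    """Return (train_start, train_end, test_start, test_end) per window.

    All end indices are exclusive, matching Python slicing.
    """
    total = train_size + test_size
    if n_rows < total:
        return []
    n_windows = (n_rows - total) // roll_qty + 1
    bounds = []
    for k in range(n_windows):
        start = k * roll_qty
        split = start + train_size
        bounds.append((start, split, split, start + total))
    return bounds


# Same sizes as above (400 train, 100 test, roll by 200) on an assumed 1000 rows.
bounds = wfa_bounds(n_rows=1000, train_size=400, test_size=100, roll_qty=200)
for b in bounds:
    print(b)
# (0, 400, 400, 500)
# (200, 600, 600, 700)
# (400, 800, 800, 900)
```

Note that with rQty larger than wSizeTe, some rows (500 to 599 here) never land in a test block. Setting rQty equal to wSizeTe makes the test blocks line up back to back.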
Anchored WFA
from sklearn.model_selection import TimeSeriesSplit
timeSeriesCrossVal = TimeSeriesSplit(n_splits=5)
for trainIndex, testIndex in timeSeriesCrossVal.split(X):
    ## Check if the training and testing quantities make sense. If not, increase or decrease the n_splits parameter.
    Xtrain = X[trainIndex]
    Xtest = X[testIndex]
    Ytrain = y[trainIndex]
    Ytest = y[testIndex]
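To check what "anchored" means in practice, here is a small sketch that prints the indices TimeSeriesSplit produces. The 8-row `X_demo` array and `n_splits=3` are made up for readability; the training set always starts at row 0 and grows, while the test block moves forward:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_demo = np.arange(16).reshape((8, 2))  # 8 time-ordered rows, 2 features

tscv = TimeSeriesSplit(n_splits=3)
folds = [(train.tolist(), test.tolist()) for train, test in tscv.split(X_demo)]
for train, test in folds:
    print("train:", train, "test:", test)
# train: [0, 1] test: [2, 3]
# train: [0, 1, 2, 3] test: [4, 5]
# train: [0, 1, 2, 3, 4, 5] test: [6, 7]
```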
Then, with either of the two approaches, you could add the following inside the loop and keep modelling:
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
# Fit on training set only - The targets (y) are already encoded in dummy variables, so no need to standardize them.
scaler = StandardScaler()
scaler.fit(Xtrain)
# Apply transform to both the training set and the test set.
trainX = scaler.transform(Xtrain)
testX = scaler.transform(Xtest)
## PCA - Principal Component Analysis: apply PCA to the standardized training set. Fit on training set only.
pca = PCA(.95)
pca.fit(trainX)
# Apply transform to both the training set and the test set.
trainX = pca.transform(trainX)
testX = pca.transform(testX)
## Predict and append predictions...
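The scaler and PCA steps can also be chained so that "fit on train, transform both" is harder to get wrong. This is only a sketch using scikit-learn's `make_pipeline`, with synthetic random data standing in for one 500-row window (the data, the 6 features and the name `prep` are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 6))  # synthetic stand-in for one window
wSizeTe = 100

Xtrain, Xtest = X_demo[:-wSizeTe], X_demo[-wSizeTe:]

# Standardize, then keep enough components to explain 95% of the variance.
prep = make_pipeline(StandardScaler(), PCA(0.95))
trainX = prep.fit_transform(Xtrain)  # fitted on the training rows only
testX = prep.transform(Xtest)        # test rows reuse the training statistics

print(trainX.shape, testX.shape)
```

A fresh pipeline should be created (or refitted) for every window, otherwise statistics from a previous window leak into the next one.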
The one-liner for the non-anchored case with a generalized window rolling quantity:
sliceListX = [arr[i: i + wSizeTot] for i in range(0, arr.shape[0] - wSizeTot+1, rQty)]
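Keep in mind that the slices from this one-liner are views too, which is why the `.copy()` remark above matters. A quick sketch on a small made-up array (the 10x3 `arr` and the tiny window values are for illustration only):

```python
import numpy as np

arr = np.arange(30).reshape((10, 3))
wSizeTot, rQty = 5, 2  # small illustrative values

sliceListX = [arr[i: i + wSizeTot] for i in range(0, arr.shape[0] - wSizeTot + 1, rQty)]
print(len(sliceListX))                       # 3
print(np.shares_memory(sliceListX[0], arr))  # True: basic slicing returns a view

# Working on a copy leaves the original array untouched.
safe = sliceListX[0].copy()
safe[0, 0] = -1
print(arr[0, 0])                             # 0
```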
IIUC, you can use np.lib.stride_tricks.as_strided to create a view with the given window size and rolling quantity, such as:
import numpy as np
# redefine arr to see what is happening more clearly than with random numbers
arr = np.arange(30).reshape((10,3))
# get arr properties
arr_0, arr_1 = arr.shape
arr_is = arr.itemsize # the size in bytes of one element of arr
# window and rolling parameters
win_size = 5
roll_qty = 2
# use as_strided by defining the right parameters:
from numpy.lib.stride_tricks import as_strided
print(as_strided(arr,
                 shape=(int((arr_0 - win_size)/roll_qty+1), win_size, arr_1),
                 strides=(roll_qty*arr_1*arr_is, arr_1*arr_is, arr_is)))
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11],
        [12, 13, 14]],

       [[ 6,  7,  8],
        [ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17],
        [18, 19, 20]],

       [[12, 13, 14],
        [15, 16, 17],
        [18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])
and for another window size and rolling quantity:
win_size = 4
roll_qty = 3
print(as_strided(arr,
                 shape=(int((arr_0 - win_size)/roll_qty+1), win_size, arr_1),
                 strides=(roll_qty*arr_1*arr_is, arr_1*arr_is, arr_is)))
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17],
        [18, 19, 20]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26],
        [27, 28, 29]]])
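If your NumPy is 1.20 or newer, the same windows can be obtained without computing strides by hand, using `sliding_window_view`. This is an alternative technique, not what the code above does; a sketch for the last example (window of 4, rolling by 3):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(30).reshape((10, 3))
win_size = 4
roll_qty = 3

# All windows with step 1 along the rows; the window axis is appended last,
# so the raw result has shape (7, 3, 4).
all_windows = sliding_window_view(arr, window_shape=win_size, axis=0)

# Reorder to (window, row, column) and keep every roll_qty-th window.
windows = all_windows.transpose(0, 2, 1)[::roll_qty]

print(windows.shape)  # (3, 4, 3)
print(windows[1][0])  # [ 9 10 11]
```

The result is a read-only view as well, and NumPy checks the shapes for you, which is safer than a hand-written `as_strided` call.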