converting daily stock data to weekly-based via pandas in Python
You can resample
(to weekly), offset
(shift), and apply
aggregation rules as follows:
logic = {'Open' : 'first',
'High' : 'max',
'Low' : 'min',
'Close' : 'last',
'Volume': 'sum'}
offset = pd.offsets.timedelta(days=-6)
f = pd.read_clipboard(parse_dates=['Date'], index_col=['Date'])
f.resample('W', loffset=offset).apply(logic)
to get:
Open High Low Close Volume
Date
2010-01-04 38.660000 40.700001 38.509998 40.290001 5925600
2010-01-11 40.209999 40.970001 39.279999 40.450001 6234600
In general, assuming that you have the dataframe in the form you specified, you need to do the following steps:
- put
Date
in the index resample
the index.
What you have is a case of applying different functions to different columns. See.
You can resample in various ways. for e.g. you can take the mean of the values or count or so on. check pandas resample.
You can also apply custom aggregators (check the same link). With that in mind, the code snippet for your case can be given as:
f['Date'] = pd.to_datetime(f['Date'])
f.set_index('Date', inplace=True)
f.sort_index(inplace=True)
def take_first(array_like):
return array_like[0]
def take_last(array_like):
return array_like[-1]
output = f.resample('W', # Weekly resample
how={'Open': take_first,
'High': 'max',
'Low': 'min',
'Close': take_last,
'Volume': 'sum'},
loffset=pd.offsets.timedelta(days=-6)) # to put the labels to Monday
output = output[['Open', 'High', 'Low', 'Close', 'Volume']]
Here, W
signifies a weekly resampling which by default spans from Monday to Sunday. To keep the labels as Monday, loffset
is used.
There are several predefined day specifiers. Take a look at pandas offsets. You can even define custom offsets (see).
Coming back to the resampling method. Here for Open
and Close
you can specify custom methods to take the first value or so on and pass the function handle to the how
argument.
This answer is based on the assumption that the data seems to be daily, i.e. for each day you have only 1 entry. Also, no data is present for the non-business days. i.e. Sat and Sun. So taking the last data point for the week as the one for Friday is ok. If you so want you can use business week instead of 'W'. Also, for more complex data you may want to use groupby
to group the weekly data and then work on the time indices within them.
btw a gist for the solution can be found at: https://gist.github.com/prithwi/339f87bf9c3c37bb3188