Converting OHLC stock data into a different timeframe with python and pandas

With a more recent version of Pandas, there is a resample method. It is very fast and is useful to accomplish the same task:

ohlc_dict = {                                                                                                             
    'Open': 'first',                                                                                                    
    'High': 'max',                                                                                                       
    'Low': 'min',                                                                                                        
    'Close': 'last',                                                                                                    
    'Volume': 'sum',
}

df.resample('5T', closed='left', label='left').apply(ohlc_dict)

Your approach is sound, but fails because each function in the dict-of-functions applied to agg() receives a Series object reflecting the column matched by the key value. Therefore, it's not necessary to filter on column label again. With this, and assuming groupby preserves order, you can slice the Series to extract the first/last element of the Open/Close columns (note: groupby documentation does not claim to preserve order of original data series, but seems to in practice.)

In [50]: df.groupby(dr5minute.asof).agg({'Low': lambda s: s.min(), 
                                         'High': lambda s: s.max(),
                                         'Open': lambda s: s[0],
                                         'Close': lambda s: s[-1],
                                         'Volume': lambda s: s.sum()})
Out[50]: 
                      Close    High     Low    Open  Volume
key_0                                                      
1999-01-04 10:20:00  1.1806  1.1819  1.1801  1.1801      34
1999-01-04 10:25:00  1.1789  1.1815  1.1776  1.1807      91
1999-01-04 10:30:00  1.1791  1.1792  1.1776  1.1780      16

For reference, here is a table to summarize the expected input and output types of an aggregation function based on the groupby object type and how the aggregation function(s) is/are passed to agg().

                  agg() method     agg func    agg func          agg()
                  input type       accepts     returns           result
GroupBy Object
SeriesGroupBy     function         Series      value             Series
                  dict-of-funcs    Series      value             DataFrame, columns match dict keys
                  list-of-funcs    Series      value             DataFrame, columns match func names
DataFrameGroupBy  function         DataFrame   Series/dict/ary   DataFrame, columns match original DataFrame
                  dict-of-funcs    Series      value             DataFrame, columns match dict keys, where dict keys must be columns in original DataFrame
                  list-of-funcs    Series      value             DataFrame, MultiIndex columns (original cols x func names)

From the above table, if aggregation requires access to more than one column, the only option is to pass a single function to a DataFrameGroupBy object. Therefore, an alternate way to accomplish the original task is to define a function like the following:

def ohlcsum(df):
    df = df.sort()
    return {
       'Open': df['Open'][0],
       'High': df['High'].max(),
       'Low': df['Low'].min(),
       'Close': df['Close'][-1],
       'Volume': df['Volume'].sum()
      }

and apply agg() with it:

In [30]: df.groupby(dr5minute.asof).agg(ohlcsum)
Out[30]: 
                       Open    High     Low   Close  Volume
key_0                                                      
1999-01-04 10:20:00  1.1801  1.1819  1.1801  1.1806      34
1999-01-04 10:25:00  1.1807  1.1815  1.1776  1.1789      91
1999-01-04 10:30:00  1.1780  1.1792  1.1776  1.1791      16

Though pandas may offer some cleaner built-in magic in the future, hopefully this explains how to work with today's agg() capabilities.


df = df.resample('4h').agg({
    'open': lambda s: s[0],
    'high': lambda df: df.max(),
    'low': lambda df: df.min(),
    'close': lambda df: df[-1],
    'volume': lambda df: df.sum()
})