Using Python's Pandas to find average values by bins
The most concise way is probably to convert this to a timeseris
data and them downsample to get the means:
In [75]:
print df
ID Level
1
1980-04-17 485438103132901 -7.10
1980-05-06 485438103132901 -6.80
1979-09-10 483622101085001 -6.70
1979-07-31 485438103132901 -6.20
1980-11-11 483845101112801 -5.37
1980-11-11 484123101124601 -5.30
1977-07-06 485438103132901 -4.98
In [76]:
df.Level.resample('60M', how='mean')
#also may consider different time alias: '5A', '5BA', '5AS', etc:
#see: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
Out[76]:
1
1977-07-31 -4.980
1982-07-31 -6.245
Freq: 60M, Name: Level, dtype: float64
Alternatively, you may use groupby
together with cut
:
In [99]:
print df.groupby(pd.cut(df.index.year, pd.date_range('1960', periods=5, freq='5A').year, include_lowest=True)).mean()
ID Level
[1960, 1965] NaN NaN
(1965, 1970] NaN NaN
(1970, 1975] NaN NaN
(1975, 1980] 4.847632e+14 -6.064286
And by ID also:
In [100]:
print df.groupby(['ID',
pd.cut(df.index.year, pd.date_range('1960', periods=5, freq='5A').year, include_lowest=True)]).mean()
Level
ID
483622101085001 (1975, 1980] -6.70
483845101112801 (1975, 1980] -5.37
484123101124601 (1975, 1980] -5.30
485438103132901 (1975, 1980] -6.27
so what i like to do is create a separate column with the rounded bin number:
bin_width = 50000
mult = 1. / bin_width
df['bin'] = np.floor(ser * mult + .5) / mult
then, just group by the bins themselves
df.groupby('bin').mean()
another note, you can do multiple truth evaluations in one go:
df[(df.date > a) & (df.date < b)]