Getting the average of a certain hour on weekdays over several years in a pandas dataframe

Note: Series now have the dt accessor, so the date no longer has to be the index. The Date/Time column does still need to have a datetime64 dtype.
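If Date/Time was read in as plain strings, the dt accessor will not be available until the column is converted. A minimal sketch, assuming a small made-up frame with the same column names used in this answer (the values are illustrative only):

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
df = pd.DataFrame({
    "Date/Time": ["2010-01-01 00:00", "2010-01-01 01:00", "2010-01-02 00:00"],
    "Value": [60, 50, 62],
})

# Strings (object dtype) do not support .dt, so convert to datetime64 first.
df["Date/Time"] = pd.to_datetime(df["Date/Time"])

# Now the datetime components are available through the accessor.
hours = df["Date/Time"].dt.hour
```

After the conversion, the groupby calls below can use df["Date/Time"].dt.year and df["Date/Time"].dt.hour directly.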

Update: You can do the groupby more directly (without the lambda):

In [21]: df.groupby([df["Date/Time"].dt.year, df["Date/Time"].dt.hour]).mean()
Out[21]:
                     Value
Date/Time Date/Time
2010      0             60
          1             50
          2             52
          3             49

In [22]: res = df.groupby([df["Date/Time"].dt.year, df["Date/Time"].dt.hour]).mean()

In [23]: res.index.names = ["year", "hour"]

In [24]: res
Out[24]:
           Value
year hour
2010 0        60
     1        50
     2        52
     3        49
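The question also asks for weekdays only, which the groupby above does not handle by itself. One way to sketch it, assuming made-up hourly data from Friday 2010-01-01 through Monday 2010-01-04, is to filter on dt.dayofweek before grouping:

```python
import pandas as pd

# Hypothetical data: one reading per hour, Friday 2010-01-01 to Monday 2010-01-04.
times = pd.date_range("2010-01-01", periods=96, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"Date/Time": times, "Value": range(96)})

# dayofweek runs Monday=0 ... Sunday=6, so values below 5 are weekdays.
weekdays = df[df["Date/Time"].dt.dayofweek < 5]

# Mean per (year, hour) over weekdays only; rename gives the index levels clear names.
res = weekdays.groupby([
    weekdays["Date/Time"].dt.year.rename("year"),
    weekdays["Date/Time"].dt.hour.rename("hour"),
])["Value"].mean()

# Average for a single hour of the day, here 03:00 in 2010.
three_am = res.loc[(2010, 3)]
```

Only the Friday and Monday rows survive the filter here, so three_am is the mean of those two 03:00 readings.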

If the dates are the index (a DatetimeIndex, as in df1 here), you can group on the index attributes directly:

In [31]: df1.groupby([df1.index.year, df1.index.hour]).mean()
Out[31]:
        Value
2010 0     60
     1     50
     2     52
     3     49

Old answer (slower, because the lambda is called in Python for every index value):

Assuming Date/Time was the index* you can use a mapping function in the groupby:

In [11]: year_hour_means = df1.groupby(lambda x: (x.year, x.hour)).mean()

In [12]: year_hour_means
Out[12]:
           Value
(2010, 0)     60
(2010, 1)     50
(2010, 2)     52
(2010, 3)     49

For a more useful index, you could then create a MultiIndex from the tuples:

In [13]: year_hour_means.index = pd.MultiIndex.from_tuples(year_hour_means.index,
                                                           names=['year', 'hour'])

In [14]: year_hour_means
Out[14]:
           Value
year hour
2010 0        60
     1        50
     2        52
     3        49
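With named levels, picking out one year or one hour of the day becomes a simple lookup. A short sketch, assuming a hypothetical result with the same shape as year_hour_means but covering two years (the numbers are invented):

```python
import pandas as pd

# Hypothetical result shaped like year_hour_means above.
index = pd.MultiIndex.from_tuples(
    [(2010, 0), (2010, 1), (2011, 0), (2011, 1)], names=["year", "hour"]
)
year_hour_means = pd.DataFrame({"Value": [60.0, 50.0, 62.0, 48.0]}, index=index)

# All hours for a single year (the result is indexed by hour).
means_2010 = year_hour_means.loc[2010]

# A single hour of the day across every year (the result is indexed by year).
hour_1 = year_hour_means.xs(1, level="hour")
```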

* if not, then first use set_index:

df1 = df.set_index('Date/Time')

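If your date/time values are already datetimes (see dateutil.parser for automatic parsing options) and are set as the index, as in df1 above, you can also use pandas resample:

hourly_means = df1.resample('h').mean()

(Older pandas versions spelled this df1.resample('H', how='mean'); the how argument has since been removed, and the 'h' alias is written 'H' before pandas 2.2.)

This keeps the data indexed by datetime, which may help with whatever you do with it down the line. Note that resample averages within each calendar hour, giving one row per hour of every day, rather than averaging the same hour across days or years.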
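To make the difference between resample and the groupby approach concrete, here is a small sketch on made-up half-hourly readings over two days. The frame and its values are assumptions for illustration, and a Timedelta is passed as the frequency to avoid depending on a particular alias spelling:

```python
import pandas as pd

# Hypothetical readings every 30 minutes over two days.
idx = pd.date_range("2010-01-01", periods=96, freq=pd.Timedelta(minutes=30))
df1 = pd.DataFrame({"Value": range(96)}, index=idx)

# resample needs a DatetimeIndex; it returns one row per calendar hour.
hourly = df1.resample(pd.Timedelta(hours=1)).mean()

# Averaging the same hour across days is a further groupby on the hour.
by_hour = hourly.groupby(hourly.index.hour).mean()
```

Here hourly has 48 rows (one per hour of each day), while by_hour has 24 rows (one per hour of the day).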
