Python Pandas: Is Order Preserved When Using groupby() and agg()?
See this enhancement issue
The short answer is yes, the groupby will preserve the orderings as passed in. You can prove this by using your example like this:
In [20]: df.sort_index(ascending=False).groupby('A').agg([np.mean, lambda x: x.iloc[1] ])
Out[20]:
            B              C
         mean <lambda>  mean <lambda>
A
group1   11.0       10   101      100
group2   17.5       10   175      100
group3   11.0       10   101      100
This is NOT true for resample, however, as it requires a monotonic index (it WILL work with a non-monotonic index, but will sort it first).
There is a sort= flag to groupby, but this relates to the sorting of the groups themselves and not the observations within a group.
FYI: df.groupby('A').nth(1) is a safe way to get the 2nd value of a group (as your method above will fail if a group has < 2 elements).
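To make the difference concrete, here is a minimal sketch with a made-up frame (the data and the group2 single-row case are assumptions for illustration, not the asker's data). It shows nth(1) skipping a group that has no second row, while the iloc[1] lambda raises.

```python
import pandas as pd

# Hypothetical data: group2 deliberately has only one row.
df = pd.DataFrame({
    "A": ["group1", "group1", "group2", "group3", "group3"],
    "B": [10, 12, 20, 30, 32],
})

# nth(1) returns the second row of each group and silently
# omits groups that have fewer than two rows.
second = df.groupby("A").nth(1)
second_values = second["B"].tolist()

# The positional lambda has no such guard and fails on group2.
try:
    df.groupby("A")["B"].agg(lambda x: x.iloc[1])
    raised = False
except IndexError:
    raised = True

print(second_values, raised)
```

Note that the shape of the nth result differs between pandas versions (older releases index it by the group key, newer ones keep the original row index), so the sketch only reads column B from it.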
In order to preserve the order in which the groups first appear, you'll need to pass .groupby(..., sort=False). In your case the grouping column is already sorted, so it does not make a difference, but generally one must use the sort=False flag:
df.groupby('A', sort=False).agg([np.mean, lambda x: x.iloc[1] ])
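As a quick check of what the flag changes, the sketch below uses a small invented frame whose grouping column is not sorted (the data is an assumption for illustration) and compares the order of the group keys with and without sort=False.

```python
import pandas as pd

# Hypothetical data: group2 appears before group1.
df = pd.DataFrame({
    "A": ["group2", "group1", "group2", "group1"],
    "B": [1, 2, 3, 4],
})

# Default sort=True: group keys come back in sorted order.
sorted_keys = df.groupby("A")["B"].first().index.tolist()

# sort=False: group keys come back in order of first appearance.
unsorted_keys = df.groupby("A", sort=False)["B"].first().index.tolist()

print(sorted_keys)
print(unsorted_keys)
```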
The pandas 0.19.1 documentation says "groupby preserves the order of rows within each group", so this is guaranteed behavior.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
The API accepts sort as an argument. Its description reads:
sort : bool, default True Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
Thus, it is clear that groupby does preserve the order of rows within each group.
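The documented behavior can also be checked directly. This is a small sketch on invented data (the frame is an assumption for illustration): it collects the rows of each group into a list and confirms that they keep their original relative order whether sort is True or False.

```python
import pandas as pd

# Hypothetical data with interleaved groups and descending values.
df = pd.DataFrame({
    "A": ["b", "a", "b", "a", "b"],
    "B": [5, 4, 3, 2, 1],
})

results = {}
for sort in (True, False):
    # Collect each group's values in the order groupby hands them over.
    results[sort] = df.groupby("A", sort=sort)["B"].agg(list).to_dict()

print(results[True])
print(results[False])
```

Only the order of the group keys differs between the two calls; the values inside each group stay in the order they had in the frame.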