'DataFrame' object has no attribute 'sort'

Pandas Sorting 101

sort has been replaced in v0.20 by DataFrame.sort_values and DataFrame.sort_index. Aside from this, we also have argsort.

Here are some common use cases in sorting, and how to solve them using the sorting functions in the current API. First, the setup.

# Setup
np.random.seed(0)
df = pd.DataFrame({'A': list('accab'), 'B': np.random.choice(10, 5)})    
df                                                                                                                                        
   A  B
0  a  7
1  c  9
2  c  3
3  a  5
4  b  2

Sort by Single Column

For example, to sort df by column "A", use sort_values with a single column name:

df.sort_values(by='A')

   A  B
0  a  7
3  a  5
4  b  2
1  c  9
2  c  3

If you need a fresh RangeIndex, use DataFrame.reset_index.

Sort by Multiple Columns

For example, to sort by both col "A" and "B" in df, you can pass a list to sort_values:

df.sort_values(by=['A', 'B'])

   A  B
3  a  5
0  a  7
4  b  2
2  c  3
1  c  9

Sort By DataFrame Index

df2 = df.sample(frac=1)
df2

   A  B
1  c  9
0  a  7
2  c  3
3  a  5
4  b  2

You can do this using sort_index:

df2.sort_index()

   A  B
0  a  7
1  c  9
2  c  3
3  a  5
4  b  2

df.equals(df2)                                                                                                                            
# False
df.equals(df2.sort_index())                                                                                                               
# True

Here are some comparable methods with their performance:

%timeit df2.sort_index()                                                                                                                  
%timeit df2.iloc[df2.index.argsort()]                                                                                                     
%timeit df2.reindex(np.sort(df2.index))                                                                                                   

605 µs ± 13.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
610 µs ± 24.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
581 µs ± 7.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Sort by List of Indices

For example,

idx = df2.index.argsort()
idx
# array([0, 7, 2, 3, 9, 4, 5, 6, 8, 1])

This "sorting" problem is actually a simple indexing problem. Just passing integer labels to iloc will do.

df.iloc[idx]

   A  B
1  c  9
0  a  7
2  c  3
3  a  5
4  b  2

sort() was deprecated for DataFrames in favor of either:

sort_values() to sort by column(s)
sort_index() to sort by the index

sort() was deprecated (but still available) in Pandas with release 0.17 (2015-10-09) with the introduction of sort_values() and sort_index(). It was removed from Pandas with release 0.20 (2017-05-05).

'DataFrame' object has no attribute 'sort'

Pandas Sorting 101

Sort by Single Column

Sort by Multiple Columns

Sort By DataFrame Index

Sort by List of Indices

Tags:

Python

Pandas

Numpy

Dataframe

Related

Recent Posts