pandas sort with capital letters

Using DataFrame.sort_values with the key argument (pandas >= 1.1.0):

sort_values now accepts a key function. It receives the column as a Series and must return a Series of the same length, which is then used for the comparison:

import pandas as pd

df = pd.DataFrame(['ADc', 'Abc', 'AEc'], columns=['Test'], index=[0, 1, 2])
print(df)

  Test
0  ADc
1  Abc
2  AEc

df.sort_values(by="Test", key=lambda x: x.str.lower())

  Test
1  Abc
0  ADc
2  AEc

I don't think that's a pandas bug. It's just how Python compares strings: the comparison is case-sensitive and goes by Unicode code point, so every uppercase ASCII letter sorts before every lowercase one.

A plain Python list sorts the same way:

In [1]: l1 = ['ADc','Abc','AEc']
In [2]: l1.sort(reverse=True)
In [3]: l1
Out[3]: ['Abc', 'AEc', 'ADc']
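The ordering follows from the code points of the individual characters, which a quick check makes visible. This is a minimal sketch using only built-ins; the variable names are just for illustration.

```python
words = ['ADc', 'Abc', 'AEc']

# Uppercase ASCII letters have smaller code points than lowercase ones.
print(ord('D'), ord('E'), ord('b'))  # 68 69 98

# The default comparison goes by code point, so 'ADc' < 'AEc' < 'Abc'.
case_sensitive = sorted(words)

# Comparing lowercased copies gives the case-insensitive order instead.
case_insensitive = sorted(words, key=str.lower)

print(case_sensitive)    # ['ADc', 'AEc', 'Abc']
print(case_insensitive)  # ['Abc', 'ADc', 'AEc']
```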

So, in pandas versions without the key argument, where the sort method gives no control over the comparison, just sort on a lowercased copy of that column and drop it afterwards:

In [4]: df = pd.DataFrame(['ADc','Abc','AEc'], columns=['Test'], index=[0,1,2])
In [5]: df['test'] = df['Test'].str.lower()
In [6]: df.sort_values(by='test', ascending=True, inplace=True)
In [7]: df.drop('test', axis=1, inplace=True)
In [8]: df
Out[8]:
  Test
1  Abc
0  ADc
2  AEc
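If you would rather not modify df in place, the same helper-column idea can probably be written as a single chain. This is a sketch, not the only way to do it; the column name _sort_key is a hypothetical name chosen here so it does not clash with real columns.

```python
import pandas as pd

df = pd.DataFrame(['ADc', 'Abc', 'AEc'], columns=['Test'], index=[0, 1, 2])

# '_sort_key' is a hypothetical name for the temporary helper column.
result = (
    df.assign(_sort_key=df['Test'].str.lower())  # add the lowercased copy
      .sort_values(by='_sort_key')               # sort on it
      .drop(columns='_sort_key')                 # remove it again
)

print(result)
```

df itself is left untouched, since assign, sort_values and drop each return a new DataFrame.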

Note: for alphabetical (A to Z) order, the ascending argument must be True, which is the default.

EDIT:

As DSM suggested, to avoid creating a new helper column, you can do:

df = df.loc[df["Test"].str.lower().order().index]

UPDATE:

As pointed out by weatherfrog, in newer versions of pandas .order() is no longer available and the correct method is .sort_values(). So the above one-liner becomes:

df = df.loc[df["Test"].str.lower().sort_values().index]
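One thing to keep in mind is that this one-liner looks rows up by index label, so it assumes the labels are unique; with duplicated labels .loc may return repeated rows. A position-based variant avoids that lookup. The sketch below compares the two on the example frame; the names by_label, positions and by_position are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(['ADc', 'Abc', 'AEc'], columns=['Test'], index=[0, 1, 2])

# Label-based reordering, as in the one-liner above.
by_label = df.loc[df['Test'].str.lower().sort_values().index]

# Position-based variant: argsort() gives the integer positions that
# would sort the lowercased values, so index labels are never looked up.
positions = df['Test'].str.lower().argsort().to_numpy()
by_position = df.iloc[positions]

# With unique index labels both approaches give the same frame.
print(by_label.equals(by_position))
```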
