Sort dataframe by string length
I found this solution more intuitive, especially if you want to do something that depends on the string length later on.
df['length'] = df['name'].str.len()
df.sort_values('length', ascending=False, inplace=True)
Your dataframe now has a column named length holding the string length of each value in the name column, and the whole dataframe is sorted by that column in descending order.
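As a rough sketch of the "do something with the length later on" idea: the example below assumes the same name/score data used elsewhere in this thread, filters on the helper column, and then drops it once it is no longer needed. The variable names long_names and result are made up for illustration.

```python
import pandas as pd

# Sample data assumed for illustration (same shape as used in this thread)
df = pd.DataFrame({
    'name': ['Steve', 'Al', 'Markus', 'Greg'],
    'score': [2, 4, 2, 3],
})

# Helper column holding the length of each name
df['length'] = df['name'].str.len()

# Reuse the helper column, e.g. keep only names longer than 3 characters
long_names = df[df['length'] > 3]

# Sort longest first, then drop the helper column
result = long_names.sort_values('length', ascending=False).drop(columns='length')
print(result)
```

Chaining without inplace=True leaves df itself untouched, which is usually easier to reason about when the helper column is only temporary.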
You can use reindex with the index of the Series created by str.len and sorted with sort_values:
print (df.name.str.len())
0 5
1 2
2 6
3 4
Name: name, dtype: int64
print (df.name.str.len().sort_values())
1 2
3 4
0 5
2 6
Name: name, dtype: int64
s = df.name.str.len().sort_values().index
print (s)
Int64Index([1, 3, 0, 2], dtype='int64')
print (df.reindex(s))
name score
1 Al 4
3 Greg 3
0 Steve 2
2 Markus 2
df1 = df.reindex(s)
df1 = df1.reset_index(drop=True)
print (df1)
name score
0 Al 4
1 Greg 3
2 Steve 2
3 Markus 2
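A small aside, offered as a sketch rather than a recommendation: when the index is unique and every label in s exists in df (as in the sample data assumed here), selecting with df.loc[s] should give the same rows in the same order as df.reindex(s). The names by_reindex and by_loc are only illustrative.

```python
import pandas as pd

# Sample data assumed for illustration
df = pd.DataFrame({
    'name': ['Steve', 'Al', 'Markus', 'Greg'],
    'score': [2, 4, 2, 3],
})

# Index labels ordered by the length of the corresponding name
s = df['name'].str.len().sort_values().index

by_reindex = df.reindex(s)  # approach used in the answer above
by_loc = df.loc[s]          # label-based selection in the same order

# Compare the two results
print(by_reindex.equals(by_loc))
```

The difference shows up only with missing labels: reindex fills them with NaN, while loc raises a KeyError.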
Using DataFrame.sort_values with the key argument (pandas >= 1.1.0):
We can now pass the string length, or any other custom key, to the sort_values method:
import pandas as pd
df = pd.DataFrame({
'name': ['Steve', 'Al', 'Markus', 'Greg'],
'score': [2, 4, 2, 3]
})
print(df)
name score
0 Steve 2
1 Al 4
2 Markus 2
3 Greg 3
df.sort_values(by="name", key=lambda x: x.str.len())
name score
1 Al 4
3 Greg 3
0 Steve 2
2 Markus 2
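Since sort_values returns a new dataframe, the result has to be assigned to be kept. The sketch below, assuming pandas >= 1.1.0 and the same sample data, combines key with ascending=False and ignore_index=True to get the longest names first with a fresh 0..n-1 index; df_sorted is just an illustrative name.

```python
import pandas as pd

# Sample data assumed for illustration
df = pd.DataFrame({
    'name': ['Steve', 'Al', 'Markus', 'Greg'],
    'score': [2, 4, 2, 3],
})

# key receives the whole 'name' column as a Series and must return
# a Series of the same length to sort by (here: the string lengths)
df_sorted = df.sort_values(
    by='name',
    key=lambda col: col.str.len(),
    ascending=False,    # longest names first
    ignore_index=True,  # renumber the rows 0..n-1 in the result
)
print(df_sorted)
```

When sorting by several columns, key is applied to each column separately, so a key like col.str.len() only works if every listed column holds strings.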
@jezrael's answer is great and explains the approach well. Here is the final result:
index_sorted = df.name.str.len().sort_values(ascending=True).index
df_sorted = df.reindex(index_sorted)
df_sorted = df_sorted.reset_index(drop=True)
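If this is needed in more than one place, the three lines above could be wrapped in a small helper. This is only a sketch: sort_by_length is a hypothetical name, the extra 'Bo' row is added to show a tie, and kind='mergesort' is an assumption chosen so that rows with equal lengths keep their original relative order.

```python
import pandas as pd


def sort_by_length(df, column, ascending=True):
    """Return a copy of df sorted by the string length of `column` (hypothetical helper)."""
    lengths = df[column].str.len()
    # mergesort is stable, so equal lengths keep their original order
    index_sorted = lengths.sort_values(ascending=ascending, kind='mergesort').index
    return df.reindex(index_sorted).reset_index(drop=True)


# Sample data assumed for illustration, with a tie between 'Al' and 'Bo'
df = pd.DataFrame({
    'name': ['Steve', 'Al', 'Markus', 'Greg', 'Bo'],
    'score': [2, 4, 2, 3, 1],
})

df_sorted = sort_by_length(df, 'name')
print(df_sorted)
```

Using df[column] instead of attribute access (df.name) also avoids surprises when a column name clashes with a DataFrame attribute or method.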