numpy sort acting weirdly when sorting on a pandas DataFrame
data[genres].sum() returns a Series. The genres aren't a column of that Series; they are its index.
np.sort looks only at the values of a DataFrame or Series, not at the index, and returns a new NumPy array containing the sorted values of data[genres].sum(). The index information is lost.
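A minimal sketch of that behaviour, assuming a small made-up DataFrame with one 0/1 indicator column per genre (the data and the genres list here are hypothetical stand-ins for the ones in the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one 0/1 indicator column per genre.
data = pd.DataFrame({
    "Action": [1, 0, 1, 1],
    "Comedy": [0, 1, 0, 0],
    "Drama": [1, 1, 0, 0],
})
genres = ["Action", "Comedy", "Drama"]

genre_count = data[genres].sum()       # Series whose index holds the genre names
sorted_values = np.sort(genre_count)   # plain ndarray, ascending; no genre labels

print(type(genre_count).__name__)      # Series
print(type(sorted_values).__name__)    # ndarray
print(sorted_values)
```

The array holds the right numbers in sorted order, but nothing in it says which genre each number belongs to.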
To sort data[genres].sum() and keep the index information, do something like:
genre_count = data[genres].sum()
# Series.sort() was removed in pandas 0.20; sort_values returns a sorted copy
genre_count = genre_count.sort_values(ascending=False)  # high to low, index labels kept
You can then turn the sorted genre_count Series back into a DataFrame if you like:
pd.DataFrame({'Genre Count': genre_count})
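Put together, the approach might look like the sketch below. It assumes a pandas version that provides Series.sort_values (0.17 or later), and the sample data and genres list are again hypothetical:

```python
import pandas as pd

# Hypothetical sample data: one 0/1 indicator column per genre.
data = pd.DataFrame({
    "Action": [1, 0, 1, 1],
    "Comedy": [0, 1, 0, 0],
    "Drama": [1, 1, 0, 0],
})
genres = ["Action", "Comedy", "Drama"]

# Sum each genre column, then sort high to low; the genre names stay in the index.
genre_count = data[genres].sum().sort_values(ascending=False)

# Wrap the sorted Series in a one-column DataFrame, keeping its index order.
genre_table = pd.DataFrame({"Genre Count": genre_count})
print(genre_table)
```

Because the sort happens on the Series itself rather than on its raw values, each count stays attached to its genre label.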