Count most frequent 100 words from sentences in Dataframe Pandas

from collections import Counter
Counter(" ".join(df["text"]).split()).most_common(100)

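I'm pretty sure this gives you what you want. (You might have to remove some non-words from the Counter before calling most_common. Note also that split() is case-sensitive, so "The" and "the" are counted separately unless you lowercase the text first.)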

Along with @Joran's solution, you could also use series.value_counts for large amounts of text/rows:

 pd.Series(' '.join(df['text']).lower().split()).value_counts()[:100]
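If the column may contain missing values, ' '.join raises a TypeError on the NaN entries. A variant that stays inside pandas is sketched below; it assumes pandas 0.25 or later (for Series.explode), and the sample DataFrame is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with one missing row.
df = pd.DataFrame({"text": ["The cat sat", None, "the dog and the cat"]})

word_counts = (
    df["text"]
    .str.lower()      # normalize case; missing values stay missing
    .str.split()      # each row becomes a list of whitespace-separated tokens
    .explode()        # one token per row
    .value_counts()   # sorted by frequency, NaN dropped by default
)

top = word_counts.head(100)
print(top)
```

Here head(100) plays the same role as the [:100] slice above; with fewer than 100 distinct words it simply returns all of them.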

In the benchmarks below, series.value_counts comes out roughly 1.6 times faster than the Counter method (27.1 ms versus 44.2 ms).

Timings on a Movie Reviews dataset of 3,000 rows, totaling 400K characters and 70K words:

In [448]: %timeit Counter(" ".join(df.text).lower().split()).most_common(100)
10 loops, best of 3: 44.2 ms per loop

In [449]: %timeit pd.Series(' '.join(df.text).lower().split()).value_counts()[:100]
10 loops, best of 3: 27.1 ms per loop
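To repeat the comparison outside IPython, something like the following sketch could be used. The synthetic data is an assumption (it is not the Movie Reviews dataset), so the absolute numbers and even the ratio will differ with your data, pandas version, and machine.

```python
import timeit
from collections import Counter

import pandas as pd

# Synthetic stand-in data: 300 rows of 20 lowercase tokens each.
vocab = [f"word{i}" for i in range(500)]
rows = [" ".join(vocab[(i * j) % 500] for j in range(20)) for i in range(300)]
df = pd.DataFrame({"text": rows})


def with_counter():
    return Counter(" ".join(df["text"]).lower().split()).most_common(100)


def with_value_counts():
    return pd.Series(" ".join(df["text"]).lower().split()).value_counts().head(100)


# Best of 3 repeats, 10 calls each, mirroring the %timeit setup above.
counter_time = min(timeit.repeat(with_counter, number=10, repeat=3)) / 10
series_time = min(timeit.repeat(with_value_counts, number=10, repeat=3)) / 10

print(f"Counter:      {counter_time * 1e3:.2f} ms per call")
print(f"value_counts: {series_time * 1e3:.2f} ms per call")
```

Both functions return the same top counts; only the order of words with tied counts may differ.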


Tags: Python, Pandas
