Flatten a Series in pandas, i.e. a series whose elements are lists
s.map(len).sum()
does the trick. s.map(len)
applies len()
to each element and returns a series of all the lengths, then you can just use sum
on that series.
If we stick with the pandas Series as in the original question, one neat option from the Pandas version 0.25.0 onwards is the Series.explode() routine. It returns an exploded list to rows, where the index will be duplicated for these rows.
The original Series from the question:
s = pd.Series([['a','a','b'],['b','b','c','d'],[],['a','b','e']])
Let's explode it and we get a Series, where the index is repeated. The index indicates the index of the original list.
>>> s.explode()
Out:
0 a
0 a
0 b
1 b
1 b
1 c
1 d
2 NaN
3 a
3 b
3 e
dtype: object
>>> type(s.explode())
Out:
pandas.core.series.Series
To count the number of elements we can now use the Series.value_counts():
>>> s.explode().value_counts()
Out:
b 4
a 3
d 1
c 1
e 1
dtype: int64
To include also NaN values:
>>> s.explode().value_counts(dropna=False)
Out:
b 4
a 3
d 1
c 1
e 1
NaN 1
dtype: int64
Finally, plotting the histogram using Series.plot():
>>> s.explode().value_counts(dropna=False).plot(kind = 'bar')