Pandas Series of lists to one series
Here's a simple method using only pandas functions:
import pandas as pd
s = pd.Series([
['slim', 'waist', 'man'],
['slim', 'waistline'],
['santa']])
Then
s.apply(pd.Series).stack().reset_index(drop=True)
gives the desired output. In some cases you might want to keep the original index and add a second level that indexes the nested elements, e.g.
0  0         slim
   1        waist
   2          man
1  0         slim
   1    waistline
2  0        santa
If this is what you want, just omit .reset_index(drop=True) from the chain.
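As a rough sketch of what each step in that chain does, the example below builds the intermediate objects separately. The names wide, nested and flat are only illustrative, and the explicit dropna() is an assumption added here as a guard: it should be harmless where stack() already discards the padding NaN values, and it removes them if the installed pandas keeps them.

```python
import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
])

# Each list becomes one row; shorter lists are padded with NaN.
wide = s.apply(pd.Series)

# stack() moves the columns into a second index level.
# dropna() is an added guard (an assumption, not part of the original chain)
# that removes any padding NaN the stacking step may have kept.
nested = wide.stack().dropna()

# Dropping the two-level index gives one flat series numbered from 0.
flat = nested.reset_index(drop=True)

print(nested)
print(flat)
```

Keeping nested around is useful when the position of each element inside its original list matters later.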
You are basically just trying to flatten a nested list here.
You should just be able to iterate over the elements of the series:
slist = []
for x in s:
    slist.extend(x)
or a slicker (but harder to understand) list comprehension:
slist = [st for row in s for st in row]
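Since the question asks for a single series rather than a plain list, the flattened list still has to be wrapped again. A minimal sketch, using itertools.chain.from_iterable as one possible alternative to the loop or comprehension (the names flat_values and flat are illustrative):

```python
from itertools import chain

import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
])

# Iterating over a Series yields its values, here the inner lists,
# so chain.from_iterable concatenates them in order.
flat_values = list(chain.from_iterable(s))

# Wrap the flat list in a new Series with a fresh 0..n-1 index.
flat = pd.Series(flat_values)

print(flat)
```

This discards the original index entirely, which is fine when only the values matter.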
pandas 0.25.0 added a new method, explode, for both Series and DataFrames. Older versions do not have it.
It builds exactly the result you need.
For example, given this series:
import pandas as pd
s = pd.Series([
['slim', 'waist', 'man'],
['slim', 'waistline'],
['santa']])
Then you can use
s.explode()
to get this result:
0 slim
0 waist
0 man
1 slim
1 waistline
2 santa
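Note that explode repeats the original index label for every element. If a plain running index is preferred, one option is to chain reset_index(drop=True), as in this small sketch (the variable names are illustrative):

```python
import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
])

# Every list element gets its own row and keeps the label of the row it came from.
exploded = s.explode()

# Optional: replace the repeated labels with a fresh 0..n-1 index.
renumbered = exploded.reset_index(drop=True)

print(exploded)
print(renumbered)
```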
In the case of a DataFrame:
df = pd.DataFrame({
's': pd.Series([
['slim', 'waist', 'man'],
['slim', 'waistline'],
['santa']
]),
'a': 1
})
You will have this DataFrame:
                    s  a
0  [slim, waist, man]  1
1   [slim, waistline]  1
2             [santa]  1
Applying explode on the s column:
df.explode('s')
will give you this result:
           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1
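Because the exploded rows keep the index label of their source row, they can still be related back to it. The sketch below is only an illustration of that property: it counts how many elements each original row contributed by grouping on the index (the names exploded and counts are hypothetical).

```python
import pandas as pd

df = pd.DataFrame({
    's': [
        ['slim', 'waist', 'man'],
        ['slim', 'waistline'],
        ['santa'],
    ],
    'a': 1,
})

# One row per list element; column 'a' is repeated alongside each element.
exploded = df.explode('s')

# Group on the repeated index labels to count elements per original row.
counts = exploded.groupby(level=0)['s'].count()

print(exploded)
print(counts)
```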
If your series contains empty lists:
import pandas as pd
s = pd.Series([
['slim', 'waist', 'man'],
['slim', 'waistline'],
['santa'],
[]
])
then running explode will introduce NaN values for the empty lists, like this:
0 slim
0 waist
0 man
1 slim
1 waistline
2 santa
3 NaN
If this is not desired, you can add a dropna method call:
s.explode().dropna()
To get this result:
0 slim
0 waist
0 man
1 slim
1 waistline
2 santa
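One caveat: dropna() after explode also removes missing values that were already inside the lists, not only the placeholders for empty lists. If that distinction matters, a possible alternative is to filter out the empty lists before exploding. The sketch below assumes a series that contains a None element purely for illustration.

```python
import pandas as pd

s = pd.Series([
    ['slim', 'waist', None],  # None is a deliberate missing element (illustrative)
    ['santa'],
    [],
])

# dropna() after explode removes both the placeholder for the empty list
# and the missing element that was part of the first list.
dropped = s.explode().dropna()

# Filtering out zero-length lists first keeps the in-list missing element.
non_empty = s[s.map(len) > 0]
filtered = non_empty.explode()

print(dropped)
print(filtered)
```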
DataFrames also have a dropna method:
df = pd.DataFrame({
's': pd.Series([
['slim', 'waist', 'man'],
['slim', 'waistline'],
['santa'],
[]
]),
'a': 1
})
Running explode without dropna:
df.explode('s')
will result in:
           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1
3        NaN  1
With dropna:
df.explode('s').dropna(subset=['s'])
Result:
           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1
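The subset=['s'] argument matters as soon as other columns can hold missing values too. A small sketch, assuming a hypothetical frame where column a has a NaN of its own:

```python
import pandas as pd

df = pd.DataFrame({
    's': [['slim', 'waist'], [], ['santa']],
    'a': [1.0, 2.0, None],  # the missing value in 'a' is illustrative
})

exploded = df.explode('s')

# Only drop rows where the exploded column itself is missing.
kept = exploded.dropna(subset=['s'])

# Without subset, rows with a missing value in any column are dropped,
# so the 'santa' row is lost as well.
strict = exploded.dropna()

print(kept)
print(strict)
```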