Fill in same amount of characters where other column is NaN
One way is using str.repeat
and fillna()
not sure how efficient this is though:
df.Col2.fillna(pd.Series(['~']*len(df)).str.repeat(df.Col1.str.count(',')))
0 aa~bb~cc~dd
1 ~~~
2 ii~jj~kk~ll~mm
Name: Col2, dtype: object
pd.concat
delims = {'Col1': ',', 'Col2': '~'}
pd.concat({
k: df[k].str.split(delims[k], expand=True)
for k in df}, axis=1
).stack()
Col1 Col2
0 0 a aa
1 b bb
2 c cc
3 d dd
1 0 e NaN
1 f NaN
2 g NaN
3 h NaN
2 0 i ii
1 j jj
2 k kk
3 l ll
4 m mm
This loops on columns in df
. It may be wiser to loop on keys in the delims
dictionary.
delims = {'Col1': ',', 'Col2': '~'}
pd.concat({
k: df[k].str.split(delims[k], expand=True)
for k in delims}, axis=1
).stack()
Same thing, different look
delims = {'Col1': ',', 'Col2': '~'}
def f(c): return df[c].str.split(delims[c], expand=True)
pd.concat(map(f, delims), keys=delims, axis=1).stack()
zip_longest
can be useful here, given you don't need the original Index. It will work regardless of which column has more splits:
from itertools import zip_longest, chain
df = pd.DataFrame({'Col1':['a,b,c,d', 'e,f,g,h', 'i,j,k,l,m', 'x,y'],
'Col2':['aa~bb~cc~dd', np.NaN, 'ii~jj~kk~ll~mm', 'xx~yy~zz']})
# Col1 Col2
#0 a,b,c,d aa~bb~cc~dd
#1 e,f,g,h NaN
#2 i,j,k,l,m ii~jj~kk~ll~mm
#3 x,y xx~yy~zz
l = [zip_longest(*x, fillvalue='')
for x in zip(df.Col1.str.split(',').fillna(''),
df.Col2.str.split('~').fillna(''))]
pd.DataFrame(chain.from_iterable(l))
0 1
0 a aa
1 b bb
2 c cc
3 d dd
4 e
5 f
6 g
7 h
8 i ii
9 j jj
10 k kk
11 l ll
12 m mm
13 x xx
14 y yy
15 zz