pandas: create a long (tidy) DataFrame from a dictionary whose values are sets or lists of variable length
Use numpy.repeat to repeat each key once per element of its value, together with itertools.chain.from_iterable to flatten the values:
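from itertools import chain
import numpy as np
import pandas as pd
d = {'a': [1, 2, 3], 'b': [3, 4]}  # sample input matching the output below
df = pd.DataFrame({
    # each key repeated once per element of its value
    'letter': np.repeat(list(d.keys()), [len(v) for v in d.values()]),
    # all values flattened into one sequence, in the same key order
    'value': list(chain.from_iterable(d.values())),
})
print(df)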
  letter  value
0      a      1
1      a      2
2      a      3
3      b      3
4      b      4
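A self-contained sketch of the same numpy.repeat approach, using a hypothetical dictionary that mixes lists and a set. It relies on d.keys() and d.values() iterating in the same order, so the repeated keys stay aligned with the flattened values. A set has no defined element order, so the order of the values within a key whose value is a set is not guaranteed.

```python
from itertools import chain

import numpy as np
import pandas as pd

# Hypothetical input: variable-length values, one of them a set.
d = {'a': [1, 2, 3], 'b': {3, 4}, 'c': [5]}

# d.keys() and d.values() iterate in the same order, so the
# repeated keys line up with the flattened values.
lengths = [len(v) for v in d.values()]
df = pd.DataFrame({
    'letter': np.repeat(list(d), lengths),
    'value': list(chain.from_iterable(d.values())),
})
print(df)
```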
Alternatively, use a generator expression with itertools.chain and zip:
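from itertools import chain
import pandas as pd
d = {'a': [1, 2, 3], 'b': [3, 4]}  # sample input matching the output below
# [k] * len(v), not k * len(v): repeating the bare string breaks multi-character keys
keys, values = map(chain.from_iterable, zip(*(([k] * len(v), v) for k, v in d.items())))
df = pd.DataFrame({'letter': list(keys), 'value': list(values)})
print(df)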
  letter  value
0      a      1
1      a      2
2      a      3
3      b      3
4      b      4
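Repeating the key as k * len(v) only works when every key is a single character: multiplying a string by n yields one longer string, and chain.from_iterable then splits it into individual characters. The sketch below uses a hypothetical two-character key 'ab' to show the resulting misalignment, and the [k] * len(v) form, which repeats the key as a list element instead.

```python
from itertools import chain

import pandas as pd

# Hypothetical input with a multi-character key.
d = {'ab': [1, 2], 'c': [3]}

# Pitfall: 'ab' * 2 == 'abab', and chaining a string yields its
# characters, so the letters no longer line up with the values.
bad_keys = list(chain.from_iterable(k * len(v) for k, v in d.items()))
print(bad_keys)  # ['a', 'b', 'a', 'b', 'c']

# Repeating a one-element list keeps each key intact.
keys, values = map(chain.from_iterable,
                   zip(*(([k] * len(v), v) for k, v in d.items())))
df = pd.DataFrame({'letter': list(keys), 'value': list(values)})
print(df)
```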
This can be rewritten in a more readable fashion:
# transpose the (repeated keys, values) pairs, then flatten each side into a list
zipper = zip(*(([k] * len(v), v) for k, v in d.items()))
letters, values = map(list, map(chain.from_iterable, zipper))
df = pd.DataFrame({'letter': letters, 'value': values})
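On pandas 0.25 or later, Series.explode is another option. This is a different technique from the answers above, shown as a sketch on a hypothetical dictionary: sets are converted to lists first, each list element becomes its own row, and the index (the keys) is turned back into a column.

```python
import pandas as pd

# Hypothetical input; sets are converted to lists before exploding.
d = {'a': [1, 2, 3], 'b': {3, 4}}

s = pd.Series({k: list(v) for k, v in d.items()})
df = (s.explode()                  # one row per list element, key kept as index
       .rename_axis('letter')      # name the index so it becomes the 'letter' column
       .reset_index(name='value'))
# explode leaves an object column; cast it back to integers.
df['value'] = df['value'].astype(int)
print(df)
```

Note that explode turns an empty list into a single NaN row, in which case the astype(int) cast would fail.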