Read file of repeated "key=value" pairs into DataFrame

You can use pandas to read the file, split each line at the first "=", and pivot the keys into columns:

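import pandas as pd
# Read each line of the file into a single column (labelled 0).
df = pd.read_table(r'file.txt', header=None)
# Split every line at the first "=": column 0 holds the key, column 1 the value.
new = df[0].str.split("=", n=1, expand=True)
# Number the occurrences of each key; the n-th occurrence belongs to record n.
new['index'] = new.groupby(new[0])[0].cumcount()
# Turn the keys into columns, with one row per record.
new = new.pivot(index='index', columns=0, values=1)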

Printing new gives:

0     class grade name
index                 
0         B     A    1
1         A     D    2
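
To try the pivot approach without a file on disk, here is a minimal sketch that feeds the same kind of data through io.StringIO. The sample text is an assumption inferred from the output above, and the column names "key", "value" and "record" are made up for readability. It also restores the order in which the keys first appear, since pivot sorts the columns alphabetically.

```python
import io
import pandas as pd

# Assumed sample input, inferred from the output shown above.
sample = "name=1\ngrade=A\nclass=B\nname=2\ngrade=D\nclass=A\n"

# One line per row, all in a single column labelled 0.
df = pd.read_table(io.StringIO(sample), header=None)

# Split at the first "=" only, so values may themselves contain "=".
pairs = df[0].str.split("=", n=1, expand=True)
pairs.columns = ["key", "value"]  # hypothetical names, for readability

# The n-th occurrence of a key belongs to record n.
pairs["record"] = pairs.groupby("key").cumcount()

# One row per record, one column per key (sorted alphabetically by pivot).
wide = pairs.pivot(index="record", columns="key", values="value")

# Put the columns back in the order the keys first appeared in the input.
order = pairs["key"].drop_duplicates().tolist()
wide = wide[order]
wide.columns.name = None

print(wide)
```

Note that all values come back as strings; convert columns afterwards if you need numbers.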

I know you have enough answers, but here is another way to do it, using a dictionary of lists:

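import pandas as pd
from collections import defaultdict

d = defaultdict(list)  # maps each key to the list of its values, in file order

with open("text_file.txt") as f:
    for line in f:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        key, val = line.split('=', 1)  # split at the first "=" only
        d[key].append(val)

df = pd.DataFrame(d)  # one column per key; every key must occur equally often
print(df)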

This prints:

  name grade class
0    1     A     B
1    2     D     A

Just to get another perspective.
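
One thing to keep in mind with the dictionary approach: it relies on every key occurring the same number of times. The sketch below wraps the same loop in a hypothetical helper, read_pairs, and runs it on two made-up inputs (both are assumptions, not data from the question) to show that pandas refuses to build a DataFrame when one record is missing a field.

```python
import io
from collections import defaultdict

import pandas as pd


def read_pairs(lines):
    """Collect key=value lines into {key: [values, ...]} (hypothetical helper)."""
    d = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, val = line.split("=", 1)
        d[key].append(val)
    return d


# Every key occurs twice, so all lists have the same length.
complete = read_pairs(io.StringIO("name=1\ngrade=A\nclass=B\nname=2\ngrade=D\nclass=A\n"))
df = pd.DataFrame(complete)

# The second record has no "class" line, so the lists have unequal lengths.
ragged = read_pairs(io.StringIO("name=1\ngrade=A\nclass=B\nname=2\ngrade=D\n"))
try:
    pd.DataFrame(ragged)
    built = True
except ValueError:
    # pandas rejects a dict of lists with different lengths.
    built = False

print(df)
print("ragged input built a DataFrame:", built)
```

If your file can have missing fields, an approach that tracks where each record starts is a better fit.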

This solution assumes the text format is as you have described and that every record starts with the name field; you could modify it to use a different key to mark the beginning of a new record. I've modified your myfile() function below, hope it gives you some ideas :)

import pandas as pd

def myfile(filename):
    d_list = []
    with open(filename) as f:
        d_line = {}
        for line in f:
            line = line.rstrip("\n")  # Strip the trailing newline.
            if not line:
                continue  # Skip blank lines.
            key, value = line.split('=', 1)  # Split field and value at the first "=".
            if key == 'name':
                if d_line:
                    d_list.append(d_line)  # Append the previous record, if there is one.
                d_line = {key: value}  # Start a new dictionary to collect the next record.
            else:
                d_line[key] = value  # Add the other fields to the current record.
        if d_line:
            d_list.append(d_line)  # Append the last record (unless the file was empty).
    return pd.DataFrame(d_list)  # Turn the list of dictionaries into a DataFrame.
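
As a rough sketch of the "different key" idea, the version below takes the record-starting key as a parameter and reads from any iterable of lines, so it can be tried without a file. The names records_to_frame and start_key and the sample data are all made up for illustration. Because each record is its own dictionary, a record that lacks a field simply ends up with NaN in that column.

```python
import io

import pandas as pd


def records_to_frame(lines, start_key="name"):
    """Group key=value lines into records; a line whose key is start_key opens a new record."""
    records = []
    current = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, value = line.split("=", 1)  # split at the first "=" only
        if key == start_key and current:
            records.append(current)  # the previous record is complete
            current = {}
        current[key] = value
    if current:
        records.append(current)  # do not forget the last record
    return pd.DataFrame(records)


# Hypothetical input where records start with "id" and the second one has no "class".
sample = io.StringIO("id=7\ngrade=A\nclass=B\nid=8\ngrade=D\n")
df = records_to_frame(sample, start_key="id")
print(df)
```

To read from disk instead, pass an open file object, e.g. records_to_frame(open(filename), start_key="name") inside a with block.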
