Parsing CSV / tab-delimited txt file with Python

If the file is large, you may not want to load it entirely into memory at once. This approach avoids that. (Of course, making a dict out of it could still take up some RAM, but it's guaranteed to be smaller than the original file.)

my_dict = {}
for i, line in enumerate(file):
    if (i - 8) % 7:
        continue
    k, v = line.split("\t")[:3:2]
    my_dict[k] = v

Edit: Not sure where I got extend from before. I meant update

Although there is nothing wrong with the other solutions presented, you could simplify and greatly escalate your solutions by using python's excellent library pandas.

Pandas is a library for handling data in Python, preferred by many Data Scientists.

Pandas has a simplified CSV interface to read and parse files, that can be used to return a list of dictionaries, each containing a single line of the file. The keys will be the column names, and the values will be the ones in each cell.

In your case:

    import pandas

    def create_dictionary(filename):
        my_data = pandas.DataFrame.from_csv(filename, sep='\t', index_col=False)
        # Here you can delete the dataframe columns you don't want!
        del my_data['B']
        del my_data['D']
        # ...
        # Now you transform the DataFrame to a list of dictionaries
        list_of_dicts = [item for item in my_data.T.to_dict().values()]
        return list_of_dicts

# Usage:
x = create_dictionary("myfile.csv")

Start by turning the text into a list of lists. That will take care of the parsing part:

lol = list(csv.reader(open('text.txt', 'rb'), delimiter='\t'))

The rest can be done with indexed lookups:

d = dict()
key = lol[6][0]      # cell A7
value = lol[6][3]    # cell D7
d[key] = value       # add the entry to the dictionary
 ...

Parsing CSV / tab-delimited txt file with Python

Tags:

Python

Csv

Dictionary

Parsing

Related

Recent Posts