Is there a difference between : "file.readlines()", "list(file)" and "file.read().splitlines(True)"?

Explicit is better than implicit, so I prefer:

with open("file.txt", "r") as f:
    data = f.readlines()

But when possible, the most Pythonic approach is to iterate over the file object directly, without loading the whole content into memory, e.g.:

with open("file.txt", "r") as f:
    for line in f:
        my_function(line)  # my_function stands for whatever processing you need

TL;DR

Assuming you need the lines as a list to manipulate afterwards, your three proposed solutions are all valid. There is no better (or more Pythonic) solution, especially since all of them rely on methods documented in the official Python documentation. So choose the one you find most readable and use it consistently throughout your code. If performance is a deciding factor, see my timeit analysis below.
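To check the "same result" claim for an ordinary text file, here is a minimal sketch. It assumes a throwaway temporary file with made-up contents (created with the standard `tempfile` module) rather than your real `file.txt`:

```python
import os
import tempfile

# Hypothetical sample contents; the last line deliberately has no trailing newline.
sample = "first line\nsecond line\nthird line without newline"

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    with open(path, "r") as f:
        via_list = list(f)
    with open(path, "r") as f:
        via_readlines = f.readlines()
    with open(path, "r") as f:
        via_splitlines = f.read().splitlines(True)
finally:
    os.remove(path)  # clean up the temporary file

print(via_list == via_readlines == via_splitlines)  # True for this sample
print(via_list)  # ['first line\n', 'second line\n', 'third line without newline']
```

All three keep the trailing `\n` on each line and leave the final line as-is when it has no newline.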

Here is the timeit comparison (10,000 loops, ~20 lines in test.txt):

import timeit

def foo():
    with open("test.txt", "r") as f:
        data = list(f)

def foo1():
    with open("test.txt", "r") as f:
        data = f.read().splitlines(True)

def foo2():
    with open("test.txt", "r") as f:
        data = f.readlines()

print(timeit.timeit(stmt=foo, number=10000))
print(timeit.timeit(stmt=foo1, number=10000))
print(timeit.timeit(stmt=foo2, number=10000))

>>>> 1.6370758459997887
>>>> 1.410844805999659
>>>> 1.8176437409965729
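Absolute timings like these depend on the machine, the file and the OS cache, so the numbers above should not be read as universal. If you want to reproduce the comparison yourself, a minimal self-contained sketch could create its own ~20-line file (hypothetical contents) and use `timeit.repeat`, keeping the fastest of several trials as the least noisy estimate:

```python
import os
import tempfile
import timeit

# Hypothetical test file: 20 short lines, similar in size to test.txt above.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines(f"line number {i}\n" for i in range(20))
    path = tmp.name

def with_list():
    with open(path, "r") as f:
        return list(f)

def with_splitlines():
    with open(path, "r") as f:
        return f.read().splitlines(True)

def with_readlines():
    with open(path, "r") as f:
        return f.readlines()

results = {}
try:
    for func in (with_list, with_splitlines, with_readlines):
        # repeat() runs 5 independent trials of 1000 calls each; keep the minimum.
        results[func.__name__] = min(timeit.repeat(func, number=1000, repeat=5))
finally:
    os.remove(path)

for name, seconds in results.items():
    print(f"{name}: {seconds:.4f} s per 1000 calls")
```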

I tried it with different numbers of loops and lines, and f.read().splitlines(True) always seemed to perform slightly better than the other two.

Now, syntactically speaking, all of your examples seem to be valid. Refer to the official Python documentation for more information.

According to it, if your goal is to read lines from a file, you can loop over the file object directly:

for line in f:
    ...

The documentation states that this is memory efficient, fast, and leads to simple code, which makes it another good alternative in your case if you don't need to manipulate the lines as a list.
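As an illustration of that pattern, here is a small sketch that processes a file one line at a time without ever building a list. The helper `count_nonblank` is hypothetical, and `io.StringIO` stands in for a real file opened with `open(...)`, which iterates the same way:

```python
import io

def count_nonblank(lines):
    """Count non-blank lines while reading one line at a time."""
    count = 0
    for line in lines:  # a file object yields one line per iteration
        if line.strip():
            count += 1
    return count

# io.StringIO stands in for a real file here; open("file.txt") works the same way.
fake_file = io.StringIO("alpha\n\nbeta\n   \ngamma\n")
print(count_nonblank(fake_file))  # 3
```

Only the current line is held in memory, which is why this scales to files much larger than RAM.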

EDIT

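Note that splitlines() drops the line endings by default (its keepends argument defaults to False), so you do need to pass True if you want the same result as readlines() or list(f). Leave it out only if you want lines without their trailing newline.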
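A quick check of how `str.splitlines` behaves with and without its `keepends` argument:

```python
text = "first\nsecond\n"

print(text.splitlines())      # ['first', 'second']  (line endings dropped)
print(text.splitlines(True))  # ['first\n', 'second\n']  (line endings kept, like readlines())
```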

My personal recommendation

I don't want to make this answer too opinion-based, but I think it is worth knowing that performance should not be your deciding factor until it actually becomes a problem for you, especially since all three approaches are allowed and documented in the official Python documentation I linked.

So, my advice is:

First, pick the most logical one for your particular case; then choose the one you find most readable and use it consistently throughout your code.

They all achieve the same goal of returning a list of strings, but they use different approaches. f.readlines() is the most Pythonic.

with open("file.txt", "r") as f:
    data = list(f)

Here f is a file object; list() iterates over it, and iterating over a file yields its lines one at a time.
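Conceptually, `list(f)` just exhausts the file's iterator. The sketch below spells that out as an explicit loop; `list_of_lines` is a hypothetical helper for illustration (not how CPython implements `list()` internally), and `io.StringIO` stands in for a real file:

```python
import io

def list_of_lines(f):
    """Roughly what list(f) does: exhaust the file iterator, one line at a time."""
    lines = []
    for line in f:  # each iteration returns the next line, newline included
        lines.append(line)
    return lines

print(list_of_lines(io.StringIO("a\nb\n")))  # ['a\n', 'b\n']
print(list(io.StringIO("a\nb\n")))           # ['a\n', 'b\n']
```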

with open("file.txt", "r") as f:
    data = f.read().splitlines(True)

f.read() returns the whole file as a single string, which splitlines(True) splits on line boundaries (keeping the line endings), returning a list of strings.
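One real difference is worth knowing: `str.splitlines` treats more characters as line boundaries than file iteration does. Besides `\n`, `\r` and `\r\n`, it also splits on `\v`, `\f`, `\x1c`, `\x1d`, `\x1e`, `\x85`, `\u2028` and `\u2029`, while `readlines()` and `list(f)` on a text file only split on newlines. A minimal sketch with a form feed, again using `io.StringIO` in place of a real file:

```python
import io

# A form feed (\x0c) is a line boundary for str.splitlines(), but not for file iteration.
text = "page one\x0cpage two\n"

print(io.StringIO(text).readlines())  # ['page one\x0cpage two\n']
print(text.splitlines(True))          # ['page one\x0c', 'page two\n']
```

For plain text without such characters the results are identical, but the difference can show up with some exported documents.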

with open("file.txt", "r") as f:
    data = f.readlines()

f.readlines() does much the same as the above: it reads the entire file and returns a list of its lines.
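One thing `readlines()` offers that the other two approaches don't is an optional size hint: it stops reading further lines once the total size of the lines read so far exceeds the hint. A small sketch, with `io.StringIO` standing in for a real file:

```python
import io

f = io.StringIO("aa\nbb\ncc\n")

# With a hint, readlines() stops once the lines read so far exceed the hint size.
print(f.readlines(1))  # ['aa\n']
print(f.readlines())   # ['bb\n', 'cc\n']  (the rest of the file)
```

Without a hint (or with a hint of 0 or less) it reads everything, which is the usage in the example above.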
