Best way to get the file list of a big directory in Python?

If you have a directory that is too big for libc's readdir() to read quickly, you probably want to look at the kernel call getdents() (http://www.kernel.org/doc/man-pages/online/pages/man2/getdents.2.html). I ran into a similar problem and wrote a long blog post about it:

http://www.olark.com/spw/2011/08/you-can-list-a-directory-with-8-million-files-but-not-with-ls/

Basically, readdir() only reads 32K of directory entries at a time, so for a directory with a very large number of files it has to make a huge number of reads and takes a very long time to complete. Calling getdents() directly lets you pass a much larger buffer.
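To make the idea concrete, here is a minimal sketch of calling getdents64 directly from Python 3 through ctypes with a large buffer. It is not code from the blog post. The function name iter_names_getdents, the 5 MB default buffer size, and the syscall-number table (Linux x86_64 and aarch64 only) are assumptions made for illustration; on any other platform this will not work as written.

```python
import ctypes
import os
import platform
import struct

# Assumption: getdents64 syscall numbers for two common Linux architectures.
_SYS_GETDENTS64 = {"x86_64": 217, "aarch64": 61}

# Fixed header of struct linux_dirent64: d_ino, d_off, d_reclen, d_type.
# "=" means standard sizes with no padding, so the name starts at byte 19.
_HEADER = struct.Struct("=QqHB")


def iter_names_getdents(path, buf_size=5 * 1024 * 1024):
    """Yield entry names in `path`, reading them with a large getdents64 buffer."""
    nr = _SYS_GETDENTS64[platform.machine()]  # KeyError on unsupported machines
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    buf = ctypes.create_string_buffer(buf_size)
    base = ctypes.addressof(buf)
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        while True:
            n = libc.syscall(ctypes.c_long(nr), ctypes.c_long(fd),
                             buf, ctypes.c_long(buf_size))
            if n < 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
            if n == 0:  # end of directory
                break
            pos = 0
            while pos < n:
                d_ino, _d_off, d_reclen, _d_type = _HEADER.unpack_from(buf, pos)
                # The NUL-terminated name follows the fixed header.
                name = ctypes.string_at(base + pos + _HEADER.size)
                # Skip "." and "..", and entries with inode 0.
                if d_ino != 0 and name not in (b".", b".."):
                    yield os.fsdecode(name)
                pos += d_reclen  # records are variable-length
    finally:
        os.close(fd)
```

Because it is a generator, names can be processed as they arrive instead of being collected into one large list first.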


For Python 2.x, use the scandir backport from PyPI:

import scandir
for root, dirs, files in scandir.walk('.'):
    print(files)

For Python 3.5+, it is in the standard library:

import os
os.scandir()

https://www.python.org/dev/peps/pep-0471/

https://pypi.python.org/pypi/scandir
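
As a rough sketch of how os.scandir() might be used on a big directory: it returns an iterator of DirEntry objects, so entries can be handled one at a time rather than building the whole list up front as os.listdir() does. The helper name iter_file_names is made up for this example.

```python
import os


def iter_file_names(path="."):
    """Lazily yield the names of regular files directly inside `path`."""
    for entry in os.scandir(path):
        # is_file() can usually answer from the directory entry itself,
        # without an extra stat() call per file.
        if entry.is_file(follow_symlinks=False):
            yield entry.name
```

For example, `for name in iter_file_names("/some/huge/dir"): ...` starts producing names immediately, which matters when the directory holds millions of entries.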


I found this library useful: https://github.com/benhoyt/scandir.
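
A small sketch of how the library can be used so that the same code runs with or without it: try the standard-library function first and fall back to the package. The helper count_entries is hypothetical and only there to show the import pattern in use.

```python
try:
    from os import scandir  # standard library on Python 3.5+
except ImportError:
    from scandir import scandir  # the scandir package on older versions


def count_entries(path):
    """Return (files, dirs) counted directly inside `path`, without recursing."""
    files = dirs = 0
    for entry in scandir(path):
        if entry.is_dir(follow_symlinks=False):
            dirs += 1
        else:
            files += 1
    return files, dirs
```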