How to iterate over files in a directory in Python
This tutorial shows several ways to iterate over the files in a given directory and perform actions on them using Python.
1. Using os.listdir()
This method returns a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' even if they are present in the directory.
Example: print the paths of all files with a jpg or png extension in the C:\Users\admin directory:
import os
directory = r'C:\Users\admin'
for filename in os.listdir(directory):
    # Print the full path of every entry whose name ends in .jpg or .png
    if filename.endswith(".jpg") or filename.endswith(".png"):
        print(os.path.join(directory, filename))
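As a variation on the loop above, here is a minimal sketch wrapped in a function. The name list_images and its parameters are hypothetical, chosen only for illustration. It relies on str.endswith() accepting a tuple of suffixes, lowercases the name so that files such as PHOTO.JPG also match, and skips directories whose names happen to end in one of the extensions.

```python
import os


def list_images(directory, extensions=(".jpg", ".png")):
    """Return full paths of regular files in `directory` ending with one of `extensions`."""
    matches = []
    # os.listdir() returns names in arbitrary order, so sort them for a stable result
    for filename in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, filename)
        # endswith() accepts a tuple of suffixes; lower() makes the check case-insensitive
        if filename.lower().endswith(extensions) and os.path.isfile(full_path):
            matches.append(full_path)
    return matches
```

Calling list_images(r'C:\Users\admin') would return the matching paths as a list instead of printing them.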
2. Using os.scandir()
Since Python 3.5, os.scandir() offers a more convenient alternative: it yields os.DirEntry objects that expose each entry's name, path, and file type. This example does the same thing as above, but uses os.scandir() instead of os.listdir():
import os
directory = r'C:\Users\admin'
for entry in os.scandir(directory):
    # Keep only regular files whose path ends in .jpg or .png
    if (entry.path.endswith(".jpg")
            or entry.path.endswith(".png")) and entry.is_file():
        print(entry.path)
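The same idea can be sketched as a generator that uses os.scandir() as a context manager (supported since Python 3.6), which closes the underlying directory iterator as soon as the loop ends. The function name scan_images and its parameters are assumptions made for this example.

```python
import os


def scan_images(directory, extensions=(".jpg", ".png")):
    """Yield paths of regular files in `directory` that end with one of `extensions`."""
    # The with statement releases the scandir iterator promptly when the loop finishes
    with os.scandir(directory) as entries:
        for entry in entries:
            # entry.name is the bare file name; entry.path includes the directory
            if entry.is_file() and entry.name.lower().endswith(extensions):
                yield entry.path
```

Because it is a generator, nothing is read from disk until the result is iterated, for example with for path in scan_images(directory).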
Both the os.listdir() and os.scandir() approaches only list the files and directories immediately under a directory. If you want to list files and folders recursively, consider the methods below.
3. Using os.walk()
os.walk() visits the given directory and all of its subdirectories, yielding the file names found in each one. The example below is like the ones above, but it recursively prints all images under the C:\Users\admin directory.
import os
for subdir, dirs, files in os.walk(r'C:\Users\admin'):
    for filename in files:
        # Build the full path from the directory currently being visited
        filepath = os.path.join(subdir, filename)
        if filepath.endswith(".jpg") or filepath.endswith(".png"):
            print(filepath)
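When some subdirectories should be left out, os.walk() in its default top-down mode lets you edit the dirs list in place to stop it from descending into them. The sketch below illustrates this; the name walk_images, its parameters, and the choice of ".git" as a folder to skip are hypothetical.

```python
import os


def walk_images(root, extensions=(".jpg", ".png"), skip_dirs=(".git",)):
    """Yield image paths under `root`, skipping any directory named in `skip_dirs`."""
    for current_dir, dirs, files in os.walk(root):
        # Assigning to dirs[:] edits the list in place, so os.walk will not
        # descend into the removed directories (this works in top-down mode only)
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for filename in files:
            # splitext() splits "photo.JPG" into ("photo", ".JPG")
            if os.path.splitext(filename)[1].lower() in extensions:
                yield os.path.join(current_dir, filename)
```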
4. Using the glob module
The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order.
Let's consider an example that lists all png and pdf files in the C:\Users\admin directory:
import glob
# Print png images in folder C:\Users\admin\
for filepath in glob.iglob(r'C:\Users\admin\*.png'):
    print(filepath)
# Print pdf files in folder C:\Users\admin\
for filepath in glob.iglob(r'C:\Users\admin\*.pdf'):
    print(filepath)
By default, glob.iglob only lists files immediately under the given directory. To recursively list all files in nested folders, set the recursive param to True and use ** in the pattern. The ** wildcard matches any number of nested directories, and recursive=True has no effect without it:
import glob
# Recursively print png images in folder C:\Users\admin\
for filepath in glob.iglob(r'C:\Users\admin\**\*.png', recursive=True):
    print(filepath)
# Recursively print pdf files in folder C:\Users\admin\
for filepath in glob.iglob(r'C:\Users\admin\**\*.pdf', recursive=True):
    print(filepath)
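Recursive matching in glob depends on the pattern as well as the flag: recursive=True only changes the result when the pattern contains **. The following sketch builds such a pattern from a directory and an extension. The helper name find_recursive is hypothetical, and glob.escape() is used on the assumption that the directory name itself might contain wildcard characters such as [ or *.

```python
import glob
import os


def find_recursive(directory, extension):
    """Return sorted paths under `directory`, at any depth, ending in `extension`."""
    # glob.escape() protects special characters that may occur in the directory name.
    # "**" matches zero or more directory levels, but only when recursive=True.
    pattern = os.path.join(glob.escape(directory), "**", "*" + extension)
    return sorted(glob.glob(pattern, recursive=True))
```

Because ** also matches zero directories, files directly inside the starting directory are included in the result.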
You can use either glob.iglob or glob.glob. The difference is that glob.iglob returns an iterator that yields the paths matching a pathname pattern, while glob.glob returns a list.
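The practical consequence of that difference can be sketched with two small helpers, whose names first_match and count_matches are made up for this example. An iterator suits cases where you may stop early, while a list suits cases where you need its length or want to index into it.

```python
import glob


def first_match(pattern):
    """Return the first path matching `pattern`, or None when nothing matches."""
    # iglob yields results one at a time, so next() can return
    # without building the complete result list first
    return next(glob.iglob(pattern), None)


def count_matches(pattern):
    """Return how many paths match `pattern`."""
    # glob returns a full list, so len() works on it directly
    return len(glob.glob(pattern))
```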
5. Iterate recursively using the Path class from the pathlib module
The code below does the same as the example above, recursively listing and printing the png images in a folder, but it uses pathlib.Path:
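from pathlib import Path
# Use a raw string: in a normal string literal, '\U' starts a Unicode escape and is a syntax error
paths = Path(r'C:\Users\admin').glob('**/*.png')
for path in paths:
    # path is a Path object, not a string
    path_in_str = str(path)
    # Do something with the path
    print(path_in_str)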
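pathlib also provides Path.rglob(), which behaves like glob() with **/ added in front of the pattern. The sketch below uses it together with Path.suffix to match several extensions in one pass. The function name find_by_suffix and its default suffixes are assumptions for illustration.

```python
from pathlib import Path


def find_by_suffix(root, suffixes=(".png", ".jpg")):
    """Return sorted Path objects for files under `root` whose suffix is in `suffixes`."""
    root_path = Path(root)
    # rglob("*") visits every entry below root_path, at any depth
    return sorted(
        p for p in root_path.rglob("*")
        # p.suffix is the final extension, including the dot, e.g. ".png"
        if p.is_file() and p.suffix.lower() in suffixes
    )
```

Each result is a Path object, so attributes such as p.name, p.stem, and p.parent are available without converting to a string first.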