pathlib.Path().glob() and multiple file extensions
You can also use the ** syntax from pathlib, which allows you to recursively collect the nested paths.
from pathlib import Path
import re

BASE_DIR = Path('.')
EXTENSIONS = {'.xls', '.txt'}

for path in BASE_DIR.glob('**/*'):
    if path.suffix in EXTENSIONS:
        print(path)
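The suffix check above is an exact, case-sensitive comparison; if your tree mixes cases such as .TXT, a small variation (just a sketch, reusing BASE_DIR and EXTENSIONS from the snippet above) lower-cases the suffix before the lookup:

# same idea, but tolerant of .TXT, .Xls, etc.
matched = [p for p in BASE_DIR.glob('**/*') if p.suffix.lower() in EXTENSIONS]
print(matched)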
If you want to express more logic in your search, you can also use a regex, as follows:
pattern_sample = re.compile(r'/(([^/]+/)+)(S(\d+)_\d+).(tif|JPG)')
This pattern will look for all images (tif and JPG) that match S327_008(_flipped)?.tif in my case. Specifically, it collects the sample ID and the file name.
Collecting into a set prevents storing duplicates; I found this useful when you insert more logic and want to ignore different versions of the files (e.g. _flipped).
matched_images = set()

for item in BASE_DIR.glob('**/*'):
    match = re.match(pattern=pattern_sample, string=str(item))
    if match:
        # retrieve the groups of interest
        filename, sample_id = match.group(3, 4)
        matched_images.add((filename, int(sample_id)))
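Once built, the set can be post-processed like any other collection; for example, to print the matches sorted by sample ID (just a usage sketch for the tuples collected above):

for filename, sample_id in sorted(matched_images, key=lambda pair: pair[1]):
    print(sample_id, filename)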
If you need to use pathlib.Path.glob() with several patterns, you can collect the results of one glob() call per extension:
from pathlib import Path

def get_files(extensions):
    all_files = []
    for ext in extensions:
        all_files.extend(Path('.').glob(ext))
    return all_files

files = get_files(('*.txt', '*.py', '*.cfg'))
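If you would rather avoid building the list eagerly inside a helper, the same result can be obtained by chaining the generators that glob() returns; this is an alternative sketch using itertools.chain, not part of the answer above:

from itertools import chain
from pathlib import Path

patterns = ('*.txt', '*.py', '*.cfg')
# chain the per-pattern generators and materialize them once
files = list(chain.from_iterable(Path('.').glob(p) for p in patterns))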
A bit late to the party with a couple of single-line suggestions that require neither a custom function nor a loop, and that work on Linux:
pathlib.Path.glob() accepts character sets in square brackets, one bracket pair per character position. For the ".txt" and ".xls" suffixes, one could write
files = pathlib.Path('temp_dir').glob('*.[tx][xl][ts]')
If you need to search for ".xlsx" as well, just append the wildcard "*" after the last closing bracket.
files = pathlib.Path('temp_dir').glob('*.[tx][xl][ts]*')
A thing to keep in mind is that the trailing wildcard will catch not only the "x", but any characters after the final "t" or "s".
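To illustrate the caveat with a few hypothetical file names (fnmatch applies the same shell-style wildcard rules that glob() uses):

import fnmatch

names = ['report.txt', 'data.xlsx', 'report.txt.bak']
# all three names match, because the trailing * also accepts '.bak'
print([n for n in names if fnmatch.fnmatch(n, '*.[tx][xl][ts]*')])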
Prepending "**/" to the search pattern will do the recursive search discussed in the previous answers.
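For instance, combining both ideas (using the same temp_dir as above):

import pathlib

files = pathlib.Path('temp_dir').glob('**/*.[tx][xl][ts]')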