Failure when filtering string list with re.match
selected_files = filter(regex.match, files)
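re.match('regex') is equivalent to re.search('^regex'), or to text.startswith('regex') with a pattern in place of a literal string. It only checks whether the string starts with a match for the regex, and your paths start with '/a/b/c/', not with '_x'.
So use re.search() instead: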
import re
files = [ '/a/b/c/la_seg_x005_y003.png',
'/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.png',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.png',
'/a/b/c/la_seg_x003_y003.npy', ]
regex = re.compile(r'_x\d+_y\d+\.npy')
selected_files = list(filter(regex.search, files))
# The list() call is only needed in Python 3, where filter returns a lazy iterator instead of a list
print(selected_files)
Output:
['/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.npy']
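To see the difference on a single path, here is a minimal sketch comparing the two methods. The names path, match_result and search_result are only illustrative.

```python
import re

regex = re.compile(r'_x\d+_y\d+\.npy')
path = '/a/b/c/la_seg_x005_y003.npy'

# match() only tries the pattern at position 0, where the string has '/'
match_result = regex.match(path)

# search() scans forward and finds the pattern later in the string
search_result = regex.search(path)

print(match_result)            # None
print(search_result.group())   # _x005_y003.npy
```

Since filter keeps only the items for which the function returns a truthy value, the None from match() is why every path was dropped.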
And if you just want to get all of the .npy files, str.endswith() would be a better choice:
files = [ '/a/b/c/la_seg_x005_y003.png',
'/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.png',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.png',
'/a/b/c/la_seg_x003_y003.npy', ]
selected_files = list(filter(lambda x: x.endswith('.npy'), files))
print(selected_files)
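The same filter can also be written as a list comprehension, which avoids the lambda and the list() call. This is just a sketch with a shortened file list; npy_files and arrays_or_images are made-up names. Note that str.endswith() also accepts a tuple of suffixes.

```python
files = ['/a/b/c/la_seg_x005_y003.png',
         '/a/b/c/la_seg_x005_y003.npy',
         '/a/b/c/la_seg_x004_y003.png']

# Keep only the paths that end in .npy
npy_files = [f for f in files if f.endswith('.npy')]
print(npy_files)  # ['/a/b/c/la_seg_x005_y003.npy']

# endswith() accepts a tuple, so several extensions can be tested at once
arrays_or_images = [f for f in files if f.endswith(('.npy', '.png'))]
print(len(arrays_or_images))  # 3
```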
Just use search, since match only matches at the beginning of the string, whereas search matches anywhere in the string.
import re
files = [ '/a/b/c/la_seg_x005_y003.png',
'/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.png',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.png',
'/a/b/c/la_seg_x003_y003.npy', ]
regex = re.compile(r'_x\d+_y\d+\.npy')
selected_files = list(filter(regex.search, files))  # list() is needed in Python 3, where filter returns an iterator
print(selected_files)
Output:
['/a/b/c/la_seg_x005_y003.npy', '/a/b/c/la_seg_x004_y003.npy', '/a/b/c/la_seg_x003_y003.npy']
If you use match, the pattern must match starting at the very beginning of the input. Either extend your regular expression:
regex = re.compile(r'.*_x\d+_y\d+\.npy')
Which would match:
['/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.npy']
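As a quick check of the extended pattern, the sketch below runs it through match() on a shortened file list. It also shows that match() is anchored only at the start: a trailing suffix is still accepted, whereas fullmatch() (Python 3.4+) requires the whole string to match. The '.bak' path is a made-up example.

```python
import re

files = ['/a/b/c/la_seg_x005_y003.png',
         '/a/b/c/la_seg_x005_y003.npy']

# The leading .* lets the pattern consume the directory part from position 0
prefixed = re.compile(r'.*_x\d+_y\d+\.npy')
selected = list(filter(prefixed.match, files))
print(selected)  # ['/a/b/c/la_seg_x005_y003.npy']

# match() does not require the pattern to reach the end of the string
backup = '/a/b/c/la_seg_x005_y003.npy.bak'  # hypothetical path
print(bool(prefixed.match(backup)))      # True
print(bool(prefixed.fullmatch(backup)))  # False
```

If a trailing suffix should be rejected, ending the pattern with $ or using fullmatch() would be one way to do it.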
Or use re.search, which
scans through string looking for the first location where the regular expression pattern produces a match [...]
re.match() looks for a match only at the beginning of the string. You can use re.search() instead.