How to get all words of a specific length that don't contain numbers?

You may use

import re
s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"
print(re.compile(r'\b[^\W\d_]{2,}\b').findall(s))
# => ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of', 'cầu', 'thủ']

Or, if you want to limit matches to ASCII-only letter words with at least 2 letters:

print(re.compile(r'\b[a-zA-Z]{2,}\b').findall(s))
# => ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of']
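A related option is the re.ASCII flag, which makes \w, \W, \b and \d ASCII-only. It is not a drop-in replacement for [a-zA-Z], though: under re.ASCII, non-ASCII letters stop being word characters, so \b can fire in the middle of a word such as "thủ". A small comparison sketch using the same sample string:

```python
import re

s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"

# Explicit ASCII class, but \b still uses Unicode word characters,
# so partially non-ASCII words such as "cầu" and "thủ" are rejected entirely.
explicit = re.findall(r'\b[a-zA-Z]{2,}\b', s)

# With re.ASCII, "ủ" is no longer a word character, so a boundary
# appears between "h" and "ủ" and the fragment "th" is matched.
ascii_flag = re.findall(r'\b[^\W\d_]{2,}\b', s, re.ASCII)

print(explicit)
# => ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of']
print(ascii_flag)
# => ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of', 'th']
```

So if "ASCII-only" should mean "skip any word that has a non-ASCII letter", the explicit [a-zA-Z] class with the default Unicode \b is the safer choice.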


Details

  • To match only letters, use [^\W\d_] (any word character except digits and the underscore), or [a-zA-Z] for the ASCII-only variation.
  • To match whole words, you need word boundaries, \b.
  • To make sure \b is read as a word boundary and not as a backspace character, use a raw string literal, r'...'.
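A quick check of the points above, assuming Python 3 str patterns: a non-raw "\b" is a backspace character, so the pattern silently stops matching, and [^\W\d_] keeps letters while rejecting digits and the underscore.

```python
import re

# In a normal string literal, "\b" is the backspace character (U+0008),
# so the regex engine never sees a word-boundary escape.
assert "\b" == "\x08"
assert r"\b" == "\\b"  # raw string: a backslash followed by "b"

print(re.findall("\bab\b", "ab cd"))   # pattern contains literal backspaces
# => []
print(re.findall(r"\bab\b", "ab cd"))  # pattern contains word boundaries
# => ['ab']

# [^\W\d_] = "word characters minus digits and underscore", essentially letters
letters = re.compile(r"[^\W\d_]")
print([ch for ch in "a1_ầ" if letters.match(ch)])
# => ['a', 'ầ']
```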

So, r'\b[^\W\d_]{2,}\b' defines a regex that matches a word boundary, then two or more letters, and then another word boundary, which asserts that no word character follows those letters.
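The question asks about a specific length, and {2,} is just one choice of quantifier: {n} means exactly n letters and {m,n} means m to n letters. A sketch of a hypothetical helper (the name letter_words is made up for illustration) that builds the pattern from a minimum and an optional maximum length:

```python
import re

def letter_words(text, min_len, max_len=None):
    """Return whole words made only of letters, with min_len..max_len letters.

    Hypothetical helper: max_len=None means "no upper limit",
    and min_len == max_len asks for an exact length.
    """
    upper = "" if max_len is None else str(max_len)
    # {{ and }} are literal braces inside the f-string, so this yields
    # e.g. \b[^\W\d_]{2,}\b or \b[^\W\d_]{3,3}\b
    pattern = rf"\b[^\W\d_]{{{min_len},{upper}}}\b"
    return re.findall(pattern, text)

s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"
print(letter_words(s, 2))     # same as {2,}
print(letter_words(s, 3, 3))  # exactly three letters
# => ['the', 'the', 'cầu', 'thủ']
print(letter_words(s, 2, 3))  # two or three letters
# => ['is', 'the', 'of', 'is', 'the', 'of', 'cầu', 'thủ']
```

Because both ends are anchored with \b, an upper limit really excludes longer words such as "number" instead of matching a prefix of them.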


Use str.isalpha:

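import re
s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"
# \w{2,} grabs every run of 2+ word chars; isalpha() drops runs with digits or underscores
print([w for w in re.findall(r'\w{2,}', s) if w.isalpha()])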

Output:

['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of', 'cầu', 'thủ']
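The two answers agree on this sample, but they are not identical in general. In Python's re, \w covers every character for which str.isalnum() is true, while \d only covers Unicode decimal digits, so a numeric character such as the superscript "²" passes [^\W\d_] yet fails str.isalpha(). A small sketch with a made-up input string:

```python
import re

text = "x² ab c3 de_f"

# Regex-only approach: [^\W\d_] still admits non-decimal numerics like "²"
by_class = re.findall(r'\b[^\W\d_]{2,}\b', text)

# \w{2,} plus str.isalpha(): any run containing a digit,
# numeric symbol, or underscore is dropped
by_isalpha = [w for w in re.findall(r'\w{2,}', text) if w.isalpha()]

print(by_class)
# => ['x²', 'ab']
print(by_isalpha)
# => ['ab']
```

For a strict "no number of any kind" requirement, the isalpha() filter is the stricter of the two.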
