How to extract an IP address from an HTML string?
import re
ipPattern = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
findIP = re.findall(ipPattern,s)
findIP contains ['165.91.15.131']
Remove your capturing group:
ip = re.findall( r'[0-9]+(?:\.[0-9]+){3}', s )
Result:
['165.91.15.131']
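To illustrate why the capturing group matters: when a pattern contains a capturing group, re.findall returns the text of that group rather than the whole match. The sketch below assumes a sample string and a grouped pattern of the kind the answer refers to; neither is taken verbatim from the question.

```python
import re

# Hypothetical sample input; the original HTML string is not shown here.
s = "<body>Current IP Address: 165.91.15.131</body>"

# With a capturing group, findall returns only the group's text.
with_group = re.findall(r'([0-9]+)(?:\.[0-9]+){3}', s)

# With no capturing groups, findall returns the whole match.
without_group = re.findall(r'[0-9]+(?:\.[0-9]+){3}', s)

print(with_group)     # ['165']
print(without_group)  # ['165.91.15.131']
```

The (?:...) form groups the repeated part without capturing it, so it does not change what findall returns.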
Notes:
- If you are parsing HTML it might be a good idea to look at BeautifulSoup.
- Your regular expression matches some invalid IP addresses such as 0.00.999.9999. This isn't necessarily a problem, but you should be aware of it and possibly handle this situation. You could change the + to {1,3} for a partial fix without making the regular expression overly complex.
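One way to handle the invalid matches, sketched here under the assumption that a simple range check is enough for your use case, is to keep the regular expression loose and filter the candidates afterwards. The function name find_ipv4 is made up for this example.

```python
import re

def find_ipv4(text):
    """Return dotted quads in text whose four parts are all in 0..255."""
    # Loose pattern: four groups of 1-3 digits separated by dots.
    candidates = re.findall(r'\b\d{1,3}(?:\.\d{1,3}){3}\b', text)
    # Keep only candidates where every part is at most 255.
    return [c for c in candidates
            if all(int(part) <= 255 for part in c.split('.'))]

print(find_ipv4("ok 165.91.15.131, bad 0.00.999.999"))  # ['165.91.15.131']
```

Note that this check still accepts parts with leading zeros such as 00; whether that counts as valid depends on what you do with the address.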
You can use the following regex to capture only valid IP addresses. The octet alternatives must be wrapped in non-capturing groups; without them, | splits the whole pattern into separate alternatives and each number is matched on its own:
re.findall(r'\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b', s)
returns
['165.91.15.131']
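A pattern this long is easier to read and check when it is built from a single octet fragment. The sketch below is one possible way to do that; the names OCTET and IPV4 are hypothetical. Each octet is wrapped in a non-capturing group so that | only chooses between the octet forms and findall returns the whole address.

```python
import re

# Hypothetical helper names; OCTET matches one number in the range 0-255.
OCTET = r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Three "octet + dot" repetitions followed by a final octet.
IPV4 = re.compile(r'\b(?:' + OCTET + r'\.){3}' + OCTET + r'\b')

print(IPV4.findall("10.0.0.1 and 300.1.2.3"))  # ['10.0.0.1']
```

Here 300.1.2.3 is skipped because 300 is not a valid octet and the remaining 1.2.3 has only three parts.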