feedparser with timeout
Use the Python requests library for network I/O and feedparser for parsing only:
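from io import BytesIO
import feedparser
import requests

# The fragment below runs inside a function (hence the return)
# and assumes a configured logger.
# Make the request with the requests library and a timeout
try:
    resp = requests.get(rss_feed, timeout=20.0)
except requests.Timeout:
    # requests.Timeout covers both connect and read timeouts
    logger.warning("Timeout when reading RSS %s", rss_feed)
    return
# Wrap the response bytes in an in-memory stream, which feedparser accepts
content = BytesIO(resp.content)
# Parse the content
feed = feedparser.parse(content)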
You can specify a timeout globally using socket.setdefaulttimeout().
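A minimal sketch of the global-timeout approach, assuming feedparser fetches the URL through standard-library sockets that honour the default timeout. The helper name default_socket_timeout is hypothetical, not part of any library; it restores the previous value afterwards so unrelated code is not affected. The setting is process-wide, so other threads that create sockets while it is active see it too.

```python
import contextlib
import socket


@contextlib.contextmanager
def default_socket_timeout(seconds):
    """Temporarily set the process-wide default timeout for new sockets.

    Hypothetical helper for illustration; only the socket calls are real API.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        # Restore the old value so unrelated code is not affected.
        socket.setdefaulttimeout(previous)


# Usage (assumes feedparser is installed and rss_feed holds a feed URL):
#
#     with default_socket_timeout(20.0):
#         feed = feedparser.parse(rss_feed)
```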
The timeout limits only how long an individual socket operation may last. feedparser.parse() may perform many socket operations, so the total time spent on DNS resolution, establishing the TCP connection, and sending and receiving data may be much longer. See "Read timeout using either urllib2 or any other http library".
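One way to bound the total time, sketched here under assumptions rather than taken from the linked question, is to run the call in a worker thread and stop waiting after an overall deadline. The helper name call_with_deadline is hypothetical. Note the limitation: the worker thread cannot be killed, so it keeps running in the background until its own socket operations finish or time out.

```python
import concurrent.futures
import time


def call_with_deadline(func, *args, deadline=30.0, **kwargs):
    """Run func in a worker thread and wait at most `deadline` seconds.

    Returns func's result, or raises concurrent.futures.TimeoutError
    if the deadline passes first. Hypothetical helper for illustration.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=deadline)
    finally:
        # Do not block on the worker: it cannot be interrupted and
        # simply continues until func returns on its own.
        executor.shutdown(wait=False)


# Usage (assumes feedparser is installed and rss_feed holds a feed URL):
#
#     try:
#         feed = call_with_deadline(feedparser.parse, rss_feed, deadline=20.0)
#     except concurrent.futures.TimeoutError:
#         feed = None
```

Combining this with a per-operation socket timeout keeps the abandoned worker from lingering indefinitely.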