Python Feedparser: How can I check for new RSS data?
Regarding downloading only if/when the feed changed, you can use the HTTP ETag header and, as a fallback, Last-Modified.
>>> feed.etag
'"6c132-941-ad7e3080"'
>>> feed.modified
'Fri, 11 Jun 2012 23:00:34 GMT'
You can specify them in your call to feedparser.parse. If they are still the same (no changes), the response will have the status code 304 (Not Modified).
It boils down to this example:
import feedparser
url = 'http://feedparser.org/docs/examples/atom10.xml'
# first request
feed = feedparser.parse(url)
# store the etag and modified values (either may be missing, so use .get)
last_etag = feed.get('etag')
last_modified = feed.get('modified')
# check if a new version exists
feed_update = feedparser.parse(url, etag=last_etag, modified=last_modified)
# status is only set for HTTP requests, so look it up defensively
if feed_update.get('status') == 304:
    # no changes
    print('feed not modified')
Notes:
You need to check that feed.etag and feed.modified exist, since not every server sends them.
The feedparser library will automatically send the If-None-Match header with the provided etag parameter and If-Modified-Since with the modified value for you.
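As a rough sketch of the first note, the validators can be collected defensively so that a missing etag or modified value does not raise. The helper names conditional_kwargs and is_unchanged are made up for this illustration and are not part of feedparser; the sketch only assumes that the parsed result supports dict-style .get(), which feedparser's result object does.

```python
def conditional_kwargs(feed):
    """Build keyword arguments for a conditional feedparser.parse() call.

    Hypothetical helper: only validators the server actually sent are included.
    """
    kwargs = {}
    etag = feed.get("etag")
    modified = feed.get("modified")
    if etag:
        kwargs["etag"] = etag
    if modified:
        kwargs["modified"] = modified
    return kwargs


def is_unchanged(feed):
    """Return True if the response was 304 (Not Modified).

    The status key may be absent (e.g. for non-HTTP sources), hence .get().
    """
    return feed.get("status") == 304
```

Assuming feedparser is imported, a follow-up request could then be written as feedparser.parse(url, **conditional_kwargs(feed)), which passes only the validators the server provided.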
Source: Feedparser documentation about http and etag
To clarify the question asked in the comments:
This requires the server to support at least one of those headers.
If neither header works, you can't use this and always have to download the feed from the server, even if it's unchanged, because you can't tell whether it changed before you have downloaded it.
That means you have to download the feed every time and store which entries you have already seen.
If you don't want to display items you have already seen (e.g. printing only the new ones), you have to keep a list of seen entries anyway. Some feeds have an id field for each entry, which you can use in that case. Otherwise you have to be a bit creative to figure out what makes an entry the same, for your feed specifically.
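A minimal sketch of that bookkeeping, assuming entries behave like dicts (as feedparser entries do). The function names entry_key and new_entries are hypothetical, and the fallback of combining link and title is just one possible choice that may not fit every feed.

```python
def entry_key(entry):
    """Return a key identifying an entry: its id if present, else link + title.

    The link/title fallback is an assumption; adapt it to your feed.
    """
    return entry.get("id") or "{}|{}".format(
        entry.get("link", ""), entry.get("title", "")
    )


def new_entries(entries, seen):
    """Return the entries whose key is not yet in `seen`, and record them."""
    fresh = []
    for entry in entries:
        key = entry_key(entry)
        if key not in seen:
            seen.add(key)
            fresh.append(entry)
    return fresh
```

With a result from feedparser.parse(url), this could be called as new_entries(feed.entries, seen), where seen is a set you keep (and persist, if needed) between downloads.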