Remove all inline styles using BeautifulSoup

I wouldn't do this in BeautifulSoup - you'll spend a lot of time trying, testing, and working around edge cases.

Bleach does exactly this for you. http://pypi.python.org/pypi/bleach

If you were to do this in BeautifulSoup, I'd suggest you go with the "whitelist" approach, like Bleach does. Decide which tags may have which attributes, and strip every tag/attribute that doesn't match.
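To make the whitelist idea concrete, here is a minimal sketch of a per-tag policy in BeautifulSoup 4. The `ALLOWED` mapping and the `sanitize` name are made up for illustration, and I'm assuming the built-in "html.parser" backend. Tags that aren't listed are unwrapped (their children are kept), so treat this as a cleanup helper, not as a security-grade sanitizer like Bleach.

```python
from bs4 import BeautifulSoup

# Hypothetical policy: tag name -> set of attributes that tag may keep.
ALLOWED = {
    "a": {"href", "title"},
    "p": set(),
    "b": set(),
}

def sanitize(html, allowed=ALLOWED):
    soup = BeautifulSoup(html, "html.parser")
    # find_all(True) returns a list of every tag, so it is safe to
    # modify the tree while looping over it.
    for tag in soup.find_all(True):
        if tag.name not in allowed:
            # Unlisted tag: remove the tag itself but keep its children.
            tag.unwrap()
            continue
        # Copy the attribute names first, since we delete while looping.
        for attr in list(tag.attrs):
            if attr not in allowed[tag.name]:
                del tag[attr]
    return str(soup)

html = '<div style="x"><p style="color:red" class="c">Hi <a href="/x" onclick="evil()">link</a></p></div>'
print(sanitize(html))  # <p>Hi <a href="/x">link</a></p>
```

If you'd rather drop unlisted tags together with their contents, swap `tag.unwrap()` for `tag.decompose()`.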


Here's my solution for Python 3 and BeautifulSoup 4:

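def remove_attrs(soup, whitelist=()):
    # find_all is the BeautifulSoup 4 name; findAll is the old BS3 spelling.
    for tag in soup.find_all(True):
        # Build the list first so we don't mutate tag.attrs while iterating it.
        for attr in [attr for attr in tag.attrs if attr not in whitelist]:
            del tag[attr]
    return soup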

It takes a whitelist of attributes that should be kept. :) If no whitelist is supplied, all attributes are removed from every tag.
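A quick usage sketch for the function above, repeated here with `find_all` so the snippet runs on its own. The sample HTML is made up, and I'm assuming the built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup

def remove_attrs(soup, whitelist=()):
    for tag in soup.find_all(True):
        for attr in [a for a in tag.attrs if a not in whitelist]:
            del tag[attr]
    return soup

html = '<p style="color:red" id="intro"><a href="/home" style="x">Home</a></p>'

# Keep only href; style and id are stripped everywhere.
kept_href = remove_attrs(BeautifulSoup(html, "html.parser"), whitelist=("href",))
print(kept_href)  # <p><a href="/home">Home</a></p>

# No whitelist: every attribute goes.
bare = remove_attrs(BeautifulSoup(html, "html.parser"))
print(bare)  # <p><a>Home</a></p>
```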


You don't need to parse any CSS if you just want to remove it all. BeautifulSoup provides a way to remove entire attributes like so:

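# soup() is shorthand for soup.find_all(), which returns every tag.
for tag in soup():
    for attribute in ["class", "id", "name", "style"]:
        # Only delete attributes the tag actually has.
        if tag.has_attr(attribute):
            del tag[attribute]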
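If inline styles are the only thing you want gone, you can narrow that loop down. A small sketch, assuming BeautifulSoup 4 and the built-in "html.parser" backend; `strip_inline_styles` is just a name I picked:

```python
from bs4 import BeautifulSoup

def strip_inline_styles(html):
    soup = BeautifulSoup(html, "html.parser")
    # style=True matches only tags that actually carry a style attribute.
    for tag in soup.find_all(style=True):
        del tag["style"]
    return str(soup)

print(strip_inline_styles('<div style="a" class="box"><span style="b">x</span></div>'))
# <div class="box"><span>x</span></div>
```

Other attributes such as class are left alone.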

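Also, if you just want to delete entire tags (and their contents), you don't need extract(), which removes the tag from the tree and returns it so you can keep using it. If you don't need the removed tag afterwards, decompose() is enough; it removes the tag and destroys it along with its contents: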

for tag in soup("script"):
    tag.decompose()

Not a big difference, but just something else I found while looking at the docs. You can find more details about the API in the BeautifulSoup documentation, with many examples.
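To show the extract() vs. decompose() difference side by side, here is a small sketch (sample markup is made up; I'm assuming BeautifulSoup 4 with the built-in "html.parser" backend):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<div><script>a()</script><p>text</p><style>p{}</style></div>",
    "html.parser",
)

# extract() detaches the tag from the tree and hands it back to you.
removed = soup.script.extract()
print(removed)  # <script>a()</script>

# decompose() detaches the tag and destroys it; there is nothing to keep.
soup.style.decompose()

print(soup)  # <div><p>text</p></div>
```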