Jinja2 escape all HTML but img, b, etc
You can write your own filter. The scrubber library is pretty good at cleaning up HTML. The filter needs to wrap the returned string in Markup (markupsafe.Markup; older Jinja2 releases also exposed it as jinja2.Markup) so that the template does not re-escape it.
Edit: a code example
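import scrubber
from markupsafe import Markup  # jinja2.Markup was removed in Jinja2 3.1

def sanitize_html(text):
    # Clean the HTML, then mark the result as safe so Jinja does not re-escape it.
    return Markup(scrubber.Scrubber().scrub(text))

# jinja_env is assumed to be an existing jinja2.Environment.
jinja_env.filters['sanitize_html'] = sanitize_html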
The Bleach library handles this well.
For example, assuming the variable 'jinja_env' is in scope:
from bleach import clean
from markupsafe import Markup

def do_clean(text, **kw):
    """Clean the text and return a Markup object to mark the string as safe.

    This prevents Jinja from re-escaping the result.
    """
    return Markup(clean(text, **kw))

jinja_env.filters['clean'] = do_clean
Then in a template you might have something like:
<p>{{ my_variable|clean(tags=['img', 'b', 'i', 'em', 'strong'], attributes={'img': ['src', 'alt', 'title', 'width', 'height']}) }}</p>
You can also pass a callable (instead of a list) for the attributes, which allows more thorough validation (e.g. checking that src contains a valid URL). The Bleach documentation shows an example.
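As a rough sketch of such a callable: Bleach calls it with the tag name, the attribute name, and the attribute value, and keeps the attribute only if it returns True. The host list and the names ALLOWED_IMG_HOSTS, SAFE_IMG_ATTRS and allow_img_attr below are made up for illustration, and the URL check is deliberately simple.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of hosts that images may be loaded from.
ALLOWED_IMG_HOSTS = {"example.com", "static.example.com"}
# Attributes that are kept without inspecting their value.
SAFE_IMG_ATTRS = {"alt", "title", "width", "height"}


def allow_img_attr(tag, name, value):
    """Return True if the attribute should be kept on the tag."""
    if name in SAFE_IMG_ATTRS:
        return True
    if name == "src":
        try:
            parsed = urlparse(value)
        except ValueError:
            # Malformed URL: drop the attribute.
            return False
        # Only http(s) URLs pointing at an allowed host are kept.
        return parsed.scheme in ("http", "https") and parsed.hostname in ALLOWED_IMG_HOSTS
    # Everything else (onerror, style, ...) is dropped.
    return False
```

From Python it could then be passed in place of the attribute list, e.g. do_clean(text, tags=['img'], attributes={'img': allow_img_attr}).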
You'll want to parse the input on submission using a whitelist approach; there are several good examples in this question, and other viable options exist.
Once you have done that, you can apply the safe filter to any variable containing HTML that should not be escaped:
{{comment|safe}}
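To make the whitelist idea concrete, here is a minimal sketch using only the standard library's html.parser. It is an illustration under assumptions, not a hardened sanitizer: the tag and attribute sets, the URL prefix check, and the names WhitelistSanitizer and sanitize are all made up for this example, and a maintained library is likely the safer choice for real user input.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; adjust to the tags and attributes you want to allow.
ALLOWED_TAGS = {"b", "i", "em", "strong", "img"}
ALLOWED_ATTRS = {"img": {"src", "alt", "title", "width", "height"}}
VOID_TAGS = {"img"}  # allowed tags that never take a closing tag
SAFE_URL_PREFIXES = ("http://", "https://", "/")


class WhitelistSanitizer(HTMLParser):
    """Keep whitelisted tags and attributes; escape every other tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            # Show the disallowed tag as literal text instead of markup.
            self.out.append(escape(self.get_starttag_text()))
            return
        kept = []
        for name, value in attrs:
            value = value or ""
            if name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "src" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue  # drops javascript: and other unexpected schemes
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_startendtag(self, tag, attrs):
        # Treat <img ... /> like <img ...>; no closing tag is emitted.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        closing = "</%s>" % tag
        self.out.append(closing if tag in ALLOWED_TAGS else escape(closing))

    def handle_data(self, data):
        # Text content is always escaped.
        self.out.append(escape(data))


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

The idea would be to call sanitize() once when the comment is submitted, store the result, and render that stored value with the safe filter.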