XML Unicode strings with encoding declaration are not supported
The following solution from kernc worked for me:
from lxml import etree
xml = u'<?xml version="1.0" encoding="utf-8" ?><foo><bar/></foo>'
xml = bytes(bytearray(xml, encoding='utf-8'))  # ADDED LINE: encode the str to UTF-8 bytes (correct here because the declaration says utf-8)
etree.XML(xml)
# <Element foo at 0x...>
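A minimal sketch of the same fix wrapped in a reusable helper (the name `parse_xml` is hypothetical, not part of lxml). In Python 3, `str.encode('utf-8')` produces the same bytes as `bytes(bytearray(xml, encoding='utf-8'))`. The helper assumes that any str it receives either declares UTF-8 or has no declaration at all.

```python
from lxml import etree

def parse_xml(source):
    """Parse XML given as str or bytes (sketch: assumes str input is UTF-8 / undeclared)."""
    if isinstance(source, str):
        # Same bytes as bytes(bytearray(source, encoding='utf-8'))
        source = source.encode('utf-8')
    # bytes input: lxml reads the encoding from the XML declaration itself
    return etree.fromstring(source)

root = parse_xml(u'<?xml version="1.0" encoding="utf-8" ?><foo><bar/></foo>')
print(root.tag)  # foo
```

Bytes input is passed through untouched, so the helper can sit in front of any caller that may hand over either type.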
Simpler than the answers above:
from lxml import etree
# r is the HTTP response object (e.g. from requests.get(...))
data = etree.fromstring(bytes(r.text, encoding='utf-8'))
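One caveat, assuming `r` is a `requests.Response`: `r.text` has already been decoded, and re-encoding it as UTF-8 only works if the document also declares UTF-8. The sketch below simulates this without a network call (the sample document is made up): the raw bytes parse correctly, while the re-encoded text still declares ISO-8859-1 and gets misread. With requests, passing `r.content` (the raw body bytes) to `fromstring()` avoids the mismatch.

```python
from lxml import etree

# Hypothetical response body, declared and actually encoded as ISO-8859-1.
raw = '<?xml version="1.0" encoding="iso-8859-1"?><name>café</name>'.encode('iso-8859-1')
text = raw.decode('iso-8859-1')  # stands in for r.text

# Raw bytes: lxml honours the declaration, so the text comes back intact.
good = etree.fromstring(raw).text
print(good)  # café

# Re-encoded as UTF-8 but still declaring ISO-8859-1: the two UTF-8 bytes
# of 'é' are decoded as two separate Latin-1 characters.
bad = etree.fromstring(bytes(text, encoding='utf-8')).text
print(bad)
```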
Apologies, all: at the time I was young, dumb and not mature enough to make the effort to explain my answer (nor did I really have the knowledge).
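fromstring() is the lxml.etree function that parses a string into an element tree. A Python (unicode) str has already been decoded, so if it still carries an XML declaration with encoding="...", lxml refuses it, because the declared encoding can no longer be applied. Encoding the string into UTF-8 bytes hands fromstring() raw bytes, which it decodes itself according to the declaration, so the input now meets its requirements (as long as the declaration says UTF-8).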
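The rule described above can be checked with a small sketch (the sample documents are made up): a str without an encoding declaration is accepted, the same str with one raises `ValueError`, and the encoded bytes parse fine.

```python
from lxml import etree

with_decl = '<?xml version="1.0" encoding="utf-8"?><foo><bar/></foo>'
without_decl = '<foo><bar/></foo>'

# 1. Already-decoded text with no encoding declaration: accepted.
print(etree.fromstring(without_decl).tag)  # foo

# 2. Already-decoded text that still declares an encoding: rejected.
rejected = False
try:
    etree.fromstring(with_decl)
except ValueError as exc:
    rejected = True
    print('ValueError:', exc)

# 3. The same document as bytes: lxml decodes it per the declaration.
print(etree.fromstring(with_decl.encode('utf-8')).tag)  # foo
```

Case 2 is exactly the error in the question; lxml's message suggests the two ways out shown in cases 1 and 3.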
You'll have to encode it and then force the same encoding in the parser:
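from django.http import HttpResponse
from lxml import etree
def handle_xml(request):  # hypothetical view name
    if request.method == 'POST':
        # encode the posted str into UTF-8 bytes...
        xml = request.POST['xml'].encode('utf-8')
        # ...and force the parser to decode with that same encoding
        parser = etree.XMLParser(ns_clean=True, recover=True, encoding='utf-8')
        h = etree.fromstring(xml, parser=parser)
        # cssselect() needs the cssselect package and returns a list of matches;
        # plain XML elements have .text (text_content() exists only on lxml.html elements)
        status = h.cssselect('delivery_reciept status')[0]
        return HttpResponse(status.text)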
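The same steps can be tried outside Django with a sketch like the one below. The function name `delivery_status` and the sample document are hypothetical; the element names mirror the selector in the answer, and `findtext()` stands in for `cssselect()` so no extra package is needed.

```python
from lxml import etree

def delivery_status(posted_xml):
    """Return the <status> text from a posted delivery receipt (sketch)."""
    # Encode the str to UTF-8 bytes, then force the parser to the same encoding.
    data = posted_xml.encode('utf-8')
    parser = etree.XMLParser(ns_clean=True, recover=True, encoding='utf-8')
    root = etree.fromstring(data, parser=parser)
    # findtext() searches descendants via ElementPath and returns the text or None.
    return root.findtext('.//status')

sample = ('<?xml version="1.0" encoding="utf-8"?>'
          '<delivery_reciept><status>DELIVERED</status></delivery_reciept>')
print(delivery_status(sample))  # DELIVERED
```

Note that `recover=True` makes the parser tolerate malformed input silently, which is convenient for a webhook but can hide genuinely broken payloads.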