I've found reading and writing a UTF-8 encoded XML file with Python's ElementTree harder than expected, so I thought I'd pull together this demo.
The trick is to open the source and destination files yourself with an explicit encoding, instead of handing file paths to ElementTree.parse and ElementTree.write and relying on their defaults.
Reading is as simple as opening the source file with encoding="utf-8" and then passing the file instance to ElementTree.parse.
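Here is a minimal sketch of the reading step. The helper name read_xml_utf8 and the throwaway sample file are my own, made up for illustration; only open and ElementTree.parse come from the description above.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def read_xml_utf8(path):
    """Parse an XML file that we open ourselves as UTF-8 text (hypothetical helper)."""
    # Open in text mode with an explicit encoding, then pass the file object to parse().
    with open(path, encoding="utf-8") as source:
        return ET.parse(source)


# Demo: create a small sample file containing non-ASCII text, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.xml")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("<greeting>héllo wörld</greeting>")

    tree = read_xml_utf8(sample_path)
    greeting_text = tree.getroot().text
```

After this runs, greeting_text should hold the original string with its accented characters intact.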
Writing involves serializing the XML doc to UTF-8 encoded bytes by calling ElementTree.tostring with encoding="utf-8", then opening the destination for writing in UTF-8 and writing the decoded bytes.
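The writing step might look something like the sketch below. Again, write_xml_utf8 and the sample element are hypothetical names I picked for the demo, not anything ElementTree provides.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_xml_utf8(root, path):
    """Serialize an element to UTF-8 and write it to a file opened as UTF-8 (hypothetical helper)."""
    # With encoding="utf-8", tostring() returns bytes rather than str.
    xml_bytes = ET.tostring(root, encoding="utf-8")
    # Decode back to str because the destination is opened in text mode.
    with open(path, "w", encoding="utf-8") as destination:
        destination.write(xml_bytes.decode("utf-8"))


# Demo: build a tiny document with non-ASCII text and write it out.
root = ET.Element("greeting")
root.text = "héllo wörld"

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "out.xml")
    write_xml_utf8(root, out_path)
    with open(out_path, "rb") as f:
        written_bytes = f.read()
```

Reading the file back as raw bytes is just a way to check that the accented characters ended up on disk as UTF-8 rather than as numeric character references.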