
@walkermatt
Created February 17, 2025 12:22
Reading and writing a UTF-8 encoded XML file with Python ElementTree

I've found reading and writing a UTF-8 encoded XML file with Python ElementTree harder than expected, so I thought I'd pull together this demo.

The trick is to open the source and destination files yourself with an explicit encoding, rather than passing file names to ElementTree.parse and ElementTree.write and leaving the file handling to them.

Reading is as simple as opening the source file with encoding="utf-8" and then passing the file object to ElementTree.parse.
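A minimal, self-contained sketch of just the reading step. It assumes a throwaway temporary file; the file name `example.xml`, the helper `read_job_name` and the sample content are hypothetical and not part of the demo.

```python
import os
import tempfile
from xml.etree import ElementTree as ET

# Hypothetical sample content with non-ASCII text (an emoji) in an element.
SAMPLE = "<?xml version='1.0' encoding='utf-8'?>\n<Job><Name>Live - Schedule 😼</Name></Job>\n"


def read_job_name(path):
    """Parse a UTF-8 XML file opened with an explicit encoding; return the <Name> text."""
    with open(path, "r", encoding="utf-8") as f:
        tree = ET.parse(f)  # ET.parse accepts an open file object as well as a path
    return tree.getroot().findtext("Name")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.xml")
    # Create the throwaway input file as UTF-8 so it matches its declaration.
    with open(path, "w", encoding="utf-8") as f:
        f.write(SAMPLE)
    name = read_job_name(path)

print(name)  # Live - Schedule 😼
```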

Writing involves converting the XML document to UTF-8 encoded bytes by calling ElementTree.tostring with encoding="utf-8" (and xml_declaration=True, so the declaration names the encoding), then opening the destination file for writing with encoding="utf-8" and writing the decoded bytes.
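The two writing steps can be sketched in isolation like this. It is a minimal example, assuming a small hypothetical tree and a temporary output file `out.xml` (neither is from the demo); the raw bytes are read back to show what actually lands on disk.

```python
import os
import tempfile
from xml.etree import ElementTree as ET

# Build a tiny hypothetical tree containing non-ASCII text.
root = ET.Element("Job")
ET.SubElement(root, "Name").text = "Test - Schedule 😼"

# Step 1: serialise to UTF-8 encoded bytes, including an XML declaration
# (the xml_declaration argument needs Python 3.8 or later).
xml_bytes = ET.tostring(root, encoding="utf-8", xml_declaration=True)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.xml")
    # Step 2: open the destination in UTF-8 text mode and write the decoded str.
    with open(path, "w", encoding="utf-8") as f:
        f.write(xml_bytes.decode("utf-8"))
    # Read the raw bytes back to inspect the result.
    with open(path, "rb") as f:
        on_disk = f.read()

print(on_disk.decode("utf-8").splitlines()[0])  # <?xml version='1.0' encoding='utf-8'?>
```

Because the encoding is "utf-8" rather than the default "us-ascii", the emoji is written as raw UTF-8 bytes instead of a numeric character reference.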

```python
import re
from xml.etree import ElementTree as ET

# Open the UTF-8 encoded file ourselves, specifying the encoding, before
# passing the file object to ET.parse
with open("Workflow.xml", "r", encoding="utf-8") as f:
    workflow_xml_tree = ET.parse(f)

workflow_xml_root = workflow_xml_tree.getroot()

# Update contents in some way...
for elm in workflow_xml_root.findall(".//Job/Name"):
    if elm.text is None:
        continue  # skip empty <Name/> elements
    # Update the Job/Name prefix
    print(f"Before: {elm.text}")
    elm.text = re.sub(r"^Live - ", "Test - ", elm.text, flags=re.IGNORECASE)
    print(f"After : {elm.text}")

# To ensure we write a UTF-8 encoded file with a matching XML declaration, first
# convert to UTF-8 encoded bytes, then decode into a str when writing to a file
# opened for writing in UTF-8 (xml_declaration needs Python 3.8 or later)
xmlstr = ET.tostring(workflow_xml_root, encoding="utf-8", xml_declaration=True)
with open("Workflow_TEST.xml", "w", encoding="utf-8") as f:
    f.write(xmlstr.decode("utf-8"))
```
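To confirm that output really is UTF-8 with a matching declaration, something like the following sketch can be run over the written bytes. `declared_encoding` and `check_utf8_xml` are hypothetical helper names, and the regex only handles the simple `<?xml ... encoding='...'?>` form that ElementTree emits.

```python
import codecs
import re
from xml.etree import ElementTree as ET

# Matches the encoding pseudo-attribute in a leading <?xml ...?> declaration.
DECL_RE = re.compile(rb"""<\?xml[^>]*?encoding=["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def declared_encoding(data):
    """Return the encoding named in the XML declaration, or None if there isn't one."""
    m = DECL_RE.match(data)
    return m.group(1).decode("ascii") if m else None


def check_utf8_xml(data):
    """Return True if data declares UTF-8, decodes as UTF-8 and parses; else raise."""
    enc = declared_encoding(data)
    if enc is None:
        raise ValueError("no encoding named in the XML declaration")
    # codecs.lookup normalises aliases such as "UTF8" (raises LookupError if unknown)
    if codecs.lookup(enc).name != "utf-8":
        raise ValueError(f"declaration names {enc!r}, not UTF-8")
    data.decode("utf-8")  # raises UnicodeDecodeError on invalid byte sequences
    ET.fromstring(data)   # raises ET.ParseError on malformed XML
    return True


root = ET.Element("Name")
root.text = "Test - Schedule NHS Choices 😼"
xml_bytes = ET.tostring(root, encoding="utf-8", xml_declaration=True)
print(check_utf8_xml(xml_bytes))  # True
```

Against the demo's real output this would be called on the file's raw bytes, for example `check_utf8_xml(open("Workflow_TEST.xml", "rb").read())`.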
Sample input, read by the script as Workflow.xml:

```xml
<?xml version='1.0' encoding='utf-8'?>
<Workflow>
    <Connections>
        <Connection>
            <Name>NHS Choices</Name>
            <Type>WFS</Type>
            <Server>http://syn.ads.astuntechnology.com/nhs/choices/wfs</Server>
        </Connection>
    </Connections>
    <Tasks>
        <Task>
            <Name>Download NHS Choices Data 😼</Name>
            <Type>SDTTask</Type>
            <SourceConnection>VRT File</SourceConnection>
            <SourceValue>E:\iShareData\Utilities\nhs_choices_bbox_live.vrt</SourceValue>
            <DestConnection>SDW</DestConnection>
            <DestValue>example.nhschoices_all</DestValue>
            <ProjectionBNG>Yes</ProjectionBNG>
            <LegacyGeometryColumnName>Yes</LegacyGeometryColumnName>
            <SkipFailures>No</SkipFailures>
            <PreserveCase>Yes</PreserveCase>
            <ForceGeometry>Yes</ForceGeometry>
            <GeometryName>POINT</GeometryName>
            <AdditionalParams>
            </AdditionalParams>
            <InputEncoding>UTF-8</InputEncoding>
        </Task>
    </Tasks>
    <Jobs>
        <Job>
            <Name>Live - Schedule NHS Choices 😼</Name>
            <Task>
                <TaskName>Download NHS Choices Data</TaskName>
                <Dependant>No</Dependant>
            </Task>
            <Task>
                <TaskName>Update NHS Choices links</TaskName>
                <Dependant>Yes</Dependant>
            </Task>
            <Layers />
        </Job>
    </Jobs>
</Workflow>
```
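Only the `<Name>` directly under each `<Job>` matches `.//Job/Name`, so with this sample the script should rename a single element. The sketch below reproduces that step in memory on a trimmed copy of the `<Jobs>` section (the trimming is ours), which makes the expected result easy to check without touching the filesystem.

```python
import re
from xml.etree import ElementTree as ET

# Trimmed copy of the <Jobs> section from the sample Workflow.xml.
SAMPLE = """<Workflow>
  <Jobs>
    <Job>
      <Name>Live - Schedule NHS Choices 😼</Name>
      <Layers />
    </Job>
  </Jobs>
</Workflow>"""

# Parse from UTF-8 bytes, as the parser would see the contents of a UTF-8 file.
root = ET.fromstring(SAMPLE.encode("utf-8"))

renamed = []
for elm in root.findall(".//Job/Name"):
    if elm.text is None:
        continue
    # Same substitution as the demo: swap a leading "Live - " for "Test - ".
    elm.text = re.sub(r"^Live - ", "Test - ", elm.text, flags=re.IGNORECASE)
    renamed.append(elm.text)

print(renamed)  # ['Test - Schedule NHS Choices 😼']
```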