@sukbir is probably not using Windows. What happens is that lxml writes a newline (0A 00 in UTF-16LE) between the XML header and the body. Windows text mode then mangles this into 0D 0A 00, which shifts everything after it by one byte so that it looks like UTF-16BE; hence the Chinese (and other) characters when you display it. You can get around this in this instance by opening the file with "wb" instead of "w". However, I'd strongly suggest that you use 'UTF-8' (spelled EXACTLY like that) as your encoding. Why are you using UTF-16? Do you like large files and/or weird problems?
Commented Dec 17, 2010 at 21:20

    doc.write('output.xml', xml_declaration=True, encoding='utf-16')

    outFile = open('output.xml', 'w')
    doc.write(outFile, xml_declaration=True, encoding='utf-16')
answered Sep 11, 2013 at 12:28
Will this respect XML indentation? I am creating the XML file in a similar fashion, but I have formatting issues whenever I add an element. If I modify a tag or modify text and write back to a new XML file, it works fine. I don't know why it isn't working with additions. Here is the format:
Promoting my comment to an answer:
@sukbir is probably not using Windows. What happens is that lxml writes a newline (0A 00 in UTF-16LE) between the XML header and the body. Windows text mode then mangles this into 0D 0A 00, which shifts everything after it by one byte so that it looks like UTF-16BE; hence the Chinese (and other) characters when you display it. You can get around this in this instance by opening the file with "wb" instead of "w". However, I'd strongly suggest that you use 'UTF-8' (spelled EXACTLY like that) as your encoding. Why are you using UTF-16? Do you like large files and/or weird problems?
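The byte-level effect described here can be reproduced on any platform by simulating the text-mode translation by hand. The following is only a sketch: the helper `simulate_windows_text_mode` and the sample document are made up for illustration, and no lxml is involved.

```python
def simulate_windows_text_mode(data: bytes) -> bytes:
    """Hypothetical helper: mimic Windows text-mode output, where every
    0x0A byte is written as 0x0D 0x0A."""
    return data.replace(b"\n", b"\r\n")


# A tiny stand-in document with one newline between header and body.
text = '<?xml version="1.0"?>\n<root/>'
raw = text.encode("utf-16-le")
mangled = simulate_windows_text_mode(raw)

# The newline is 0A 00 before the translation and 0D 0A 00 after it.
newline_at = raw.index(b"\n\x00")
print(raw[newline_at:newline_at + 2].hex())
print(mangled[newline_at:newline_at + 3].hex())

# Read as UTF-16LE, everything after the extra byte is misaligned garbage.
garbled = mangled.decode("utf-16-le", errors="replace")
print(ascii(garbled))

# Skipping the 0D 0A pair and reading big-endian recovers the body,
# which is why the tail "looks like UTF-16BE".
realigned = mangled[newline_at + 2:-1].decode("utf-16-be")
print(realigned)
```

Opening the output file with "wb" avoids the translation step entirely, so the bytes produced by the serializer reach the disk unchanged.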
answered Dec 18, 2010 at 11:22 John Machin

Unfortunately, "wb" didn't solve this issue for me, but the newlines were the cause, so I was able to work around it by writing the XML on one line (no pretty_print) and adding the declaration manually. On the question of "Why are you using UTF-16? You like large files and/or weird problems?": it could be (as in my case) that a third party requires the file in UTF-16. If you deal with interfaces owned by other parties, you don't always have control over what you send them.
Commented Apr 15, 2014 at 16:07
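The commenter gave no code for this workaround, so the following is only a guess at what it might look like. It uses the standard library's `xml.etree.ElementTree` as a stand-in (lxml's `etree.tostring` also accepts `encoding="unicode"`), and it assumes the recipient accepts UTF-16 with a byte order mark; the element names and the temporary output path are invented for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET  # stand-in for lxml.etree in this sketch

# Hypothetical document; the real one would come from the application.
root = ET.Element("root")
ET.SubElement(root, "item").text = "value"

# Serialize to a str on a single line: no pretty-printing, so no newlines.
body = ET.tostring(root, encoding="unicode")

# Add the declaration by hand, then encode the whole document in one step.
document = '<?xml version="1.0" encoding="UTF-16"?>' + body
data = document.encode("utf-16")  # BOM followed by native byte order

# Binary mode keeps the operating system from touching the bytes.
path = os.path.join(tempfile.mkdtemp(), "output.xml")
with open(path, "wb") as out_file:
    out_file.write(data)

# Read the file back to confirm that it decodes to the same text.
with open(path, "rb") as in_file:
    round_tripped = in_file.read().decode("utf-16")
print(round_tripped == document)
```

If the recipient requires a specific byte order without a BOM, `"utf-16-le"` or `"utf-16-be"` can be passed to `encode` instead.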