blog.humaneguitarist.org

pretty printing XML with Python, lxml, and XSLT

[Sat, 12 Nov 2011 16:05:04 +0000]
Last week or so I was doing some work with Python and lxml [http://lxml.de/]. And, like a lot of people it seems, I found that lxml's pretty printing [http://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output] wasn't really doing anything for me. I couldn't find any native lxml solutions to make my XML look pretty. All I found were some functions on various code sites that people had written to pretty print XML using a bunch of regular expressions. Yuck.

So I thought, "Why not use XSLT to pretty print my XML?" and I found an XSL written by none other than Michael Kay [http://en.wikipedia.org/wiki/Michael_Kay_%28software_engineer%29] on this [http://www.dpawson.co.uk/xsl/sect2/pretty.html#d8621e19] page (see comment #4). It seems to work just fine inside a function that returns pretty XML, not to mention it's super short and sweet.

Anyway, here's an example of using the XSL for pretty printing.

    from lxml import etree

    def prettify(someXML):
        #for more on lxml/XSLT see: http://lxml.de/xpathxslt.html#xslt-result-objects
        xslt_tree = etree.XML('''\
    <!-- XSLT taken from Comment 4 by Michael Kay found here:
    http://www.dpawson.co.uk/xsl/sect2/pretty.html#d8621e19 -->
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
      <xsl:strip-space elements="*"/>
      <xsl:template match="/">
        <xsl:copy-of select="."/>
      </xsl:template>
    </xsl:stylesheet>''')
        transform = etree.XSLT(xslt_tree)
        result = transform(someXML)
        return unicode(result)

    myXML = etree.XML('<a><b><c><d/></c></b></a>')
    print prettify(myXML)

The example above would output the following:

    <?xml version="1.0"?>
    <a>
      <b>
        <c>
          <d/>
        </c>
      </b>
    </a>

By the way, I don't even need to see the XML I'm processing most of the time, so why all the pretty printing fuss? Well, because it bothers me ... And all good XML should look like an X-wing [http://en.wikipedia.org/wiki/X-wing] starfighter.
If it doesn't, you're probably doing something wrong or your schema just sucks. It isn't called an X-wing for no reason. :P
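
The prettify function in this post uses Python 2 idioms (unicode() and the print statement). For anyone on Python 3, here is a rough sketch of the same idea. The stylesheet is the one quoted in the post; the names PRETTY_XSLT, _transform, prettify_py3, my_xml, and pretty are made up for this illustration, and compiling the stylesheet once at module level is just one possible arrangement, not something the original does.

```python
from lxml import etree

# Stylesheet quoted from the post, where it is credited to Michael Kay.
# It copies the whole input and lets xsl:output/indent do the formatting.
PRETTY_XSLT = etree.XML('''\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>''')

# Assumption for this sketch: build the transform once and reuse it,
# rather than rebuilding it on every call.
_transform = etree.XSLT(PRETTY_XSLT)


def prettify_py3(some_xml):
    """Return an indented text serialization of an lxml element or tree."""
    result = _transform(some_xml)
    # On Python 3, str() plays the role that unicode() plays in the post.
    return str(result)


my_xml = etree.XML('<a><b><c><d/></c></b></a>')
pretty = prettify_py3(my_xml)
print(pretty)
```

The only real changes from the original are str() in place of unicode() and print() as a function; the XSLT itself is untouched.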