Pretty-printing Atomixlib Output

In the post on my first impressions of Atomixlib, I noted the lack of indented XML output. I don't think it's a must-have, but it can be helpful at times, or just look nicer. Here is a quick hack to get it.

The main part is a prettyprint function by Fredrik Lundh, the ElementTree author himself:

def indent(elem, level=0):
    """Add whitespace to the tree, so that saving it as usual
    results in a prettyprinted tree.
    """
    i = '\n' + level * '  '
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + '  '
        # Break the line after this element's own closing tag.
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for child in elem:
            indent(child, level + 1)
        # child is now the last child: dedent before the parent's closing tag.
        if not child.tail or not child.tail.strip():
            child.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i
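
As an aside, on Python 3.9 and later the standard library ships a comparable helper, `xml.etree.ElementTree.indent`, so the recipe may not be needed there. A minimal sketch, assuming Python 3.9+ and a made-up sample document (this is the stdlib function, not Lundh's recipe):

```python
import xml.etree.ElementTree as ET

# Made-up sample document, only for illustration.
root = ET.fromstring(
    '<feed><title>Example</title><entry><id>1</id></entry></feed>'
)

# Python 3.9+: inserts indentation whitespace into the tree in place.
ET.indent(root, space='  ')

# encoding='unicode' makes tostring return str instead of bytes.
pretty = ET.tostring(root, encoding='unicode')
print(pretty)
```

On older interpreters the function above remains the way to go.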

Atomixlib's unmodified output puts every tag under the atom: prefix. After loading and dumping it with ElementTree, though, that prefix changes to the less appealing ns0:. ElementTree keeps only the namespace URI of each tag and generates fresh prefixes when it serializes.
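
The prefix change can be reproduced without Atomixlib. The following is a small sketch, assuming Python 3 and a hand-written one-element feed standing in for the real output:

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for Atomixlib output (hypothetical sample).
SOURCE = (
    '<atom:feed xmlns:atom="http://www.w3.org/2005/Atom">'
    '<atom:title>t</atom:title>'
    '</atom:feed>'
)

root = ET.fromstring(SOURCE)

# The parsed tag carries the namespace URI in braces; the prefix is gone.
print(root.tag)

# On output, ElementTree invents a prefix for the URI.
dumped = ET.tostring(root, encoding='unicode')
print(dumped)
```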

To get around that, I use a snippet of mine to remove the whole namespace:

def remove_namespace(doc, namespace):
    """Remove namespace in the passed document in place."""
    ns = u'{%s}' % namespace
    nsl = len(ns)
    # iter() replaces getiterator(), which was removed in Python 3.9.
    for elem in doc.iter():
        if elem.tag.startswith(ns):
            elem.tag = elem.tag[nsl:]

That way, the explicit namespace prefix on each tag is removed, which can save quite a few bytes. Since the prefixed xmlns declaration is no longer emitted once the tags are stripped, we set a new one by hand, this time declaring Atom as the default namespace.

Putting it all together:

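import xml.etree.ElementTree as ET

# f is the Atomixlib feed object; str(f) gives its serialized XML.
elem = ET.fromstring(str(f))
indent(elem)
namespace = u'http://www.w3.org/2005/Atom'
remove_namespace(elem, namespace)
# Declare Atom as the default namespace on the root element.
elem.set('xmlns', namespace)
f = ET.tostring(elem)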
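
To try the whole pipeline without an Atomixlib feed at hand, here is a self-contained sketch. The sample feed string, the `ATOM_NS` constant, and the `prettify_atom` helper are made up for illustration and belong to neither Atomixlib nor ElementTree. It assumes Python 3, where `encoding='unicode'` makes `tostring` return text.

```python
import xml.etree.ElementTree as ET

ATOM_NS = 'http://www.w3.org/2005/Atom'

# Hypothetical stand-in for str(f): a tiny feed using the atom: prefix.
SAMPLE = (
    '<atom:feed xmlns:atom="http://www.w3.org/2005/Atom">'
    '<atom:title>Example</atom:title>'
    '<atom:entry><atom:id>urn:example:1</atom:id></atom:entry>'
    '</atom:feed>'
)


def indent(elem, level=0):
    """Add whitespace in place so the tree serializes indented."""
    i = '\n' + level * '  '
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + '  '
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for child in elem:
            indent(child, level + 1)
        # child is the last child here; its tail precedes the closing tag.
        if not child.tail or not child.tail.strip():
            child.tail = i
    elif level and (not elem.tail or not elem.tail.strip()):
        elem.tail = i


def remove_namespace(doc, namespace):
    """Strip the given namespace from every tag, in place."""
    ns = '{%s}' % namespace
    for elem in doc.iter():
        if elem.tag.startswith(ns):
            elem.tag = elem.tag[len(ns):]


def prettify_atom(xml_text):
    """Parse, indent, strip the Atom namespace, redeclare it as default."""
    root = ET.fromstring(xml_text)
    indent(root)
    remove_namespace(root, ATOM_NS)
    root.set('xmlns', ATOM_NS)
    return ET.tostring(root, encoding='unicode')


print(prettify_atom(SAMPLE))
```

The result has no ns0: prefixes, and the root element carries a plain xmlns attribute.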

Happy feeding :)