(Not Only) Lines and Ellipses

Lately, I found myself playing with GraphViz and its Python binding, pydot. With little effort, it draws very nice graphs (check out the gallery) from plain-text .dot files, which themselves are easily generated with a few lines of Python code.

I decided to visualize the tags on this site and their relations to each other. Note that their popularity as shown in the tag cloud, which is determined by how often they are used for news posts, products, and snippets (I don't use them for links yet), is not incorporated. Instead, I use the number of times a tag is referenced by the descriptions of other tags.

To have fresh data, I added a few lines of code to this site that serve the available tags, along with the tags their descriptions link to, as YAML, so I could fetch it using urllib2.urlopen(). The data looks like this:

ajax: [javascript, xml]
email: [irc, im]
routes: [python, rails, mod_rewrite]
wxpython: [gui, python]
xhtml: [xml, css]
...
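
As a quick illustration of the metric described above, here is a minimal sketch that counts how often each tag is referenced. It uses only the five sample entries shown, so the real counts will differ, and the names sample and ref_counts are made up for this example:

```python
from collections import Counter

# The five sample entries from above, written as a Python dict.
sample = {
    'ajax': ['javascript', 'xml'],
    'email': ['irc', 'im'],
    'routes': ['python', 'rails', 'mod_rewrite'],
    'wxpython': ['gui', 'python'],
    'xhtml': ['xml', 'css'],
}

# Count how many tag descriptions reference each tag.
ref_counts = Counter(dst for dsts in sample.values() for dst in dsts)

print(ref_counts['xml'])     # referenced by ajax and xhtml
print(ref_counts['python'])  # referenced by routes and wxpython
print(ref_counts['ajax'])    # never referenced in this sample
```

A Counter returns 0 for keys it has never seen, so tags that nobody references need no special handling.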

The following code generates the graph, representing tags as nodes and references as edges. It adds the number of references to a tag to the node's label and colors the node according to its popularity:

from pydot import Dot, Node, Edge
import yaml


# settings
PREFIX = 'taglinks'
COLORS = '#deebf7 #6baed6 #4292c6'.split()
THRESHOLDS = [2, 4]

# Load data from YAML file.
with open(PREFIX + '.yml') as f:
    data = yaml.safe_load(f)

# Build list of all referenced tags.
links = []
for dsts in data.values():
    links.extend(dsts)

dot = Dot()
for src, dsts in data.items():
    # Determine popularity.
    links_count = links.count(src)
    c_idx = 0
    for th in THRESHOLDS:
        if links_count > th:
            c_idx += 1

    # Create nodes and edges.
    dot.add_node(Node(src, label='%s (%d)' % (src, links_count),
        color=COLORS[c_idx], style='filled'))
    for dst in dsts:
        dot.add_edge(Edge(src, dst))

# Create a PNG image file.
prog = 'dot'  # Try 'fdp', too.
filename = '%s_%s.png' % (PREFIX, prog)
dot.write_png(filename, prog=prog)
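
The threshold loop in the script picks a color bucket by counting how many thresholds the reference count exceeds. As a hedged alternative, the same bucket can probably be found with the standard library's bisect module, assuming THRESHOLDS is sorted in ascending order. The helper name color_for is hypothetical:

```python
from bisect import bisect_left

COLORS = '#deebf7 #6baed6 #4292c6'.split()
THRESHOLDS = [2, 4]  # must be sorted in ascending order


def color_for(links_count):
    """Return the fill color for a tag referenced links_count times."""
    # bisect_left() yields the number of thresholds strictly below
    # links_count, which is what the loop computes with '>'.
    return COLORS[bisect_left(THRESHOLDS, links_count)]


for n in (0, 2, 3, 4, 5):
    print(n, color_for(n))
```

With the settings above, counts up to 2 get the lightest color, 3 and 4 the middle one, and anything above 4 the darkest.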

If you want to inspect the generated DOT source, you can use print(dot.to_string()).
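
To give an idea of what such DOT source looks like, here is a rough sketch that writes the edges by hand, without pydot. The to_dot() helper is hypothetical, and pydot's real output will differ in details such as node attributes and layout of the text. It also assumes tag names contain no double quotes:

```python
def to_dot(data):
    """Render a {tag: [referenced tags]} mapping as DOT source text."""
    lines = ['digraph G {']
    for src in sorted(data):
        for dst in data[src]:
            # Quote the names so that tags containing characters such
            # as '-' or spaces remain valid DOT identifiers.
            lines.append('    "%s" -> "%s";' % (src, dst))
    lines.append('}')
    return '\n'.join(lines)


print(to_dot({'ajax': ['javascript', 'xml'], 'xhtml': ['xml', 'css']}))
```

The resulting text can be saved to a .dot file and rendered with the GraphViz command line tools.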

To immediately display the generated image with IrfanView on Windows:

import os
os.system('C:/Programme/IrfanView/i_view32.exe %s' % filename)
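
If the file name might contain spaces, passing the arguments as a list through the subprocess module is usually more robust than formatting a command string. This is only a sketch: the viewer path is the one assumed above, and viewer_command() and show() are hypothetical helper names:

```python
import subprocess

# Assumed install location, taken from the os.system() call above.
VIEWER = 'C:/Programme/IrfanView/i_view32.exe'


def viewer_command(filename, viewer=VIEWER):
    """Build the argument list for opening filename in the viewer."""
    return [viewer, filename]


def show(filename, viewer=VIEWER):
    # Each list item is passed as one argument, without shell parsing.
    return subprocess.call(viewer_command(filename, viewer))
```

Calling show(filename) then opens the image, provided the viewer is installed at that path.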

Take a look at the resulting images, generated with the algorithms dot and fdp, respectively (each about 70 KB in size):