Multiline Code Blocks for Markdown With Syntax Highlighting

Since I'm releasing lots of code on this site, being able to have code blocks in articles sounded like a good idea.

At first, I wrapped the code in standard XHTML tags and added some formatting via CSS.

Then, I started to use the Markdown syntax (through its Python port) in my posts, and their source became much easier to read and write.

Unfortunately, I ran into some serious problems with code blocks in Markdown. It has been a while and I can no longer remember the exact issues, sorry. As far as I recall, they were related to indentation, line breaks, and (unintentional) Markdown syntax inside a code block.

On top of that, syntax highlighting was not available, and I really wanted it for better readability. I had already been using the great Pygments package for my code snippets, and it had proved to be an excellent choice.

So, I wanted to have a Markdown syntax element like this:

[sourcecode:lexer]
some code
[/sourcecode]

Here, lexer can be any lexer short name supported by Pygments, such as python or html.

Here is a Markdown preprocessor that uses Pygments to highlight the content enclosed by the above syntax:

import re

from markdown import Preprocessor
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name, TextLexer


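class CodeBlockPreprocessor(Preprocessor):
    """Replace [sourcecode:lexer]...[/sourcecode] blocks with highlighted HTML."""

    # re.S lets "." match newlines, so a block may span multiple lines.
    pattern = re.compile(
        r'\[sourcecode:(.+?)\](.+?)\[/sourcecode\]', re.S)

    def run(self, lines):
        def repl(m):
            # Fall back to plain text if Pygments does not know the lexer name.
            try:
                lexer = get_lexer_by_name(m.group(1))
            except ValueError:
                lexer = TextLexer()
            code = highlight(m.group(2), lexer, HtmlFormatter())
            # Fill empty lines so Markdown does not split the block there.
            code = code.replace('\n\n', '\n&nbsp;\n')
            return '\n\n<div class="code">%s</div>\n\n' % code
        # Work on the whole document at once, then hand back a list of lines.
        return self.pattern.sub(
            repl, '\n'.join(lines)).split('\n')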
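To see what the regular expression actually captures, here is a small standalone sketch that uses only the standard library. The helper name find_code_blocks and the sample text are made up for illustration; only the pattern itself is taken from the preprocessor above.

```python
import re

# Same pattern as in the preprocessor; re.S lets "." match newlines.
PATTERN = re.compile(
    r'\[sourcecode:(.+?)\](.+?)\[/sourcecode\]', re.S)


def find_code_blocks(text):
    """Return a (lexer_name, code) pair for every sourcecode block in text."""
    return [(m.group(1), m.group(2)) for m in PATTERN.finditer(text)]


# Hypothetical article source with two code blocks.
sample = (
    "Intro paragraph.\n"
    "[sourcecode:python]\n"
    "print('hello')\n"
    "\n"
    "print('world')\n"
    "[/sourcecode]\n"
    "Between blocks.\n"
    "[sourcecode:sql]\n"
    "SELECT 1;\n"
    "[/sourcecode]\n"
)

blocks = find_code_blocks(sample)
for lexer_name, code in blocks:
    print(lexer_name, repr(code))
```

Because both groups are non-greedy, each opening tag is paired with the nearest closing tag, so the two blocks are found separately instead of being merged into one large match.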

Then, the preprocessor can be integrated like this:

from markdown import Markdown

md = Markdown()
md.preprocessors.insert(0, CodeBlockPreprocessor())
markdown = md.__str__

markdown is then a callable that can, for example, be passed to a template's context and used within that template.
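The integration shown above targets the Python-Markdown API of that time. As a rough sketch only, this is how the same idea might be wired up as an extension in Python-Markdown 3.x. The class name CodeBlockExtension, the registry name 'sourcecode', and the priority 25 are my own choices for this example, not something the original code prescribes. Instead of padding empty lines, this variant stores the finished HTML in Markdown's HTML stash so that later processing steps leave it alone.

```python
import re

from markdown import markdown
from markdown.extensions import Extension
from markdown.preprocessors import Preprocessor
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import TextLexer, get_lexer_by_name


class CodeBlockPreprocessor(Preprocessor):

    pattern = re.compile(
        r'\[sourcecode:(.+?)\](.+?)\[/sourcecode\]', re.S)

    def run(self, lines):
        def repl(m):
            try:
                lexer = get_lexer_by_name(m.group(1))
            except ValueError:
                lexer = TextLexer()
            code = highlight(m.group(2), lexer, HtmlFormatter())
            # Stash the HTML; Markdown swaps the placeholder back in at the end.
            placeholder = self.md.htmlStash.store(
                '<div class="code">%s</div>' % code)
            return '\n\n%s\n\n' % placeholder
        return self.pattern.sub(
            repl, '\n'.join(lines)).split('\n')


class CodeBlockExtension(Extension):
    """Hypothetical extension wrapper; name and priority are assumptions."""

    def extendMarkdown(self, md):
        # 25 is chosen to run after whitespace normalization (which would
        # strip the stash placeholder markers) and before raw HTML handling.
        md.preprocessors.register(
            CodeBlockPreprocessor(md), 'sourcecode', 25)


text = "Some text.\n\n[sourcecode:python]\nprint('hi')\n[/sourcecode]\n"
html = markdown(text, extensions=[CodeBlockExtension()])
print(html)
```

With this in place, markdown(text, extensions=[CodeBlockExtension()]) plays the role of the markdown callable described above.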

Finally, have Pygments generate a stylesheet (pygments.css in this example) to be embedded into the website:

$ pygmentize -S <some style> -f html > pygments.css
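If you prefer to create the stylesheet from Python rather than from the shell, something along these lines should work. The style name 'default' and the '.code' selector prefix are assumptions for this example; the prefix is chosen to match the div class emitted by the preprocessor above.

```python
from pygments.formatters import HtmlFormatter

# 'default' is just one of the styles shipped with Pygments.
formatter = HtmlFormatter(style='default')

# Prefix every rule with ".code" so it only applies inside the code blocks.
css = formatter.get_style_defs('.code')

with open('pygments.css', 'w') as f:
    f.write(css)
```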

There you go, enjoy the new, colorful code presentation!

Update: As of Pygments 0.9, released October 14, 2007, the code presented here is included in the distribution as external/markdown-processor.py.

Update: In case you're writing documents in reStructuredText that contain code blocks, you might appreciate that as of docutils 0.9, released May 02, 2012, both a directive and an interpreted role for code, highlighted by Pygments, are supported out of the box.