Tabula Rasa

For months I helplessly faced some sort of strange "caching" issue I had with my Elixir/SQLAlchemy duo. It seems I have solved it - finally!

Deployment Background

Since I have deployed two of my sites through the great mod_wsgi Apache module, each application is loaded and served by multiple Python interpreter instances (I will use this term since this can mean both processes or threads).

Problem Appearance

When changes to the database's contents occur on a site, e.g. by inserting or updating a news post, only a few or even just one of the interpreter instances for the corresponding application seem to take note. Re-requesting an affected resource multiple times from the server yields different results, with some showing the old and others showing the new data. Reloading Apache and therefor restarting the whole application fixes this - until the next change in data.

SQLAlchemy Background

SQLAlchemy provides a Session that, among other things, implements an Identity Map, meaning it holds one unique copy of each object requested from the database.

The crucial point I wasn't aware of is that the Identity Map is not dropped or refreshed on each HTTP request to the application. Thus, when one interpreter instance, the one which flush()ed the change in data, updates its "cache" (which actually is not really a cache, since queries are executed anyway - as I read on a mailing list - to be able to compare both results for equality and update if necessary), the others won't get notified of this and keep serving the old data. And since there is no reset between requests, the described behaviour occurs.

Remedy

Somewhat accidently I stumpled upon a section in the SQLAlchemy docs called Lifespan of a Contextual Session. It provides a clear visualization of what there is to be done: Session.remove() should be called by the framework at the end of the request, before the response is sent.

Well, ok. A short moment of thinking reveals to me that my own custom framework, Upraise, has no implementation of database- and therefor SQLAlchemy-specific glue code (and it shouldn't have in the core, as I think). Anyway, should be easy to add that to the actual application. A warm feeling of forthcoming relief surrounded me.

When talking to Marek earlier this day about my long-standing, to-be traumatic, mysterious "caching" nightmare, he suggested implementing the method call as WSGI middleware. Then followed a nearly mandatory discussion about PJE's article WSGI Middleware Considered Harmful.

Since I use some kind of abort() exits inside my application (especially, but not only in the controllers, or views as they are sometimes calld), adding Session.remove() to my application's __call__ method wouldn't help me here. The middleware idea seems to fit best:

# Get the `session` object (the model is based upon Elixir).
from myapp import model

class DatabaseSessionRemoveMiddleware(object):

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        result = self.app(environ, start_response)
        model.session.remove()
        return result

# Sliding the middleware into the WSGI stack is simple.
app = MyApp()
app = DatabaseSessionRemoveMiddleware(app)

And what can I say: It seems to work as expected! Hooray!

Update: As SQLAlchemy 0.4 for people in a hurry from the Pylons documentation reminded me, one can make use of the finally statement and avoid the middleware by applying it inside the application's __call__ method:

# Get the ``session`` object (the model is based upon Elixir).
from myapp import model

class MyApp(object):

    # initialization ...

    def __call__(self, environ, start_response):
        try:
            # Call the controller ...
            return result
        finally:
            model.session.remove()

app = MyApp()

Note that the unified try/except/finally statement, as specified in PEP 341, was first introduced in Python 2.5.