Entitled URLs with Werkzeug

# -*- coding: utf-8 -*-

"""
Entitled URLs with Werkzeug
===========================

:Copyright: 2008 Jochen Kupperschmidt
:Date: 18-Jun-2008
:License: MIT
"""

from string import ascii_lowercase, digits

from werkzeug.routing import BaseConverter


class TitleConverter(BaseConverter):
    """A `Werkzeug routing`_ converter that turns strings (usually titles)
    into URL-suitable representations.

    The only characters allowed in the converter's output are lower-case
    ASCII letters, digits and the dash; everything else is removed.  A
    dictionary of custom replacement rules makes it possible, for example,
    to replace umlauts with their two-letter representation so the result
    stays readable.  Spaces are replaced with dashes by default, as is
    common in blog software such as WordPress.

    An example URL map that uses this converter in Werkzeug::

        from werkzeug.routing import Map, Rule

        # Named ``url_map`` to avoid shadowing the built-in ``map``.
        url_map = Map([
            Rule('/items/<int:id>', defaults={'title': None}, endpoint='show'),
            Rule('/items/<int:id>-<title:title>', endpoint='show'),
            ], converters={'title': TitleConverter})

    The converter also works in Flask_ (on both apps_ and blueprints_)::

        from flask import Flask

        # Register it with your app's URL map.
        app = Flask(__name__)
        app.url_map.converters['title'] = TitleConverter

        # Use on app routes …
        @app.route('/articles/<int:id>', defaults={'title': None})
        @app.route('/articles/<int:id>_<title:title>')
        def show_article(id, title=None):  # hypothetical view
            ...

        # … or on blueprint routes.
        @article_blueprint.route('/articles/<int:id>', defaults={'title': None})
        @article_blueprint.route('/articles/<int:id>_<title:title>')
        def show_article(id, title=None):  # hypothetical view
            ...

    Note that two very similar rules are defined in each case, one of them
    with a default.  Together they match URL paths that end with just an ID
    as well as paths in which the ID is followed by a separator (a dash in
    the Werkzeug example, an underscore in the Flask example) and a title.
    Also note that the view callable a request is dispatched to via the
    endpoint has to accept both the ID and the title as arguments.  It can
    declare the latter as a keyword argument with a default of ``None`` and
    otherwise ignore it.

    .. _Werkzeug routing: http://werkzeug.pocoo.org/documentation/routing
    .. _Flask:            http://flask.pocoo.org/
    .. _apps:             http://flask.pocoo.org/docs/quickstart/#routing
    .. _blueprints:       http://flask.pocoo.org/docs/blueprints/#building-urls
    """

    # Characters allowed in the output; everything else is dropped.
    allowed = frozenset(ascii_lowercase + digits + '-')

    # These characters should be replaced instead of being dropped.
    replacements = {
        u'ä': u'ae',
        u'ö': u'oe',
        u'ü': u'ue',
        u'ß': u'ss',
        u' ': u'-'}

    def to_url(self, value):
        """Format ``value`` as a readable but URL-safe string."""
        value = value.lower()

        # Replace characters as defined.
        for old, new in self.replacements.items():
            value = value.replace(old, new)

        # Remove all characters that are not explicitly allowed.  As
        # ``allowed`` contains ASCII characters only, this also drops any
        # remaining non-ASCII characters.
        return ''.join(char for char in value if char in self.allowed)
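

# A minimal, Werkzeug-free sketch of the steps performed by
# ``TitleConverter.to_url``, handy for trying out replacement rules in
# isolation.  The name ``slugify_title`` and its parameters are
# hypothetical and not part of the converter's API; the default rules
# merely mirror the class attributes of the converter.  Note that
# adjacent replaced or removed characters can leave doubled dashes
# behind, as nothing here collapses them.

from string import ascii_lowercase, digits


def slugify_title(value, replacements=None, allowed=None):
    """Return a lower-case, dash-separated ASCII form of ``value``."""
    if replacements is None:
        replacements = {'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss', ' ': '-'}
    if allowed is None:
        allowed = frozenset(ascii_lowercase + digits + '-')

    value = value.lower()

    # Apply the replacement rules before filtering, so that e.g. umlauts
    # are transliterated rather than dropped.
    for old, new in replacements.items():
        value = value.replace(old, new)

    # Anything still outside the allowed set is removed.
    return ''.join(char for char in value if char in allowed)


if __name__ == '__main__':
    print(slugify_title('Hello, World!'))  # hello-world
    print(slugify_title('Schöne Grüße'))   # schoene-gruesse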