Quixote.ca

1. Introducing Quixote: A Simple Link Display

Last Modified: 30 Aug 2006

This is part one in a series of tutorials covering the Quixote framework for writing web applications in Python.

Introduction

Throughout this series I'll use the same example application, a DMOZ.org or Yahoo!-style tree of links. The application will initially be very simple, displaying parts of an unchanging set of links. In future tutorials this simple application will be extended to edit links, customize the display for each user, and other features that will demonstrate various aspects of Quixote. The demo application can be downloaded from http://www.quixote.ca/files/qxdemo/.

Part 1 is the longest tutorial, because it has to introduce Quixote and explain the basics of how applications are structured.

Basic Objects

The usual design pattern for Quixote-based applications, and the pattern you should follow, is to write a bunch of Python classes representing the objects that your application manipulates, and then write a Quixote application that creates and modifies these basic objects. For example:

  • A bug tracker would have Bug and User classes.
  • A discussion board would have Post, Topic, and Thread classes.
  • An e-commerce site would have Item, Order, and Customer classes.
  • Our link farm will have two classes. Category instances contain a list of links and a list of other Category objects that are the sub-categories of this category. Links are represented by a Link class that has a title, URL and description for each link.

These basic objects should contain no code relating to Quixote or the web. If you someday decide that web applications are bad and want to turn your application into a PyQt GUI or a set of command-line tools, the basic objects shouldn't require any changes; implement them with this principle firmly in mind. In our example application, the Category and Link classes both have .as_html() methods that produce an HTML rendering of the object; these are the only methods that bear any relation to the web.

What's the advantage of separating basic objects from the user interface?

  • The code enforcing policy lives in exactly one place, namely the code implementing the basic objects. For example, if you require that only certain users can modify the data, this can be enforced by methods on your basic objects. Changing this behaviour then requires changing one method. Carelessly written web applications will scatter this logic throughout the user interface, making changes messier and more far-reaching.
  • You might actually want to provide alternative interfaces. Ordinary users may still use the web, but administrative users might need extra capabilities provided by a special PyQt-based interface, or perhaps you'll need command-line scripts that manipulate objects. Avoiding web-specific dependencies in the basic representation makes this flexibility of interface possible.
  • It's easier to test your objects. Testing web applications is hard and the tools for doing so are rather poor. If the basic objects aren't dependent on web-related details, you can write unit tests for your classes using existing tools such as Sancho or the unittest module.

Quixote's authors tend to use one particular organization. Basic objects reside in a Python package, with a test/ subdirectory for unit tests and a ui/ subdirectory that holds the PTL templates and code underlying the web interface. You don't have to follow this organization, but it's a good idea and we've found that it works well.

Our link-tracking application only requires two classes, Category and Link. Skeletal class definitions are included here; implementing these methods is straightforward and won't be discussed further. Refer to the actual code for more information.

class Category:
    """A category in the tree of links.

    Instance attributes:
      id : string
        ID generated from title.
      title : string
        Human-readable title.
      categories : [Category]
        Subcategories contained in this category.
      links : [Link]
        Links in this category.
    """

    def __init__ (self, title):
        self.title = title
        self.id = _generate_id(title)
        self.categories = []
        self.links = []

    def get_category (self, id):
        """(id:string) : Category
        Return the subcategory with the given ID, or None if there's no match.
        """

    def add_link (self, link):
        """(link:Link)
        Add the link to this category, maintaining the list of links
        in sorted order.
        """

    def add_category (self, category):
        """(category:Category)
        Add the category as a subcategory of this category,
        maintaining the list of subcategories in sorted order.
        """

    def as_html (self, parent_url):
        """(parent_url:string) : htmltext
        Return an HTML representation of a link to a subcategory.
        """

class Link:
    """A single link.

    Instance attributes:
      title : string
        Title, to be used as link text.
      url : string
        URL of link.
      description : string
        Additional text describing the link.
    """
    def __init__ (self, title, url, description=""):
        self.title = title
        self.url = url
        self.description = description

    def as_html (self):
        """() : htmltext
        Return an HTML representation of the link.
        """

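The skeleton leaves the method bodies out. For readers who want something concrete to experiment with, here is one possible implementation in plain Python. It is a sketch, not the qxdemo code: the ID rule in _generate_id(), sorting by case-insensitive title, and the exact markup produced by as_html() are all assumptions, and the as_html() methods here return ordinary strings escaped with the standard library's html.escape() rather than Quixote's htmltext.

```python
import html
import re


def _generate_id(title):
    # Assumed rule: lowercase, collapse runs of non-alphanumerics to "_",
    # then trim underscores from the ends ("Food & Drink" -> "food_drink").
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")


class Link:
    def __init__(self, title, url, description=""):
        self.title = title
        self.url = url
        self.description = description

    def as_html(self):
        # Escape every field; the real code would return htmltext instead.
        text = '<a href="%s">%s</a>' % (html.escape(self.url),
                                        html.escape(self.title))
        if self.description:
            text += " - " + html.escape(self.description)
        return text


class Category:
    def __init__(self, title):
        self.title = title
        self.id = _generate_id(title)
        self.categories = []
        self.links = []

    def get_category(self, id):
        # Linear search is fine for the small trees used in the demo.
        for cat in self.categories:
            if cat.id == id:
                return cat
        return None

    def add_link(self, link):
        # Keep links sorted by title (assumed sort key).
        self.links.append(link)
        self.links.sort(key=lambda item: item.title.lower())

    def add_category(self, category):
        # Keep subcategories sorted by title (assumed sort key).
        self.categories.append(category)
        self.categories.sort(key=lambda cat: cat.title.lower())

    def as_html(self, parent_url):
        # The subcategory's URL is the parent's URL plus this category's ID.
        url = parent_url.rstrip("/") + "/" + self.id + "/"
        return '<a href="%s">%s</a>' % (html.escape(url),
                                        html.escape(self.title))
```

Because these classes import nothing from Quixote, they can be exercised directly from unit tests, which is the point made in the list of advantages above.
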
I don't want to get entangled in the details of scalable data storage, so the example application won't use anything fancy such as a relational database. Instead, the tree of links will simply be pickled into a file using Python's pickle module, to be unpickled when the application starts up again. I've also included a function that uses Juri Pakaste's python-opml library to convert OPML-format XML files into a tree of links. These functions won't be covered here; see the code for the details.

def save (filename, root):
    """save(str, Category)
    Saves a tree into the specified file.
    """

def load (filename):
    """load(str) : Category
    Unpickles a tree from the given file and returns the root.
    """

def parse_opml (filename):
    """parse_opml(filename) : Category
    Parses the contents of the specified file as OPML and 
    returns the root of the resulting tree.
    """

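A minimal sketch of save() and load() using the standard pickle module, assuming the whole tree is written and read in one go. parse_opml() is left out because it depends on python-opml.

```python
import pickle


def save(filename, root):
    """save(str, Category)
    Saves a tree into the specified file.
    """
    # Pickles need binary mode; pickling the root pulls in the whole tree.
    with open(filename, "wb") as f:
        pickle.dump(root, f)


def load(filename):
    """load(str) : Category
    Unpickles a tree from the given file and returns the root.
    """
    with open(filename, "rb") as f:
        return pickle.load(f)
```

When unpickling a real tree, the Category and Link classes must be importable under the same module path they had when the tree was saved. Unpickling can also execute arbitrary code, so only load files you created yourself.
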
Designing the URLs

Like it or not, the URLs in a web application form part of its user interface. The URL is almost always visible to the user, and users have become accustomed to editing URLs in order to find their way around a site. We've all seen sites with URLs that can't be usefully edited, such as www.example.com/0-1429209-8-21007260-5.html?tag=tnav. You can build sites that look like this using Quixote, but you can also do better.

I like to come up with a pleasant arrangement of URLs as the first step in designing a web application, taking care that the URLs are readable, editable, and represent a natural hierarchy for my application's functionality. A secondary goal is to give everything a unique URL of its own. (Why is giving everything a unique URL worthwhile? Because it helps in using the REST architectural style. In a future installment I'll write about implementing REST applications with Quixote.)

In our example application, each Category has an ID that's a short alphanumeric string generated from the category's title; this ID is unique among a category's subcategories. For example, a category titled "Python" will have the ID "python", and a category titled "Food & Drink" will have the ID "food_drink". The URL for a node in the tree will be assembled from the IDs of all the categories you need to traverse to get there, leading to URLs that look like hobbies/food_drink/wine/. For now we won't bother with URLs for individual links within a category. When we need such URLs, we'll assign sequence numbers to links as they're added to a category, so the first link added to the category would have the URL hobbies/food_drink/wine/1.

In future installments we'll add new URLs as new features are added.
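
The ID mapping is only defined above by example. One rule that reproduces both examples is sketched below; generate_id(), category_url() and link_url() are hypothetical names, and the real qxdemo rule may differ (this one, for instance, silently drops non-ASCII letters).

```python
import re


def generate_id(title):
    # Assumed rule: lowercase, collapse runs of non-alphanumerics to "_",
    # and trim leading/trailing underscores.
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")


def category_url(titles):
    # A category's URL is the IDs of every category traversed from the root,
    # each followed by a slash.
    return "".join(generate_id(title) + "/" for title in titles)


def link_url(titles, seq):
    # Hypothetical future scheme: links are numbered within their category.
    return category_url(titles) + str(seq)
```

For example, category_url(["Hobbies", "Food & Drink", "Wine"]) gives "hobbies/food_drink/wine/". Since two titles can map to the same ID ("C++" and "C" both become "c"), an add_category() method would still have to detect collisions to keep IDs unique among siblings.
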

How Quixote Works

The basic idea of Quixote is simple. Each running Quixote application is configured with the name of a Python package or module; in our example application, the package name is qxdemo.ui.links. When a request for a URL such as http://example.com/computing/languages/python/ is received, the Quixote publishing loop uses the package qxdemo.ui.links as its starting point and treats the URL as a path into the package's contents. This idea came from Zope's object publishing loop, which works in much the same way.

Quixote's publishing loop has a few different ways of handling the components making up the path. The simplest is to have a subpackage, function, or attribute with the same name; for example, if qxdemo.ui.links contains a module mod containing a function f(), the path http://example.com/mod/f will call qxdemo.ui.links.mod.f(request), passing it an HTTPRequest object representing the properties of the request, and the function's return value will be returned to the client as the output of the transaction.

How does Quixote know which functions should be accessible via HTTP? After all, some functions may be private or may not return anything that can be usefully displayed. Quixote follows the Python design principle "Be explicit", so you simply list the names of public functions in an attribute named _q_exports. (Names treated specially by Quixote always begin with _q_.) Application code usually looks like this:

_q_exports = ['_q_index', 'display', 'login', ...]

def _q_index (request):
    return "Welcome to our Web site"

def display (request):
    return "Display something"

(Listing "_q_index" in _q_exports is not strictly necessary -- Quixote will assume that anything with that name is accessible -- but I tend to include it anyway.)
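
To make the traversal rule concrete, here is a toy model of the publishing loop in plain Python. It is not Quixote's implementation: traverse() and NotFound are invented for illustration, SimpleNamespace objects stand in for the qxdemo.ui.links package and its mod module, and the real loop also handles _q_lookup(), _q_resolve, redirects, and the HTTPRequest object.

```python
from types import SimpleNamespace


class NotFound(Exception):
    """Hypothetical stand-in for the HTTP 404 that Quixote would send."""


def traverse(root, path, request=None):
    """Toy publishing loop: only names listed in _q_exports are reachable."""
    obj = root
    for name in [c for c in path.split("/") if c]:
        # "Be explicit": refuse anything the current object hasn't exported.
        if name not in getattr(obj, "_q_exports", ()):
            raise NotFound(name)
        obj = getattr(obj, name)
    if not callable(obj):
        # The path ended at a package or module, so fall back to _q_index.
        obj = getattr(obj, "_q_index", None)
        if obj is None:
            raise NotFound(path)
    return obj(request)


# Stand-ins for a module "mod" (with an unexported helper) and the package.
mod = SimpleNamespace(_q_exports=["f"],
                      f=lambda request: "output of f",
                      helper=lambda request: "private")
links = SimpleNamespace(_q_exports=["_q_index", "mod"], mod=mod,
                        _q_index=lambda request: "Welcome to our Web site")
```

With these definitions, traverse(links, "/mod/f") calls f, while traverse(links, "/mod/helper") raises NotFound because helper was never listed in _q_exports.
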

Another way to control traversal of the URL space is with a _q_lookup() function. Often you'll want to support traversing a set of names without having a separate module or function for each possible name. For example, names might not be legal Python identifiers (e.g. B-0025, 0-486-2333-X), the space of possible names might be infinite (e.g. any integer is legal), or information might need to be looked up in a database. If you define a special function called _q_lookup(request, component), the publishing loop will call it, passing it a string containing the current component of the path. The _q_lookup() function can return a string, which will be immediately returned to the client, or an object that will be traversed further.

For example, the following _q_lookup function will accept any integer so URLs such as http://.../1, .../2, etc. will work. Returning None signals that the component wasn't accepted, causing Quixote to return an HTTP 404 'Not found' error to the client.

def _q_lookup (request, component):
    try:
        intval = int(component)
    except ValueError:
        # Not an integer
        return None
    else:
        return "HTML page for the integer %i" % intval

The third and final way of handling names is to define a function called _q_resolve. Consult the Quixote documentation for the details; _q_resolve is a bit obscure, and most applications won't need to use it.

Templating

Quixote doesn't require that you use any particular way of generating HTML pages. It provides its own method, a variant of Python's syntax called Python Templating Language or PTL, but you don't have to use it. If you prefer, you can generate HTML by gluing strings together (s = s + "<ul>") or by using any Python module you like (e.g. HTMLgen, Zope Page Templates, &c).

Quixote's PTL is rather neat, however, and provides some useful features. This section will introduce PTL, and the remaining sections will use PTL in the examples.

Most templating languages use HTML or XML syntax and introduce a special mechanism for inserting the value of a variable or for control flow such as if...else branches and for loops. PTL takes the opposite approach, using Python syntax and introducing a special mechanism for assembling values into a string.

A simple example will make PTL clear. Consider this function:

def numbers [plain] (N):
    for i in range(N):
        i
        " "

At a quick glance this looks like a Python function definition. On closer inspection, however, you'll notice the [plain] after the function name; this marker identifies the function as a PTL template. Another possibility is to use [html] instead of [plain]; later we'll explain the difference.

The numbers() function will be compiled into a regular Python function once a special import hook is enabled. Two lines of code are needed to enable PTL's import hook:

import quixote
quixote.enable_ptl()

After running these two lines, you can import files ending in .ptl as Python modules. If the above function is in a file named test_page.ptl, you can do the following, either in a module or at the Python interpreter prompt:

>>> import test_page
>>> test_page.numbers(5)
'0 1 2 3 4 '

Let's look at the body of the function again:

def numbers [plain] (N):
    for i in range(N):
        i
        " "

There's no return statement, so why is this function returning a value? The effect of the [plain] mechanism is to compile the function specially; any bare expression or function call that returns a value will have its value converted to a string and added to the eventual output of the function. (Exception: None is ignored. If it weren't, any calls to functions that don't return anything would need a fake assignment to avoid cluttering the output with "None" strings. For example, ignoring None means you don't have to write dummy = sys.stderr.write("...").)

String literals are Python expressions too, so a bare string literal is a good way to add text to the output. The " " in the final line prevents the numbers from being all run together; without it the output of numbers(5) would be "01234".
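
As a mental model only, the numbers() template behaves roughly like the ordinary function below. The emit() helper is invented for illustration; Quixote's compiler generates its own accumulation code rather than calling anything by that name.

```python
def numbers(N):
    """Plain-Python approximation of the numbers [plain] template."""
    parts = []

    def emit(value):
        # What happens to each bare expression: skip None, otherwise str() it.
        if value is not None:
            parts.append(str(value))

    for i in range(N):
        emit(i)
        emit(" ")
        emit(None)  # e.g. a call that returns nothing: contributes no output
    return "".join(parts)
```

Calling numbers(5) here gives the same '0 1 2 3 4 ' as the PTL version.
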

Here's an example of generating HTML with PTL:

def header [plain] (title, stylesheet=None):
    "<html>"
    "<head>"
    "<title>%s</title>" % title
    if stylesheet:
        """<link rel="stylesheet" href="%s" 
            type="text/css" />""" % stylesheet
    "</head>"
    "<body>"

This function can be called as header("Index") to generate the header for a page with no stylesheet, or as header("Index", "/base.css") to use the stylesheet located at /base.css.

An alert reader may have spotted a bug in the above header() function: if the title contains one of the characters "<>&", this character is added to the output as-is, causing a malformed header when the title is something like "The <blink> HTML Element".

One way of fixing this would be to run the title through a function that replaces the "<>&" characters with the HTML/XML entity notation for them, "&lt;&gt;&amp;". Quixote provides such a function, html_quote(), in the quixote.html module. However, it's easy to forget an html_quote() call and, if you have many template functions that call other template functions, it's hard to figure out at which level this quoting should be performed.
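
For comparison, this is what quoting by hand looks like in plain Python. The sketch uses the standard library's html.escape() rather than Quixote's html_quote(): with quote=False it replaces just the "<>&" characters, while the stylesheet URL, which ends up inside an attribute value, is escaped with the default quote=True so that double quotes are covered too. Every call site has to remember to do this.

```python
import html


def header(title, stylesheet=None):
    """Plain-Python version of header() that quotes by hand."""
    parts = ["<html>", "<head>"]
    # Each untrusted value must be escaped explicitly; forgetting one is the bug.
    parts.append("<title>%s</title>" % html.escape(title, quote=False))
    if stylesheet:
        parts.append('<link rel="stylesheet" href="%s" type="text/css" />'
                     % html.escape(stylesheet))
    parts.append("</head>")
    parts.append("<body>")
    return "".join(parts)
```
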

Forgetting to quote some text is an easy mistake to make, and it often leads to a type of security hole called a "cross-site scripting" attack. An attacker who can manage to create categories can create one whose title is something like Title <a href="javascript:...">link</a>; if the title isn't quoted properly, such a title will create a link that invokes JavaScript, with consequences ranging from annoyances (popping up a zillion browser windows) to real harm (redirecting the browser to the 'delete all categories' URL, if one exists).

Quixote provides an automatic way of getting correct quoting. The key is to use [html] instead of [plain]. With plain templates, values are simply converted to strings and then concatenated. With HTML templates, the function is correct as written:

def header [html] (title, stylesheet=None):
    "<html>"
    "<head>"
    "<title>%s</title>" % title

In an HTML template, string literals result in a special htmltext class being used instead of regular Python strings. The above definition is compiled to Python bytecode that's equivalent to this:

from quixote.html import htmltext
def header [plain] (title, stylesheet=None):
    htmltext("<html>")
    htmltext("<head>")
    htmltext("<title>%s</title>") % title

htmltext instances are assumed to have been properly escaped. When you perform an operation involving both htmltext instances and regular strings, the htmltext class will automatically perform quoting on the regular string, and the result will be a new htmltext instance. For example:

>>> from quixote.html import htmltext        
>>> htmltext("<title>%s</title>") % "The <blink> HTML Element"
<htmltext '<title>The &lt;blink&gt; HTML Element</title>'>

The substituted string has been properly escaped.

The result is that everything works very intuitively when you use the [html] marker. HTML tags in literal strings inside your program text don't need to be specially marked. Strings coming from a file, a database, or from the HTTP request will be regular strings and therefore will get quoted as necessary. You can therefore write templates that read like straightforward Python code yet still be assured that there are no cross-site scripting holes in your application.
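
The htmltext behaviour can be modelled with a small class. This is a toy written for illustration, not Quixote's implementation: HTMLText is a made-up name, it only supports positional %s substitution and +, and it uses html.escape() for quoting.

```python
import html


class HTMLText:
    """Toy model of htmltext: wraps markup that is already safe."""

    def __init__(self, markup):
        self.markup = markup

    @staticmethod
    def _quote(value):
        # Safe markup passes through; anything else is stringified and escaped.
        if isinstance(value, HTMLText):
            return value.markup
        return html.escape(str(value))

    def __mod__(self, args):
        # Positional %s substitution only; each argument is quoted first.
        if not isinstance(args, tuple):
            args = (args,)
        return HTMLText(self.markup % tuple(self._quote(a) for a in args))

    def __add__(self, other):
        return HTMLText(self.markup + self._quote(other))

    def __radd__(self, other):
        # Handles "plain string" + HTMLText.
        return HTMLText(self._quote(other) + self.markup)

    def __str__(self):
        return self.markup

    def __repr__(self):
        return "<HTMLText %r>" % self.markup
```

The key design choice is that the result of every operation is again safe markup, so quoting happens exactly once no matter how many template functions the value passes through: HTMLText("<p>%s</p>") % HTMLText("<b>bold</b>") leaves the inner tags alone, while a plain string would be escaped.
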

(See doc/PTL.txt in the Quixote documentation for a lengthier explanation of PTL and discussion about a few corner cases that you may encounter.)

Structure of the Example Application

A common pattern in Quixote applications is to write UI classes corresponding to some or all of your basic objects. In our example, we'll have a CategoryUI class that corresponds to the Category class.

The purpose of the UI class is to act as a wrapper around a basic object and expose a number of methods that can be directly accessed from the web. A _q_lookup() method will retrieve the requested subcategory, wrap a CategoryUI instance around it, and return the newly created CategoryUI.

Here's the skeleton of the CategoryUI class:

class CategoryUI:
    _q_exports = ['_q_index']

    def __init__ (self, category):
        self.category = category

    def _q_index [html] (self, request):
        ...

    def _q_lookup (self, request, component):
        ...

In this version, only the _q_index method is supported. Adding a new action or view to the UI is easy; add the method definition and add its name to _q_exports. For example, if you wanted to support an .../edit action that would return a form for editing the category, the modified code would be:

class CategoryUI:
    _q_exports = ['_q_index', 'edit']

    def __init__ (self, category):
        self.category = category

    ... 

    def edit [html] (self, request):
        ...

Here are the complete contents of the _q_index() method. Most of it is straightforward; the code looks at various attributes of the Category instance being wrapped, available as self.category, and generates the appropriate HTML. header() and footer() are helper functions that generate the top and bottom of the HTML page, "<html><head>...<body>" and "</body></html>" respectively.

class CategoryUI: 

    def _q_index [html] (self, request):
        header(self.category.title)

        # Display list of subcategories
        cat_list = self.category.categories
        if cat_list:
            "<h3>Categories</h3>\n"
            "<ul>\n"
            for cat in cat_list:
                "<li>"
                cat.as_html(request.get_url())
                "</li>"
            "</ul>\n"
        else:
            "<p>No subcategories.</p>\n"

        # Separator line
        "<hr />\n"

        # Display links contained within this category
        link_list = self.category.links
        if link_list:
            "<ul>\n"
            for link in link_list:
                "<li>"
                link.as_html()
                "</li>\n"
            "</ul>"
        else:
            "<p>No links in this category.</p>"

        footer()

The _q_lookup() method is the second one we need to implement. It's much shorter but a little more subtle:

from quixote import errors

class CategoryUI:

    def _q_lookup (self, request, component):
        subcat = self.category.get_category(component)
        if subcat is None:
            # No such subcategory; Quixote turns this into a 404
            raise errors.TraversalError("No such category")
        # Wrap the subcategory so that traversal can continue into it
        c = CategoryUI(subcat)
        return c

Consider the URL .../computing/languages/python/, and imagine that the CategoryUI instance is wrapping the Category instance for the "Computing" category. On the call to this method, component will be "languages". We need to call the Category.get_category() method on self.category to get the desired subcategory. If get_category() returns None, there's no such subcategory, so we raise the TraversalError exception; Quixote will catch the exception and return an HTTP 404 Not Found error. Otherwise, we take the subcategory and wrap it up in a CategoryUI instance. That instance's _q_lookup() method will be called for "python".

These two methods, _q_index and _q_lookup, are all that's needed to publish the tree of links. We still need top-level functions that will handle the root of the tree, but they're easy; we simply get the root of the tree, wrap it up in a CategoryUI instance, and call the desired method:

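# root is the Category at the top of the tree; it's assumed to have been
# loaded when the application starts up (for example, with load()).

def _q_index [html] (request):
    c = CategoryUI(root)
    return c._q_index(request)

def _q_lookup (request, component):
    c = CategoryUI(root)
    return c._q_lookup(request, component)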

That completes the simplest version of this application. Future articles will extend it.
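
To see the whole lookup chain in one place, here is a Quixote-free model of how a path such as /computing/languages/python/ is resolved, starting from the root as the top-level _q_lookup() does. TraversalError and Category are minimal stand-ins for the real classes, and resolve() is a hypothetical driver playing the role of the publishing loop.

```python
class TraversalError(Exception):
    """Stand-in for quixote.errors.TraversalError (reported as a 404)."""


class Category:
    """Minimal stand-in for the basic object: an ID plus subcategories."""

    def __init__(self, id, categories=()):
        self.id = id
        self.categories = list(categories)

    def get_category(self, id):
        for cat in self.categories:
            if cat.id == id:
                return cat
        return None


class CategoryUI:
    def __init__(self, category):
        self.category = category

    def _q_lookup(self, request, component):
        subcat = self.category.get_category(component)
        if subcat is None:
            raise TraversalError("No such category")
        return CategoryUI(subcat)


def resolve(root, path):
    """Hypothetical driver: one _q_lookup() call per path component."""
    ui = CategoryUI(root)  # what the top-level _q_lookup() does first
    for component in [c for c in path.split("/") if c]:
        ui = ui._q_lookup(None, component)
    return ui.category
```

Each step hands back a fresh CategoryUI wrapping the next Category, which is exactly the chain described above for "languages" and then "python"; an unknown component stops the walk with TraversalError.
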

Running the Example Application

There are several different ways to run Quixote applications. Quixote includes support for mod_python, FastCGI, and plain CGI. My personal favorite is SCGI, which uses a separate long-running process that is started and stopped independently of the web server; this makes SCGI simple, reliable, and fast.

For this demo, though, we don't need anything very fancy. Quixote includes support for two different Python-only HTTP servers: the not-really-maintained Medusa package and the newer and very active Twisted Python package. I'm more familiar with Medusa, so the qxdemo distribution includes a script, scripts/links-server.py, that uses Medusa to serve up the application on TCP port 8080.

To start the server, simply run the script:

[amk@nyman qxdemo]$ python scripts/links-server.py
Now serving the qxdemo.links demo on port 8080
info: Medusa (V1.9) started at Mon May 26 04:12:39 2003
        Hostname: nyman.amk.ca
        Port:8080

Then, point your favorite Web browser at http://localhost:8080:

[amk@nyman amk]$ lynx http://localhost:8080/
                                     Root

  Categories

     * [News]
     * [Pythoneers]
     * [XML people]
  --------------------------------------------

   No links in this category.

The links-server.py script will log accesses on its standard output:

[amk@nyman qxdemo]$ python scripts/links-server.py
Now serving the qxdemo.links demo on port 8080
   ...

127.0.0.1:32839 - - [26/May/2003:08:13:23 -0500] "GET / HTTP/1.0" 200 501
127.0.0.1:32840 - - [26/May/2003:08:13:30 -0500] "GET / HTTP/1.0" 200 501

I won't examine the script in any further detail; you can look at the code yourself and figure it out.

Conclusion

This tutorial has introduced the basic ideas of Quixote (the object publishing loop and the PTL templating language), and has also explained how a simple Quixote application is structured.

Future installments will cover more specialized topics, and will assume that you have understood the material in this tutorial. The code developed here can be downloaded from http://www.quixote.ca/files/qxdemo/.

Acknowledgements

My thanks to Mark Bucciarelli, Larry Tjoelker, and the readers of the quixote-users list for their comments on this document.



Send comments to webmaster at quixote.ca.