Monday, April 1, 2013

Impossibly Lean Access Control with Postgres + Flask

One of the most common feature for a database-backed web application is to implement some kind of access control mechanism to regulate what different types of users (admins, guests, etc) can do. There are of course many ways to implement such a pattern: you can do it client-side, by modifying your UI dynamically (e.g. graying out certain widgets) according to the identity of the user, or server-side (as part of the so-called business rules maybe). Often a mix of both approaches will be used.

In this article, I will to show a way to implement such a pattern in a very simple way, using Postgres, Flask, Flask-Login and little_pger, a very thin SQL wrapper just above psycopg2 which I created.

Because the idea has received some criticism, I must say that (1) it is only a proof of concept (i.e. probably not robust enough for production in its current form) and (2) I don't think that such a pattern would scale well to a big userbase (it's rather meant for small-scale, intranet-style webapps with a limited number of users, which is what I've been mainly developing). The idea for the pattern was inspired by this blog post, that I read a while ago:

http://database-programmer.blogspot.ca/2009/02/comprehensive-database-security-model.html

while my skepticism toward ORMs in general stems from this even older (and admittedly controversial) post:

http://database-programmer.blogspot.ca/2008/06/why-i-do-not-use-orm.html

I say my pattern is "impossibly lean" because it introduces a minimal amount of code and concepts (not even the shadow of an ORM will be found here), by leveraging PG's role access management subsystem. These powerful features are already available the moment you create a PG database, so why would you need to reinvent the wheel?

Let's start by supposing that our application has a very simple database, with only one table:

create database pg_flask_ac;

\connect pg_flask_ac

create table foo (
  foo_id serial primary key
);

Suppose also that we have two users for it: an almighty admin (who can do whatever pleases him), and a shy guest (only allowed to look around). Although we could create an additional SQL "user" table, with a column dedicated to the representation of their privilege levels, another solution is to simply create corresponding PG roles:

create role joe_admin login superuser password '<secret>';
create role joe_guest password '<secret>';
grant select on all tables in schema public to joe_guest;

joe_admin, being a superuser can perform any SQL operation, while joe_guest is restricted to only being able to select things.. that seems fine, but is there a way to leverage these SQL-level constraints into our application? Sure we can, and here's all it takes to implement the pattern:

from flask import *
from flask.ext.login import *
import psycopg2, psycopg2.extras
import little_pger as pg

app = Flask(__name__)
app.secret_key = 'a secret!'
login_manager = LoginManager()
login_manager.init_app(app)

class User(UserMixin):
    def __init__(self, oid):
        self.id = oid

# called first: here we connect as an admin (with the login privilege)
@app.before_request
def before_request():
    g.db = psycopg2.connect("dbname=pg_flask_ac user=joe_admin",
                            connection_factory=psycopg2.extras.RealDictConnection)

# called next: from the admin connection, we fetch the name of the
# target role (identified with the unique (o)id encrypted in the session cookie)
@login_manager.user_loader
def load_user(id):
    cur = g.db.cursor()
    rn = pg.select1(cur, 'pg_authid', 'rolname', where={'oid': id})
    if rn:
        # executed on behalf of the admin
        cur.execute('set role %s', [rn])
        # from this point any command performed via g.db.cursor() will be
        # on the target role's behalf
        return User(id)
    return None

The interesting part is the interplay between before_request (Flask) and load_user (Flask-Login) which are both called automatically, before serving any view request. The first simply sets an admin connection from which the second will be able to switch user (by first fetching the appropriate role name in the pg_authid table with the id stored in the session cookie and then issuing the set role SQL command using it). We can view this mechanism in action by adding two views with different access policies:

# this read-only view should be available to anyone
@app.route('/look_at_something')
@login_required
def look_at_something():
    return jsonify({'success': True, 'count': pg.count(g.db.cursor(), 'foo')})

# this view should only be available to an admin; it will raise an exception
# (at the SQL level) if accessed by anyone else
@app.route('/change_something')
@login_required
def change_something():
    pg.insert(g.db.cursor(), 'foo')
    g.db.commit()
    return jsonify({'success': True})

which we can then drive using some Flask testing code:

if __name__ == '__main__':
    for user in ['joe_admin', 'joe_guest']:
        for view in ['/look_at_something', '/change_something']:
            with app.test_request_context(view):
                app.preprocess_request()
                oid = pg.select1(g.db.cursor(), 'pg_authid', 'oid',
                                 where={'rolname': user})
                login_user(User(oid))
                print '%s %s..' % (user, view),
                try:
                    app.dispatch_request()
                    print 'and succeeds!'
                except:
                    print 'but fails..'

# joe_admin /look_at_something.. and succeeds!
# joe_admin /change_something.. and succeeds!
# joe_guest /look_at_something.. and succeeds!
# joe_guest /change_something.. but fails..

Of course an access violation exception by itself wouldn't be terribly useful, so an easy way to use that pattern in a real environment would be to intercept it using Flask's errorhandler decorator, in order to return, for instance, a JSON formatted error message, along with a 403 response (upon which the client should presumably react appropriately):

@app.errorhandler(psycopg2.ProgrammingError)
def db_access_error(e):
    msg = "This user doesn't have the required privilege to perform this action."
    return jsonify({'success': False, 'message': msg}), 403

Finally, note that we didn't make use of the role passwords up to this point, because we assumed that a user had already been logged in, but of course it must occur at some point in the process, and here's a possible way to do it (which implies knowing about the particular way PG encrypts passwords in the pg_authid table):

@app.route("/login", methods=['POST'])
def login():
    user = request.form('user')
    pw = request.form('pw')
    cur = g.db.cursor()
    cur.execute("""select * from pg_authid where rolname = %s and
                                                 rolpassword = 'md5' || md5(%s || %s)""",
                [user, pw, user])
    u = cur.fetchone()
    if u:
        login_user(User(u['oid']))
    return jsonify({'success': u is not None})

Tuesday, March 12, 2013

Suspension of Parser Disbelief

At last, it's done: I have expanded the 53-section prototype of my gamebook.js engine into a fully playable, complete implementation of Fire on the Water, the second gamebook in the Lone Wolf series, created by Joe Dever in the 80s. To the obvious question (Why start with the second book?), I don't really have a good answer, apart from the fact that FotW was one of the earliest gamebooks I've read (in French, as La traversée infernale), and always one of my favorites (I think in large part because of the maritime theme).

So to recap, gamebook.js is an experimental crossbreed between two classic genres: interactive fiction (IF) and gamebooks. Instead of navigating an explicit menu of choices (as with a classical gamebook), those are rather willfully concealed after each section, and your job is to "reveal" them by typing any command you want (using clues from the text) which the parser then tries to match. The idea is to expand, as much as possible, the feeling of "freedom" while exploring the gamebook world, as well as offering a novel (and hopefully fun) way to interact with it.

So far so good.. but given the typical small number of choices for a section, what's the point really in having a parser, and how does it work anyway? It's true that it obviously couldn't be as sophisticated as a complex IF engine, but I think the one I've created works well in a gamebook setting, and listing its relevant features is probably the best way to show why:
  • A stemmer is used, to reduce word inflections to common, more easily matchable forms (e.g stemmer = stemming = stemmed = stem)
  • A meticulously hand-assembled table of synonyms is also used
  • The (web-based) textual environment supports contextual word autocompletion (thus offering hints, which can be turned off)
  • The ambiguity of certain choices can be enhanced or lessened with (again) tediously, manually curated sets of words (although the default is to match the words (and their synonyms) found in the text of the choices)
  • Although complex sentence structures are not supported (as it would be useless in most cases), compound expressions can be used (e.g. "go inside") to enhance meaning (or illusion thereof)
  • The Action Chart and item management subsystem is seamlessly integrated in the textual interface (so after winning a combat, you can "use your Healing Potion", for instance, or "eat a Meal", if required)
So the trick is actually to suspend your "parser disbelief", and allow yourself to compose commands as they naturally come, and more often than not (I contend) the flow and coherence of the story will make it work seamlessly.

Some Technical Aspects (for those interested)

  • The engine is open and implemented in 100% JavaScript (it should run at least in recent versions of Chrome, Firefox and Safari), using a few excellent external packages (the jQuery Terminal plugin, which I modified a bit, and a JS implementation of the classic Porter Stemmer algorithm)
  • The content is fully contained in a single, highly structured JSON file, which must be hosted on Project Aon for legal reasons
  • The extraction of content from the original Project Aon XML file is semi-automated using a Python script, which weaves it with another customized, handcrafted "override" file, to yield a single final JSON file (in theory, this means that expanding to other LW books should be fairly easy, although the amount of work that must go in the "override" file should not be underestimated, as it would be very hard to fully automate everything)
  • The JSON content file (almost a DSL in itself) has an elaborate set of structures to implement the (sometimes complex, borderline ambiguous) ruleset (boolean requirement operators for certain choices (i.e. must have X amount of gold, or possess the Kai Discipline of Y), item description structures, etc.)
  • The architecture of the engine is adequately decoupled: the general logic (applicable to any LW gamebook) is contained in a single module (~1500 lines), while the book-specific rules and behaviors are kept in a separate module (in this case, ~500 lines of FotW-specific code) with a "pluggable" interface (where any section needing a special treatment simply has an entry in a dictionary data structure)