I really like Postgres! I recently discovered a new trick that plays especially well with the access control pattern I recently described: a simple (but rather powerful) audit/logging pattern to track the changes made to a database (i.e. insert
s, update
s and delete
s), and most importantly, allow to query them in a flexible way.
It's easy to set up a trigger
to record changes. However, if you follow this basic pattern, you end up with your update data stored as unstructured text
, which is not the ideal format to query. You could modify that scheme to use as many audit tables as you want to log (thus each reproducing the structure of its target table), but there's a much nicer solution, using PG's hstore
data type.
hstore
is PG's answer to NoSQL: it's a string-based key-value data type, which you can embed inside your relational schema, within which it interoperates with other data types and SQL constructs seamlessly. There's also the newer JSON
type, which plays a similar role, but since its API is not yet as mature as hstore
's, we're not going to use it here.
So here's the audit table:
create extension hstore; create table audit ( audit_id serial primary key, table_name text not null, user_name text not null, action_timestamp timestamp not null default current_timestamp, action text not null check (action in ('i','d','u')), old_values hstore, new_values hstore, updated_cols text[], query text );
And here's the audit trigger that goes with it:
create or replace function if_modified_func() returns trigger as $body$ begin if tg_op = 'UPDATE' then insert into audit (table_name, user_name, action, old_values, new_values, updated_cols, query) values (tg_table_name::text, current_user::text, 'u', hstore(old.*), hstore(new.*), akeys(hstore(new.*) - hstore(old.*)), current_query()); return new; elsif tg_op = 'DELETE' then insert into audit (table_name, user_name, action, old_values, query) values (tg_table_name::text, current_user::text, 'd', hstore(old.*), current_query()); return old; elsif tg_op = 'INSERT' then insert into audit (table_name, user_name, action, new_values, query) values (tg_table_name::text, current_user::text, 'i', hstore(new.*), current_query()); return new; end if; end; $body$ language plpgsql;
Notice how the rows get converted (hstore(old.*)
and hstore(new.*)
). Now suppose that we have a book
table that we'd like to audit:
create table book ( book_id serial primary key, title text, author text, n_pages int ); create trigger book_audit after insert or update or delete on book for each row execute procedure if_modified_func();
If we insert a book in it, we can see the audit mechanism in action:
insert into book (title, n_pages) values ('PG is Great', 250); select action, table_name, new_values -> 'title' as title from audit; action | table_name | title --------+------------+------------- i | book | PG is Great
The third retrieved value (title
) demontrates hstore
's syntax for value retrieval from a key (using the ->
operator), which is just an example among the bunch of operators and functions offered.
If we perform an update on the book:
update book set author = 'Christian Jauvin', n_pages = 300 where book_id = 1; select action, table_name, updated_cols from audit; action | table_name | updated_cols --------+------------+------------------ i | book | u | book | {author,n_pages}
The third column now sports the columns that have been updated, using the updated_cols
mechanism, implemented as a PG array
(another very nice data structure) along with the akeys
function, with which we collect the keys of the hstore
structure resulting from the hstore(new.*)
"minus" hstore(old.*)
operation performed in the "update" part of the trigger function.
It's also easy to perform "before and after" type queries:
select old_values -> 'n_pages' as before, new_values -> 'n_pages' as after from audit where audit_id = 2; before | after --------+------- 250 | 300
Don't forget however that hstore
values are stored as strings, so this for instance wouldn't work:
select old_values -> 'n_pages' + new_values -> 'n_pages' from audit where audit_id = 2; HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
whereas this would:
select (old_values -> 'n_pages')::int + (new_values -> 'n_pages')::int from audit where audit_id = 2;
One final aspect to note: I find that this pattern works well with my access control pattern, where every application user has its own PG role
, on behalf of which the database operations are performed. In this scenario, the user/role gets automatically logged by the trigger (using the current_user
variable). In contrast, an access control system implemented at the level of the application (i.e. with some kind of user
table and all database operations performed on behalf of a single PG role) wouldn't enjoy that much simplicity I believe.
Thank you for this awesome post! I was wondering how to automatically generate a rollback sql statement in the "audit" table for each operation. For example, if "author" and "n_pages" are changed, how to gen a sql query to change it back to old value if necessary?
ReplyDeleteIs there a simple way to replace the values in "query" to the old values?
Thank you!
Also, how to extract the changed column names and old values, so to write something like:
ReplyDeleteUPDATE table_name set akeys(hstore(new.*) - hstore(old.*)) = avals(hstore(old.*) - hstore(new.*))
Is there way to remove the curly braces?