Tech Notes: PostgreSQL Beginning to Meta

Table of contents

  1. Context for these notes
  2. Getting Started with PostgreSQL
  3. Functions and Metaprogramming
  4. Metaprogramming Power
  5. Metaprogramming and Design

Context for these notes

These notes condense some of what I tried to express during two Tech Salons on PostgreSQL in 2010. The lucky guinea pigs who were there will hopefully enjoy this document as a review and I promise some interesting new material. I hope that this document will also assist anyone wanting to join any future projects in which PostgreSQL is a key tool.

Nearly every serious software project needs a database system. Rather than trying to recap the reasons why we see PostgreSQL as the best choice I'll just reference the PostgreSQL website and encourage everyone to browse it and then ask questions.

Getting Started with PostgreSQL

Once you decide to give PostgreSQL a chance, download and install it on some convenient computer. Find the PostgreSQL tutorial and spend some time working through the SQL examples. If you're a self-starter you could do this on your own; alternatively you might arrange to do this with a partner, perhaps someone who has already done some of this, using the "each one teach one" model.

Once you've gone through the tutorials, I recommend the following project:

Create a schema (for now we'll think of a schema as a set of table and view definitions) which describe your life. It should contain several primary tables connected by linkage tables. One of the tables should be TABLE persons which should have a row for a number of diverse important people in your life. One of the rows should represent you! Some other possible entity tables might be

connected by linkage tables such as Notice the convention that table names should be plural nouns.

You'll also want tables and/or enumerations to express types of people, groups, etc. as well as other properties. You should ensure referential integrity with primary keys and foreign key references.

Make the project fun: Avoid doing too much that is similar or tedious. Look for how to represent diverse and interesting features of your world. Share what you've come up with with others in the Tech Group.

You can work with PostgreSQL using any of several graphical interfaces, or with the psql client in a terminal window. I've found that serious work becomes much easier if you bite the bullet and learn Emacs. Emacs will let you run one-or-more database sessions in windows and quickly move definitions and results between sql textfiles, documents, web pages and one or more databases.

Functions and Metaprogramming in PostgreSQL

You can access PostgreSQL from applications written in any programming language. You can do server-side PostgreSQL programming in more than a dozen programming languages. Three are built-in and worth special mention:

C Procedural Language
PostgreSQL is written in C which compiles to efficient native code on any of today's computers. PostgreSQL's syntax, datatypes, operators, index mechanisms, and more are written as plug-in modules, making PostgreSQL an extremely flexible and extensible database system. If you write your extensions in C then your extensions are just as efficient as PostgreSQL's built-in features. You could do all of your server-side programming in C but it would be a lot of work because C is a very low-level language and the Server Programming Interface which is the C API provided by PostgreSQL is cumbersome and error prone. I only write essential server-side code in C.
SQL Functional Language
SQL is the industry standard for relational database programming. PostgreSQL does a good job of supporting standard SQL while providing some very nice extensions. SQL functions are simply SQL commands wrapped up into functions which take arguments. They are fully type-checked at definition time and are in-line substituted into queries which use them, so their bodies become part of the execution plan of the query which gets optimized as a whole before it is run. These features make SQL functions very efficient and I try to write most of my functions in SQL. However, there are a few essential things which you can't do in SQL functions.
PL/pgSQL Procedural Language
PL/pgSQL is a hybrid interpreted language which allows you to mix SQL commands and procedural code with variables, assignments, if and loop statements, etc. There are some essential things which you can do in PL/pgSQL which you can't do in SQL (and which would be much more difficult to do in C) including:
  • Grouping multiple statements into transactions
  • Throwing or catching exceptions
  • Metaprogramming
Metaprogramming is the most powerful thing you can do with PL/pgSQL but I generally do it by breaking the pieces up into short and simple SQL language functions which generate the SQL commands which are then performed by a PL/pgSQL function. This division of labor keeps the complexity down and aids debugging.

Metaprogramming Power

In the second Tech Salon on PostgreSQL we looked at some examples of metaprogramming. For example, in the Wicci System the single meta-function call

SELECT create_ref_type('doc_node_refs');
replaces around 200 lines of regular code which includes

Then the single meta-function call

SELECT create_type_class('doc_node_refs', 'doc_node_tree_rows');
creates an object oriented class in which the rows of the TABLE doc_node_tree_rows serve as instances.

Finally, meta-function calls like

SELECT type_class_op_method(
	'doc_node_refs', 'doc_node_tree_rows',	--  the type and table of the class
	'ref_env_crefs_to_text_op(refs, env_refs, crefs)',	-- the C language operation
	'tree_doc_to_text(doc_refs, env_refs, crefs)'	-- the SQL language method
bind specific SQL functions to the operations of the new class.

We're using metaprogramming to build support for the object-oriented programming paradigm, with full static type-checking, into PostgreSQL which out-of-the-box only supports the relational and procedural programming paradigms. Once you learn these three meta-functions you're able to design at a higher level, write much less code and get much better type-checking without sacrificing efficiency.

Metaprogramming and Design

Where and how to use metaprogramming is a serious design issue. We saw a simple example of metaprogramming where after an abstract base table (the parent table of an inheritance hierarchy) was declared we used the meta-function call

SELECT declare_abstract('table name');
to attach a trigger to the table which will throw an exception should we ever try to insert any rows in that table.

An alternative might be a meta-function

SELECT declare_abstracts_in('schema name');
which would go through all tables in the given schema whose names start with abstract_ and automatically call declare_abstract on each one. This would create an auto-magic feature where the naming convention of the prefix abstract_ produces a semantic result. At this time I have chosen not to do this.

But consider the situation where we frequently have pairs of SQL functions which are defined like this:

FUNCTION try_schema_trim(text) RETURNS text AS $$
 SELECT lower(regexp_replace($1, '[[:space:]]', '', 'g'))

FUNCTION schema_trim(text) RETURNS text AS $$
 SELECT non_null(try_schema_trim($1), 'schema_trim(text)')
$$ LANGUAGE sql;

COMMENT ON FUNCTION schema_trim(text) IS
'Normalize a comma-separated list of schemas';

SELECT module_provides_function('schema_trim(text)');
How might we reduce the boilerplate with some metaprogramming?

One way would be to create a meta-function which creates the second function given the signature of the first one. It could also handle the commenting and the registration of both functions as belonging to the current module. The new code would look like this:

FUNCTION try_schema_trim(text) RETURNS text AS $$
 SELECT lower(regexp_replace($1, '[[:space:]]', '', 'g'))

SELECT create_non_null_function_for(
	'Normalize a comma-separated list of schemas'
Does this seem a bit nicer?

We could go whole hog and create all sql functions through meta-functions. We could create single functions like this:

SELECT create_strict_sql_function(
	'try_schema_trim',	-- name of function to create
	'Try to normalize a comma-separated list of schemas', $$
		SELECT lower(regexp_replace($1, '[[:space:]]', '', 'g'))
	'text', 'text'	-- return and argument type(s)

SELECT create_sql_function(
	'schema_trim',	-- name of function to create
	'Normalize a comma-separated list of schemas', $$
		SELECT non_null(try_schema_trim($1), 'schema_trim(text)')
	'text', 'text'	-- return and argument type(s)
but then it becomes easy to create a meta-function for just such pairs as these and the code reduces to:
SELECT create_strict_and_non_null_sql_function_pair(
	'schema_trim',	-- for both try_schema_trim and schema_trim
	'Normalize a comma-separated list of schemas', $$
		SELECT lower(regexp_replace($1, '[[:space:]]', '', 'g'))
	'text',	'text'	-- return and argument type(s)
with the result being the creation of both functions, both with comments and both registered with the module in which this code appears.

Metaprogramming using PostgreSQL meta-functions can be defined to automate and enforce any design or implemetation patterns. However, they then require that anyone working with that code learn what those meta-functions do. If you want to make it easy for anyone who knows standard SQL to read and understand your code you should use meta-functions very sparingly. I'm still defining most of my functions using standard SQL commands rather than calling meta-functions. It is only when repetitious code starts getting hard to maintain that I will stop and define one or more meta-functions to abstract away the bloat.

These examples only scratch the surface of what is possible with meta-programming in PostgreSQL. I hope that you find the possibilities exciting!

Additional Resources

Here are some additional interesting PostgreSQL-related links

How to contact me

Use the handy Pu'uhonua Contact Page

Return to Touch Pu'uhonua's Writings Return to Touch Pu'uhonua