merge_by_key provides a means for a user to merge a result set into a table based on that table's primary key. It provides interfaces for simple merges, and conditional ones.
This is not a replacement for the MERGE command. It is a use-case of merge that works right now with versions 8.1 and greater of PostgreSQL.
To install MBK, get the source from: http://mbk.projects.postgresql.org/files/merge_by_key-1.0.tar.gz
Fetch:
fetch http://mbk.projects.postgresql.org/files/merge_by_key-1.0.tar.gz
Or
curl -O http://mbk.projects.postgresql.org/files/merge_by_key-1.0.tar.gz
Or
wget http://mbk.projects.postgresql.org/files/merge_by_key-1.0.tar.gz
Extract:
tar zxf merge_by_key-1.0.tar.gz
Install:
cd merge_by_key-1.0 psql -f source.sql
That's it. merge_by_key is now available for use in the target database.
In the following examples, the dst table will be used as the target:
CREATE TABLE dst (k int PRIMARY KEY, c1 text, c2 int);
In this case, a row where k = 1 is merged:
INSERT INTO dst VALUES (1, 'foo', 1); INSERT 0 1 SELECT * FROM merge_by_key( 'dst'::text, 'select 1 AS k, ''bleh''::text AS c1, 123 AS g'::text );
NOTICE: ignoring absent columns for merge, {c2} CONTEXT: SQL function "merge_by_key" statement 1 SQL function "merge_by_key" statement 1 -[ RECORD 1 ]------+--------------------------- start_time | 2008-05-07 10:30:37.035424 end_time | 2008-05-07 10:30:37.064345 create_count | 1 create_time | 00:00:00.004621 update_using_count | 1 update_using_time | 00:00:00.000217 delete_using_count | 1 delete_using_time | 00:00:00.000177 insert_count | 0 insert_time | 00:00:00.00009
Notably, this shows merge_by_key's tolerance to missing columns. While this does allow potential errors to occur, it does provide you with an indication of that possibility everytime it happens. With the presence of the notice, the provided convenience out-weighs the hazard of an erroneous merge.
In this case, a row where k = 1 is merged iff the new c2 is less than the old, existing, c2:
INSERT INTO dst VALUES (1, 'foo', 5); INSERT 0 1 SELECT * FROM merge_by_key( 'public', -- table schema 'dst', -- table name 'mnew.c2 < mold.c2', -- merge condition 'select 1 as k, ''bleh''::text as c1, 2 AS c2' );
The merge created one and updated one:
-[ RECORD 1 ]------+--------------------------- start_time | 2008-05-08 13:56:46.693153 end_time | 2008-05-08 13:56:46.723416 create_count | 1 create_time | 00:00:00.005627 update_using_count | 1 update_using_time | 00:00:00.000356 delete_using_count | 1 delete_using_time | 00:00:00.000179 insert_count | 0 insert_time | 00:00:00.000093
The effect, c1 is now 'bleh':
SELECT * FROM dst; -[ RECORD 1 ] k | 1 c1 | bleh c2 | 2
Merge again, but where the condition fails:
SELECT * FROM merge_by_key( 'public', -- table schema 'dst', -- table name 'mnew.c2 < mold.c2', -- merge condition 'select 1 as k, ''foo''::text as c1, 2 AS c2' );
No update occurred, and no new records were inserted:
-[ RECORD 1 ]------+--------------------------- start_time | 2008-05-08 14:03:12.581652 end_time | 2008-05-08 14:03:12.62839 create_count | 1 create_time | 00:00:00.022227 update_using_count | 0 update_using_time | 00:00:00.000226 delete_using_count | 1 delete_using_time | 00:00:00.000183 insert_count | 0 insert_time | 00:00:00.000092
The effect, nothing changed:
SELECT * FROM dst; -[ RECORD 1 ] k | 1 c1 | bleh c2 | 2
merge_by_key returns timing and quantity information about the merge process. It uses OUT parameters to provide this data. OUT parameter names and meaning:
PostgreSQL 8.1 and greater is required for merge_by_key to work properly. Notably, UPDATE statements must have the ability for the target to be re-labeled("MOLD"). Also, in 8.0 and earlier, key information was not available via the information_schema unless the current user owned the table being referenced. With most metadata being extracted from the information_schema, this yields a problem with versions that have that restriction. Also, of course, OUT parameter support came in 8.1.
In addition to PostgreSQL, merge_by_key includes support for Greenplum. While most of the functionality need not change for Greenplum support, table distribution needed to be specified in order to achieve a performat implementation. This was required to due the creation of the temporary table using the source query, as opposed to using a CREATE TEMP TABLE x (LIKE y) statement.
Copyright 2008, James William Pye. http://jwp.name
Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies.
IN NO EVENT SHALL James William Pye BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF James William Pye HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
James William Pye SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND James William Pye HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.