An upsert (UPdate or inSERT) SQL query does either one of the two following things. If the record in question does not exist yet (as identified by some primary key or other uniqueness constraint), the row gets inserted. Otherwise, the entry gets updated.

This behavior is often quite handy because it simplifies some boilerplate code. The user interface could present some functionality to add (insert) some entity and to modify (update) it. For the developer, both processes can be captured by the same functionality.

In MySQL, such an upsert query can be formed using the either REPLACE ... or INSERT ... ON DUPLICATE KEY UPDATE commands. The difference being that REPLACE will actually delete an existing entry and add a new one with the same/updated data while the INSERT ... ON DUPLICATE KEY UPDATE command will actually update the existing row. The end result is the same except for auto_increment values which get incremented by REPLACE. You cannot use REPLACE if you want to reference an existing column value: REPLACE INTO table (counter_col) VALUES (counter_col + 1) would yield NULL. I’ll go with INSERT INTO ... ON DUPLICATE KEY UPDATE because it is more powerful in my opinion.

In a simple CRUD application where users modify content as pictured above you will find yourself writing a lot of similar database queries like INSERT INTO `X`, UPSERT `X`, INSERT INTO `Y`, UPSERT `Y`, etc. INSERT and UPDATE can be combined into a single query as I said. But: We can go further by automacilly generating the query for us. The thing is, given a set of key-value pairs, we know how the query would look like - no matter the table columns. We always talk about INSERT INTO `tablename` VALUES(col1, col2, col3) (value1, value2, value3) ON DUPLICATE KEY UPDATE col1 = value`, col2 = value2, col3 = value3. This can be trivially automated.

Before diving into the implementation, let’s make ourselves us a sample scenario:

CREATE TABLE `planets` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `orbital_speed__km_s` float NOT NULL,
  `satellites` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`)
) AUTO_INCREMENT=1;

And some content:

mysql> SELECT * FROM planets;
+----+---------+---------------------+------------+
| id | name    | orbital_speed__km_s | satellites |
+----+---------+---------------------+------------+
|  1 | Saturn  |                9.69 |         62 |
|  2 | Jupiter |               13.07 |         67 |
+----+---------+---------------------+------------+
2 rows in set (0.00 sec)

We have some astronomy app which contains information about our solar system. All we ever would want to do is inserting or updating a planet. Here is the little upsert query code which will help us along the way.

def get_upsert_query(table, data):
    """
    Convert the provided dictionary to a 'INSERT INTO ... ON DUPLICATE KEY
    UPDATE' query into the given table.
    """
    columns = []
    values = []
    for k, v in data.iteritems():
        columns.append(k)
        if (v is None):
            values.append('NULL')
        else:
            values.append('"{}"'.format(v))
    query = ''
    query += 'INSERT INTO `{}`'.format(table)
    query += '('
    query += ', '.join(columns)
    query += ') VALUES ('
    query += ', '.join(values)
    query += ')'
    query += ' ON DUPLICATE KEY UPDATE '
    query += ', '.join(['{} = VALUES({})'.format(x, x) for x in columns])
    query += ';'
    return query

Let’s say we want to use the above code to upsert a planet. Here is some data we could use:

>>> table = "planets" 
>>> data = {'name': 'Pluto', 'orbital_speed__km_s': 4.67, 'satellites': 5}

Feeding this into get_upsert_query() yields:

INSERT INTO `planets`(satellites, orbital_speed__km_s, name) VALUES ("5", "4.67", "Pluto") ON DUPLICATE KEY UPDATE satellites = VALUES(satellites), orbital_speed__km_s = VALUES(orbital_speed__km_s), name = VALUES(name);

Note too bad, he? As you can see get_upsert_query() is completely agnostic about the actual database structure. This is where it comes handy. You can use it for all your tables! Just throw the table name & some key-value pairs at it and you’re done.

Oh, and updating existing rows is as easy as:

>>> data = {'name': 'Pluto', 'satellites': 6}  # discovered a new moon!
>>> get_upsert_query(table, data)
'INSERT INTO `planets`(satellites, name) VALUES ("6", "Pluto") ON DUPLICATE KEY UPDATE satellites = VALUES(satellites), name = VALUES(name);'

Given the two sample queries, this is what happens in the database:

mysql> SELECT * FROM planets;
+----+---------+---------------------+------------+
| id | name    | orbital_speed__km_s | satellites |
+----+---------+---------------------+------------+
|  1 | Saturn  |                9.69 |         62 |
|  2 | Jupiter |               13.07 |         67 |
+----+---------+---------------------+------------+
2 rows in set (0.00 sec)

mysql> INSERT INTO `planets`(satellites, orbital_speed__km_s, name) VALUES ("5", "4.67", "Pluto") ON DUPLICATE KEY UPDATE satellites = VALUES(satellites), orbital_speed__km_s = VALUES(orbital_speed__km_s), name = VALUES(name);
Query OK, 1 row affected (0.06 sec)

mysql> SELECT * FROM planets;
+----+---------+---------------------+------------+
| id | name    | orbital_speed__km_s | satellites |
+----+---------+---------------------+------------+
|  1 | Saturn  |                9.69 |         62 |
|  2 | Jupiter |               13.07 |         67 |
|  3 | Pluto   |                4.67 |          5 |
+----+---------+---------------------+------------+
3 rows in set (0.00 sec)

mysql> INSERT INTO `planets`(satellites, name) VALUES ("6", "Pluto") ON DUPLICATE KEY UPDATE satellites = VALUES(satellites), name = VALUES(name);
Query OK, 2 rows affected (0.08 sec)

# MySQL says "1 row affected" if the query did an insert, and "2 rows
# affected" for an update.

mysql> SELECT * FROM planets;
+----+---------+---------------------+------------+
| id | name    | orbital_speed__km_s | satellites |
+----+---------+---------------------+------------+
|  1 | Saturn  |                9.69 |         62 |
|  2 | Jupiter |               13.07 |         67 |
|  3 | Pluto   |                4.67 |          6 |
+----+---------+---------------------+------------+
3 rows in set (0.00 sec)

We have used the nearly identical queries in order to insert or update the planet Pluto.

The get_upsert_query() function is a good fit for a small project where a specific database abstraction layer would be overkill but writing every query by hand time consuming. It’s especially useful when the database design is not clear in the beginning and you don’t want to go ahead and change your code whenever your realized some column is missing (of course, we always have the requirements in advance, don’t we?)

Some further stuff:

  • Note how we only need the column names for the query’s UPDATE part. You can use colname = VALUE(colname) to tell MySQL to use the value that was already transmitted in the INSERT part.
  • You obviously need to provide the primary key or the any unique key in the dict in order to correctly update an existing row.
  • The function could me made even more elaborate. For example, you could remove from the dict any key that does not correspond to a column in the table. However, at this point the app might already be big enough for some db abstraction layer.