Retrieving Multiple Result-Sets from SQLAlchemy

SQLAlchemy is a great Python-based database client, but, traditionally, it leaves you stuck when it comes to stored-procedures that return more than one dataset. This means that you’d have to either call separate queries or merge multiple datasets into one large, unnatural one. However, there is a way to read multiple datasets but it requires accessing the raw MySQL layer (which isn’t too bad).

This is the test-routine:

delimiter //

CREATE PROCEDURE `get_sets`()
BEGIN
    SELECT
        'value1' `series1_col1`,
        'value2' `series1_col2`;

    SELECT
        'value3' `series2_col1`,
        'value4' `series2_col2`;

    SELECT
        'value5' `series3_col1`,
        'value6' `series3_col2`;
END//

delimiter ;

The code:

import json

import sqlalchemy.pool

def _run_query(connection, query, parameters={}):
    sets = []

    try:
        cursor = connection.cursor()

        cursor.execute(query, parameters)

        while 1:
            #(column_name, type_, ignore_, ignore_, ignore_, null_ok, column_flags)
            names = [c[0] for c in cursor.description]

            set_ = []
            while 1:
                row_raw = cursor.fetchone()
                if row_raw is None:
                    break

                row = dict(zip(names, row_raw))
                set_.append(row)

            sets.append(list(set_))

            if cursor.nextset() is None:
                break

            # nextset() doesn't seem to be sufficiant to tell the end.
            if cursor.description is None:
                break
    finally:
        # Return the connection to the pool (won't actually close).
        connection.close()

    return sets

def _pretty_json_dumps(data):
    return json.dumps(
            data,
            sort_keys=True,
            indent=4, 
            separators=(',', ': ')) + "\n"

def _main():
    dsn = 'mysql+mysqldb://root:root@localhost:3306/test_database'

    engine = sqlalchemy.create_engine(
                dsn, 
                pool_recycle=7200,
                poolclass=sqlalchemy.pool.NullPool)

    # Grab a raw connection from the connection-pool.
    connection = engine.raw_connection()

    query = 'CALL get_sets()'
    sets = _run_query(connection, query)

    print(_pretty_json_dumps(sets))

if __name__ == '__main__':
    _main()

The output:

[
    [
        {
            "series1_col1": "value1",
            "series1_col2": "value2"
        }
    ],
    [
        {
            "series2_col1": "value3",
            "series2_col2": "value4"
        }
    ],
    [
        {
            "series3_col1": "value5",
            "series3_col2": "value6"
        }
    ]
]

Things to observe in the example:

  • The query parameters are still escaped (our parameters have spaces in them), even though we have to use classic Python string-substitution formatting with the raw connection-objects.
  • It’s up to us to extract the column-names from the cursor for each dataset.
  • The resulting datasets can’t be captured as generators, as they have to be read entirely before jumping to the next dataset. Technically, you can yield each dataset, but this has almost no usefulness since you’d rarely be required need to read through them sequentially and you’d only benefit if there were a large number of datasets.
  • The raw_connection() method claims a connection from the pool, and its close() method will return it to the pool without actually closing it.
  • I added pool_recycle for good measure. This is an enormous pain to have to deal with, if you’re new to SA and your connections keep “going away” because MySQL is closing them before SA can recycle them.

REFERENCE: Multiple Result Sets

SQLAlchemy and MySQL Encoding

I recently ran into an issue with the encoding of data coming back from MySQL through sqlalchemy. This is the first time that I’ve encountered such issues since this project first came online, months ago.

I am using utf8 encoding on my database, tables, and columns. I just added a new column, and suddenly my pages and/or AJAX calls started failing with one of the following two messages, respectively:

  • UnicodeDecodeError: ‘ascii’ codec can’t decode byte 0x96 in position 5: ordinal not in range(128)
  • UnicodeDecodeError: ‘utf8’ codec can’t decode byte 0x96 in position 5: invalid start byte

When I tell the stored procedure to return an empty string for the new column instead of its data, it works. The other text columns have an identical encoding.

It turns out that SQLAlchemy defaults to the latin1 encoding. If you need something different, than you’re in for a surprise. The official solution is to pass the “encoding” parameter to create_engine. This is the example from the documentation:

engine = create_engine("mysql://scott:tiger@hostname/dbname", encoding='latin1', echo=True)

In my case, I tried utf8. However, it still didn’t work. I don’t know if that ever works. It wasn’t until I uncovered a StackOverflow entry that I found the answer. I had to append “?charset=utf8” to the DSN string:

mysql+mysqldb://username:password@hostname:port/database_name?charset=utf8

The following are the potential explanations:

  • Since I copy and pasted values that were set into these columns, I accidentally introduced a character that was out of range.
  • The two encodings have an overlapping set of codes, and I finally introduced a character that was supported by one but not the other.

Whatever the case, it’s fixed and I’m a few hours older.