Why is SQLAlchemy count() much slower than the raw query?

It took me a long time to find this as the solution to my problem. I was getting the following error:

sqlalchemy.exc.DatabaseError: (mysql.connector.errors.DatabaseError) 126 (HY000): Incorrect key file for table '/tmp/#sql_40ab_0.MYI'; try to repair it

The problem was resolved when I changed this:

query = session.query(rumorClass).filter(rumorClass.exchangeDataState == state)
return query.count()

to this:

query = session.query(func.count(rumorClass.id)).filter(rumorClass.exchangeDataState == state)
return query.scalar()

Unfortunately MySQL has terrible, terrible support of subqueries and this is affecting us in a very negative way. The SQLAlchemy docs point out that the "optimized" query can be achieved using query(func.count(Segment.id)):

Return a count of rows this Query would return.

This generates the SQL for this Query as follows:

SELECT count(1) AS count_1 FROM (
     SELECT <rest of query follows...> ) AS anon_1

For fine grained control over specific columns to count, to skip the usage of a subquery or otherwise control of the FROM clause, or to use other aggregate functions, use func expressions in conjunction with query(), i.e.:

from sqlalchemy import func

# count User records, without
# using a subquery.
session.query(func.count(User.id))

# return count of user "id" grouped
# by "name"
session.query(func.count(User.id)).\
        group_by(User.name)

from sqlalchemy import distinct

# count distinct "name" values
session.query(func.count(distinct(User.name)))

The reason is that SQLAlchemy's count() is counting the results of a subquery which is still doing the full amount of work to retrieve the rows you are counting. This behavior is agnostic of the underlying database; it isn't a problem with MySQL.

The SQLAlchemy docs explain how to issue a count without a subquery by importing func from sqlalchemy.

session.query(func.count(User.id)).scalar()

>>>SELECT count(users.id) AS count_1 \nFROM users')