Confusion between prepared statement and parameterized query in Python
First, your question shows very good preparation - well done.
I am not sure I am the right person to give an authoritative answer, but I will try to explain my understanding of the situation.
A prepared statement is an object created on the database server as a result of a PREPARE statement, which turns the provided SQL statement into a sort of temporary procedure with parameters. A prepared statement lives for the current database session and is discarded when the session ends. The SQL statement DEALLOCATE destroys a prepared statement explicitly.
Database clients can use the SQL statement EXECUTE to run a prepared statement by calling its name with parameters.
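To make the PREPARE / EXECUTE / DEALLOCATE life cycle concrete, here is a minimal sketch of how it could be driven from Python. It is only an illustration under assumptions: the server is PostgreSQL, conn is a DB-API connection whose driver uses the %s placeholder style (psycopg2 does), and the users table, the get_user statement name and the run_prepared helper are all hypothetical.

```python
def run_prepared(conn, user_ids):
    """Prepare a statement once, execute it per id, then deallocate it.

    Assumptions: `conn` is a DB-API connection to PostgreSQL, the driver
    uses the `%s` placeholder style, and a `users(id, name)` table exists.
    """
    cur = conn.cursor()
    # PREPARE: the server parses the query once and stores it as `get_user`.
    cur.execute("PREPARE get_user (int) AS SELECT name FROM users WHERE id = $1")
    rows = []
    for user_id in user_ids:
        # EXECUTE: call the prepared statement by name with its argument.
        cur.execute("EXECUTE get_user (%s)", (user_id,))
        rows.extend(cur.fetchall())
    # DEALLOCATE: drop it explicitly; otherwise it lives until the session ends.
    cur.execute("DEALLOCATE get_user")
    return rows
```

In practice you would rarely write this by hand; it is the kind of thing a driver or ORM may or may not do for you.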
Parametrized statement is an alias for prepared statement, since a prepared statement usually has some parameters.
Parametrized query seems to be a less frequently used alias for the same thing (24 million Google hits for parametrized statement, 14 million for parametrized query). It is possible that some people use this term for another purpose.
The advantages of prepared statements are:
- faster execution of the actual prepared statement call (not counting the time for PREPARE)
- resistance to SQL injection attacks
Players in executing an SQL query
A real application will probably have the following participants:
- application code
- ORM package (e.g. sqlalchemy)
- database driver
- database server
From the application's point of view, it is not easy to know whether the code will really use a prepared statement on the database server, as any of the participants may lack support for prepared statements.
Conclusions
In application code, avoid building SQL queries directly from strings, as this is prone to SQL injection attacks. For this reason, it is recommended to use whatever the ORM provides for parametrized queries, even if it does not result in prepared statements on the database server side, as the ORM code can be hardened against this sort of attack.
Decide whether a prepared statement is worth it for performance reasons. If you have a simple SQL query that is executed only a few times, it will not help; sometimes it will even slow down execution a bit.
The effect will be biggest for a complex query that is executed many times and has a relatively short execution time. In such a case, you may follow these steps:
- Check that the database you are going to use supports the PREPARE statement. In most cases it will.
- Check that the driver you use supports prepared statements and, if not, try to find another one that does.
- Check support for this feature at the ORM package level. Sometimes it varies driver by driver (e.g. sqlalchemy states some limitations on prepared statements with MySQL due to how MySQL manages them).
If you are looking for a truly authoritative answer, I would turn to the authors of sqlalchemy.
Prepared statement: A reference to a pre-interpreted query routine on the database, ready to accept parameters.
Parametrized query: A query made by your code in such a way that you pass values in alongside some SQL that has placeholder values, usually ? or %s or something of that flavor.
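As a small illustration of what "placeholder values" look like in practice, here is a sketch using the standard library sqlite3 module, which accepts ? (positional) and :name (named) placeholders. The users table and its contents are made up for the example; other drivers use other styles, such as %s.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# qmark style: values are bound by position.
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "alice"))

# named style: values are bound by key.
conn.execute(
    "INSERT INTO users (id, name) VALUES (:id, :name)",
    {"id": 2, "name": "bob"},
)

names = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY id")]
print(sqlite3.paramstyle, names)
```

Each DB-API driver advertises its default placeholder style in its module-level paramstyle attribute, so that is the first place to look when switching drivers.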
The confusion here seems to stem from the (apparent) lack of distinction between the ability to directly get a prepared statement object and the ability to pass values into a 'parametrized query' method that acts very much like one... because it is one, or at least makes one for you.
For example: the C interface of the SQLite3 library has a lot of tools for working with prepared statement objects, but the Python API makes almost no mention of them. You can't prepare a statement and reuse it whenever you want. Instead, you can use executemany(sql, params) on a connection or cursor, which takes the SQL code, creates a prepared statement internally, then uses that statement in a loop to process each of the parameter tuples in the iterable you gave it.
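A minimal sketch of that call, assuming an in-memory SQLite database and a made-up points table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x INTEGER, y INTEGER)")

# One SQL string, many parameter tuples: the statement is compiled once
# and then run once per tuple.
params = [(1, 2), (3, 4), (5, 6)]
conn.executemany("INSERT INTO points (x, y) VALUES (?, ?)", params)

total = conn.execute("SELECT COUNT(*) FROM points").fetchone()[0]
print(total)
```

You never see or hold the prepared statement here; the module handles it for you.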
Many other SQL libraries in Python behave the same way. Working with prepared statement objects can be a real pain and can lead to ambiguity, and in a language like Python, which leans towards clarity and ease over raw execution speed, they aren't really the greatest option. Essentially, if you find yourself having to make hundreds of thousands or millions of calls to a complex SQL query that gets re-interpreted every time, you should probably be doing things differently. Regardless, sometimes people wish they could have direct access to these objects, because if you keep the same prepared statement around, the database server won't have to keep interpreting the same SQL code over and over; most of the time this is approaching the problem from the wrong direction, and you will get much greater savings elsewhere or by restructuring your code.*
Perhaps more important in general is the way that prepared statements and parametrized queries keep your data sanitary and separate from your SQL code. This is vastly preferable to string formatting! You should think of parametrized queries and prepared statements, in one form or another, as the only way to pass variable data from your application into the database. If you try to build the SQL statement otherwise, not only will it often run slower, but you will also be vulnerable to problems such as SQL injection.
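To show why keeping data separate from SQL matters, here is a small sketch with sqlite3 that runs the same hostile input through string formatting and through a parametrized query. The users table and the input string are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a1"), ("bob", "b2")],
)

user_input = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the SQL text and changes the query logic.
unsafe_sql = "SELECT secret FROM users WHERE name = '%s'" % user_input
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is bound as a value and only ever compared as a string.
safe_rows = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The formatted query leaks every row because the quote in the input closes the string literal early; the parametrized query finds no user with that odd name and returns nothing.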
*e.g., by producing the data that is to be fed into the DB in a generator function, then using executemany() to insert it all at once from the generator, rather than calling execute() each time you loop.
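A rough sketch of that footnote, again with sqlite3. The squares generator and table are hypothetical stand-ins for whatever produces your real data.

```python
import sqlite3

def squares(n):
    """Yield one parameter tuple at a time instead of building a full list."""
    for i in range(n):
        yield (i, i * i)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE squares (n INTEGER, sq INTEGER)")

# executemany() accepts any iterable, so the generator is consumed lazily
# and the same statement is reused for every row.
conn.executemany("INSERT INTO squares (n, sq) VALUES (?, ?)", squares(1000))
conn.commit()

count, largest = conn.execute("SELECT COUNT(*), MAX(sq) FROM squares").fetchone()
print(count, largest)
```

How much this saves over calling execute() in a loop depends on the driver and the database, so measure before relying on it.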
tl;dr
A parametrized query is a single operation which generates a prepared statement internally, then passes in your parameters and executes.
edit: A lot of people see this answer! I want to also clarify that many database engines have a concept of a prepared statement that can be constructed explicitly with plaintext query syntax, then reused over the lifetime of a client's session (in Postgres, for example). Sometimes you have control over whether the query plan is cached to save even more time. Some frameworks use these automatically (I've seen Rails' ORM do it aggressively), sometimes usefully and sometimes to their detriment when the queries being prepared come in many slightly different forms.
Also, if you want to nitpick, parametrized queries do not always use a prepared statement under the hood; they should do so if possible, but sometimes the library just formats the parameter values into the SQL text. The real difference between 'prepared statement' and 'parametrized query' here is really just the shape of the API you use.