Are Stored Procedures more efficient, in general, than inline statements on modern RDBMS's?

NOTE that this is a general look at stored procedures not regulated to a specific DBMS. Some DBMS (and even, different versions of the same DBMS!) may operate contrary to this, so you'll want to double-check with your target DBMS before assuming all of this still holds.

I've been a Sybase ASE, MySQL, and SQL Server DBA on-and off since for almost a decade (along with application development in C, PHP, PL/SQL, C#.NET, and Ruby). So, I have no particular axe to grind in this (sometimes) holy war.

The historical performance benefit of stored procs have generally been from the following (in no particular order):

  • Pre-parsed SQL
  • Pre-generated query execution plan
  • Reduced network latency
  • Potential cache benefits

Pre-parsed SQL -- similar benefits to compiled vs. interpreted code, except on a very micro level.

Still an advantage? Not very noticeable at all on the modern CPU, but if you are sending a single SQL statement that is VERY large eleventy-billion times a second, the parsing overhead can add up.

Pre-generated query execution plan. If you have many JOINs the permutations can grow quite unmanageable (modern optimizers have limits and cut-offs for performance reasons). It is not unknown for very complicated SQL to have distinct, measurable (I've seen a complicated query take 10+ seconds just to generate a plan, before we tweaked the DBMS) latencies due to the optimizer trying to figure out the "near best" execution plan. Stored procedures will, generally, store this in memory so you can avoid this overhead.

Still an advantage? Most DBMS' (the latest editions) will cache the query plans for INDIVIDUAL SQL statements, greatly reducing the performance differential between stored procs and ad hoc SQL. There are some caveats and cases in which this isn't the case, so you'll need to test on your target DBMS.

Also, more and more DBMS allow you to provide optimizer path plans (abstract query plans) to significantly reduce optimization time (for both ad hoc and stored procedure SQL!!).

WARNING Cached query plans are not a performance panacea. Occasionally the query plan that is generated is sub-optimal. For example, if you send SELECT * FROM table WHERE id BETWEEN 1 AND 99999999, the DBMS may select a full-table scan instead of an index scan because you're grabbing every row in the table (so sayeth the statistics). If this is the cached version, then you can get poor performance when you later send SELECT * FROM table WHERE id BETWEEN 1 AND 2. The reasoning behind this is outside the scope of this posting, but for further reading see: http://www.microsoft.com/technet/prodtechnol/sql/2005/frcqupln.mspx and http://msdn.microsoft.com/en-us/library/ms181055.aspx and http://www.simple-talk.com/sql/performance/execution-plan-basics/

"In summary, they determined that supplying anything other than the common values when a compile or recompile was performed resulted in the optimizer compiling and caching the query plan for that particular value. Yet, when that query plan was reused for subsequent executions of the same query for the common values (‘M’, ‘R’, or ‘T’), it resulted in sub-optimal performance. This sub-optimal performance problem existed until the query was recompiled. At that point, based on the @P1 parameter value supplied, the query might or might not have a performance problem."

Reduced network latency A) If you are running the same SQL over and over -- and the SQL adds up to many KB of code -- replacing that with a simple "exec foobar" can really add up. B) Stored procs can be used to move procedural code into the DBMS. This saves shuffling large amounts of data off to the client only to have it send a trickle of info back (or none at all!). Analogous to doing a JOIN in the DBMS vs. in your code (everyone's favorite WTF!)

Still an advantage? A) Modern 1Gb (and 10Gb and up!) Ethernet really make this negligible. B) Depends on how saturated your network is -- why shove several megabytes of data back and forth for no good reason?

Potential cache benefits Performing server-side transforms of data can potentially be faster if you have sufficient memory on the DBMS and the data you need is in memory of the server.

Still an advantage? Unless your app has shared memory access to DBMS data, the edge will always be to stored procs.

Of course, no discussion of Stored Procedure optimization would be complete without a discussion of parameterized and ad hoc SQL.

Parameterized / Prepared SQL
Kind of a cross between stored procedures and ad hoc SQL, they are embedded SQL statements in a host language that uses "parameters" for query values, e.g.:

SELECT .. FROM yourtable WHERE foo = ? AND bar = ?

These provide a more generalized version of a query that modern-day optimizers can use to cache (and re-use) the query execution plan, resulting in much of the performance benefit of stored procedures.

Ad Hoc SQL Just open a console window to your DBMS and type in a SQL statement. In the past, these were the "worst" performers (on average) since the DBMS had no way of pre-optimizing the queries as in the parameterized/stored proc method.

Still a disadvantage? Not necessarily. Most DBMS have the ability to "abstract" ad hoc SQL into parameterized versions -- thus more or less negating the difference between the two. Some do this implicitly or must be enabled with a command setting (SQL server: http://msdn.microsoft.com/en-us/library/ms175037.aspx , Oracle: http://www.praetoriate.com/oracle_tips_cursor_sharing.htm).

Lessons learned? Moore's law continues to march on and DBMS optimizers, with every release, get more sophisticated. Sure, you can place every single silly teeny SQL statement inside a stored proc, but just know that the programmers working on optimizers are very smart and are continually looking for ways to improve performance. Eventually (if it's not here already) ad hoc SQL performance will become indistinguishable (on average!) from stored procedure performance, so any sort of massive stored procedure use ** solely for "performance reasons"** sure sounds like premature optimization to me.

Anyway, I think if you avoid the edge cases and have fairly vanilla SQL, you won't notice a difference between ad hoc and stored procedures.


Reasons for using stored procedures:

  • Reduce network traffic -- you have to send the SQL statement across the network. With sprocs, you can execute SQL in batches, which is also more efficient.
  • Caching query plan -- the first time the sproc is executed, SQL Server creates an execution plan, which is cached for reuse. This is particularly performant for small queries run frequently.
  • Ability to use output parameters -- if you send inline SQL that returns one row, you can only get back a recordset. With sprocs you can get them back as output parameters, which is considerably faster.
  • Permissions -- when you send inline SQL, you have to grant permissions on the table(s) to the user, which is granting much more access than merely granting permission to execute a sproc
  • Separation of logic -- remove the SQL-generating code and segregate it in the database.
  • Ability to edit without recompiling -- this can be controversial. You can edit the SQL in a sproc without having to recompile the application.
  • Find where a table is used -- with sprocs, if you want to find all SQL statements referencing a particular table, you can export the sproc code and search it. This is much easier than trying to find it in code.
  • Optimization -- It's easier for a DBA to optimize the SQL and tune the database when sprocs are used. It's easier to find missing indexes and such.
  • SQL injection attacks -- properly written inline SQL can defend against attacks, but sprocs are better for this protection.

In many cases, stored procedures are actually slower because they're more genaralized. While stored procedures can be highly tuned, in my experience there's enough development and institutional friction that they're left in place once they work, so stored procedures often tend to return a lot of columns "just in case" - because you don't want to deploy a new stored procedure every time you change your application. An OR/M, on the other hand, only requests the columns the application is using, which cuts down on network traffic, unnecessary joins, etc.