Best way to test SQL queries
You wouldn't write an application with functions 200 lines long. You'd decompose those long functions into smaller functions, each with a single clearly defined responsibility.
Why write your SQL like that?
Decompose your queries, just like you decompose your functions. This makes them shorter, simpler, easier to comprehend, easier to test, easier to refactor. And it allows you to add "shims" between them, and "wrappers" around them, just as you do in procedural code.
How do you do this? By making each significant thing a query does into a view. Then you compose more complex queries out of these simpler views, just as you compose more complex functions out of more primitive functions.
And the great thing is, for most compositions of views, you'll get exactly the same performance out of your RDBMS. (For some you won't; so what? Premature optimization is the root of all evil. Code correctly first, then optimize if you need to.)
Here's an example of using several view to decompose a complicated query.
In the example, because each view adds only one transformation, each can be independently tested to find errors, and the tests are simple.
Here's the base table in the example:
create table month_value(
eid int not null, month int, year int, value int );
This table is flawed, because it uses two columns, month and year, to represent one datum, an absolute month. Here's our specification for the new, calculated column:
We'll do that as a linear transform, such that it sorts the same as (year, month), and such that for any (year, month) tuple there is one and only value, and all values are consecutive:
create view cm_absolute_month as
select *, year * 12 + month as absolute_month from month_value;
Now what we have to test is inherent in our spec, namely that for any tuple (year, month), there is one and only one (absolute_month), and that (absolute_month)s are consecutive. Let's write some tests.
Our test will be a SQL select
query, with the following structure: a test name and a case statement catenated together. The test name is just an arbitrary string. The case statement is just case when
test statements then 'passed' else 'failed' end
.
The test statements will just be SQL selects (subqueries) that must be true for the test to pass.
Here's our first test:
--a select statement that catenates the test name and the case statement
select concat(
-- the test name
'For every (year, month) there is one and only one (absolute_month): ',
-- the case statement
case when
-- one or more subqueries
-- in this case, an expected value and an actual value
-- that must be equal for the test to pass
( select count(distinct year, month) from month_value)
--expected value,
= ( select count(distinct absolute_month) from cm_absolute_month)
-- actual value
-- the then and else branches of the case statement
then 'passed' else 'failed' end
-- close the concat function and terminate the query
);
-- test result.
Running that query produces this result: For every (year, month) there is one and only one (absolute_month): passed
As long as there is sufficient test data in month_value, this test works.
We can add a test for sufficient test data, too:
select concat( 'Sufficient and sufficiently varied month_value test data: ',
case when
( select count(distinct year, month) from month_value) > 10
and ( select count(distinct year) from month_value) > 3
and ... more tests
then 'passed' else 'failed' end );
Now let's test it's consecutive:
select concat( '(absolute_month)s are consecutive: ',
case when ( select count(*) from cm_absolute_month a join cm_absolute_month b
on ( (a.month + 1 = b.month and a.year = b.year)
or (a.month = 12 and b.month = 1 and a.year + 1 = b.year) )
where a.absolute_month + 1 <> b.absolute_month ) = 0
then 'passed' else 'failed' end );
Now let's put our tests, which are just queries, into a file, and run the that script against the database. Indeed, if we store our view definitions in a script (or scripts, I recommend one file per related views) to be run against the database, we can add our tests for each view to the same script, so that the act of (re-) creating our view also runs the view's tests. That way, we both get regression tests when we re-create views, and, when the view creation runs against production, the view will will also be tested in production.
Create a test system database that you can reload as often as you wish. Load your data or create your data and save it off. Produce an easy way to reload it. Attach your development system to that database and validate your code before you go to production. Kick yourself everytime you manage to let an issue get into production. Create a suite of tests to verify known issues and grow your test suite over time.