Why Are Data Silos Important in Unit Tests?

I think the answer is quite simply one of consistency. It's also one of minimizing variations between the environments we test in. This makes our tests more consistent and repeatable. These are all things that help should we need to diagnose or troubleshoot a problem during deployment.

When we generate our own data, that eliminates a variable between what happens in any sandbox vs a production environment. We KNOW we've used the same test data. We know what values we can expect to see at any point in our unit test.
We can create not only the database records, but also our User records. This allows us to control the context for running our unit tests with System.runAs().
Similarly, we can create users who become the owners of the records in our test data. Doing this creates even more consistency and reduces variances between environments (development org to development org, or development org to production). Again, this allows us to eliminate another variable should we need to troubleshoot a deployment from one environment to another.
We don't always know that the kind of data we need to run our tests will exist in an org if we don't create it. This is especially true when testing in a sandbox, a fresh org, or one where records of the type our code uses have never been created. How can our test class possibly grab test data from the org if there are no records that meet the criteria to test all of the conditions our code uses? Clearly it can't. It's incumbent on us to ensure the data we need is available, which means we need to create valid data to test with.
Our code must also be bulk safe. In my view, testing using only one record per method doesn't constitute an adequate unit test. All it does is allow code to "pass the minimum requirements" needed to deploy it. Passing that minimum doesn't ensure our code will function reliably. That's what System.assert calls, positive and negative test cases, and bulk test cases, along with other test methods, are for. I strongly believe that our unit tests need to include those methods as well.
I want to add that what I'm speaking of doesn't prohibit a developer from creating randomness in their data. It still allows for that. The randomness is created in a controlled and consistent manner so the behavior of the code can be predicted and results still asserted. Validity of the data will be known provided the environment doesn't "do something" to it that's unexpected.
It's this consistency during creation that lets us diagnose and troubleshoot what's "different" between two environments when there's an issue with a deployment or the code doesn't function as expected.
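To make the points above concrete, here is a minimal sketch of a test that creates both its own user and its own record. It assumes the org has a profile named 'Standard User' with create access on Account and no validation rules that require extra Account fields; the class, method, and record names are made up for illustration.

```apex
@isTest
private class AccountSiloTest {

    // Hypothetical helper: builds a user with a username that is unique per run.
    // Assumes a profile named 'Standard User' exists in the org.
    private static User buildTestUser() {
        Profile p = [SELECT Id FROM Profile WHERE Name = 'Standard User' LIMIT 1];
        String uniqueName = 'silouser' + DateTime.now().getTime() + '@testorg.com';
        return new User(
            Alias = 'silou',
            Email = 'silouser@testorg.com',
            EmailEncodingKey = 'UTF-8',
            LastName = 'Testing',
            LanguageLocaleKey = 'en_US',
            LocaleSidKey = 'en_US',
            TimeZoneSidKey = 'America/Los_Angeles',
            ProfileId = p.Id,
            UserName = uniqueName
        );
    }

    @isTest
    static void recordIsOwnedByTheUserWeCreated() {
        User u = buildTestUser();
        insert u;

        // Everything inside this block runs in the context of our own user.
        System.runAs(u) {
            Account a = new Account(Name = 'Silo Test Account');
            insert a;

            Account saved = [SELECT OwnerId FROM Account WHERE Id = :a.Id];
            System.assertEquals(u.Id, saved.OwnerId,
                'The record should be owned by the user created in this test');
        }
    }
}
```

Because the user and the Account are both created inside the test, the owner and the field values are known in every org the test runs in.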

This goes beyond the question that's been asked, but I'm a big believer in Test Driven Development. It helps me test functionality as I build my code. When I'm finished, my test class is also complete, and it's not an afterthought or chore. I like to create tests using lists of records, where a variable in my class initially sets the list size to 1. When I'm ready to bulk test, I can set it to a larger number, including raising it to 200 (or even more if appropriate), to ensure my code is bulk safe.

During deployment, I can lower that value to create one record per list so time isn't wasted. But the method is there in my unit test, and I KNOW that my code has been bulk tested and can easily be retested at any time in production. If I want to, the list size can be controlled through a custom setting via a test environment utility/helper class. This allows me to retest in bulk at any point in time. I'll add that I generally create a test utility class that creates all of my objects. This is a great way to reuse my code.
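As a rough sketch of that adjustable list size (not my actual utility class), something like the following would do. The class name, the RECORD_COUNT variable, and the createAccounts helper are all hypothetical, and it assumes Account needs nothing beyond Name in the org.

```apex
@isTest
public class BulkSizeExampleTest {

    // Set to 1 for quick deployments; raise to 200 (or more) to bulk test.
    private static Integer RECORD_COUNT = 1;

    // Hypothetical reusable helper that builds and inserts a list of Accounts.
    public static List<Account> createAccounts(Integer howMany) {
        List<Account> accounts = new List<Account>();
        for (Integer i = 0; i < howMany; i++) {
            accounts.add(new Account(Name = 'Bulk Test Account ' + i));
        }
        insert accounts;
        return accounts;
    }

    @isTest
    static void handlesTheWholeList() {
        List<Account> accounts = createAccounts(RECORD_COUNT);

        // The code under test would be invoked here, against the whole list.

        Integer saved = [SELECT COUNT() FROM Account WHERE Id IN :accounts];
        System.assertEquals(RECORD_COUNT, saved,
            'Every record in the list should have been saved');
    }
}
```

The same test method covers the single-record and the bulk case; only the one value changes.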

So, my answer to the question is that we do it so the developer of the code can achieve the maximum possible control over the consistency of test conditions, regardless of the environment the test runs in, each and every time the test method is executed. This is a discipline I was taught in engineering school for designing and creating tests. It was also expected when I was an engineer in the automotive industry (see this article by W. Edwards Deming, famous for "The Deming Way"). It only seems logical to me that it would apply to software as well, since it applies to any other kind of testing, including tests of humans using software and of the electronics running it.

In two words: "bitter experience"

I will admit that if you don't have any asserts in your code (another topic), you run a lesser risk, but the reasons all have to do with repeatability and predictability and the advantages therein.

Every deployment of the same test class should run in the same environment every time. Today, tomorrow, and 5 years from now. Here are some examples that have happened to me over the many years:

1. Your SOQL might return way more sobjects than you found in sandbox due to poorly constructed where clauses. Hence, you run into governor limits when deploying.
2. As a variation of #1, you might have an assert expecting n objects but the test returns 20n. Hence, deployment fails.
3. As another variation of #1, you might have an assert fetch n objects but they don't match the n objects you mocked. The deployment fails. In fact, this happened to me today with some unmanaged package class I had been using that was at V16.0 (seeAllData=true was implicit back then). The deployment occurred when users were actively adding Accounts and an "interloper" appeared in the assertion error message.
4. You might be expecting custom settings or other sobjects used as configuration data to have the "right" values but there is no guarantee PROD will match sandbox. Hence, deployment fails.
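For illustration, here is a minimal sketch of a test that cannot pick up an "interloper". It assumes the class is saved at a recent API version (so org data is not visible unless seeAllData=true is set) and that nothing in the org, such as a trigger, creates extra Accounts; the names are made up.

```apex
@isTest
private class AccountCountTest {

    @isTest
    static void seesOnlyItsOwnAccounts() {
        List<Account> mine = new List<Account>();
        for (Integer i = 0; i < 3; i++) {
            mine.add(new Account(Name = 'Silo Account ' + i));
        }
        insert mine;

        // No where clause at all, yet only the records created above come back.
        List<Account> found = [SELECT Id FROM Account];

        System.assertEquals(3, found.size(),
            'Only the Accounts created in this test should be visible');
        System.assertEquals(
            new Map<Id, Account>(mine).keySet(),
            new Map<Id, Account>(found).keySet(),
            'The Accounts returned should be exactly the ones we inserted');
    }
}
```

Whatever users are adding to production during the deployment, the count and the Ids asserted here stay the same.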

So, for argument's sake, let's assume you are a good developer and strongly believe in system asserts to "prove" the correctness of your implementation with positive/negative/bulk tests. By the time you have done all this, the last thing you want is a deployment error, whether on the next deployment or on some deployment far in the future. You are under time pressure when you get to deployments, you may have limited deployment windows, and finding errors in deployments is more difficult than in sandbox.

And lastly, professional pride. You are leaving a legacy of software for some other developer/admin, so why not treat them to repeatable/reliable code just as you would want to be treated?

Another point to add to cropredy's excellent list is that tests have significant value as a specification of what the code being tested does. But that only works if you can see the inputs (often the database rows) and expected outputs (the asserted values) in the test. Inserting, in the test itself, exactly the data that is relevant to what is being tested makes those inputs visible.

Somewhat related is that it is actually easier to write assertions where you have a code reference to the input. If you take advantage of @TestSetup to speed up the tests by setting the data up once, though, you lose the simple references to the data (as you have to query to get them in each test method).
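A small sketch of that trade-off, with made-up class and record names and assuming Account needs only a Name in the org:

```apex
@isTest
private class TestSetupExampleTest {

    @TestSetup
    static void makeData() {
        // Runs once; each test method starts from this data.
        insert new Account(Name = 'Setup Account');
    }

    @isTest
    static void mustQueryToReachTheSetupData() {
        // No variable from makeData() is in scope here, so we query for the row.
        Account a = [SELECT Id, Name FROM Account WHERE Name = 'Setup Account' LIMIT 1];

        a.Name = 'Renamed Account';
        update a;

        Account saved = [SELECT Name FROM Account WHERE Id = :a.Id];
        System.assertEquals('Renamed Account', saved.Name,
            'The update should be visible when the record is re-queried');
    }
}
```

The input is still created by the test class, but a reader has to look at makeData() as well as the test method to see it.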
