Why is mixing column collations in a single database considered bad?
The recommendation of keeping all column collations to the database default seems more like guidelines or best practices to me.
You are entirely correct here.
Why is it considered such a serious error by some?
For the same reason that you will often hear / read that "you should never use:"
- CURSORs
GOTO
statements- SQLCLR
WITH (NOLOCK)
- etc, etc, etc
Some features / options / technologies are more complicated than others and generally require more knowledge by the user because the chances of getting into trouble when using it are much greater than the chances of not having any problems. So, it is easier to have generalized rules against such things for the general population. In fact, when writing up "Coding Standards" at work, I will always have a rule to never use CURSORs, yet I use them myself because I know both "when" to use them and "how" to use them effectively. But folks who only occasionally write queries shouldn't be expected to know that. This is also similar to "don't edit the Registry unless you absolutely know what you are doing", or rules that we make as parents for our (very young) children where we need to tell them not to do something simply because they are not capable of traversing the complexities of when it is ok to do a particular thing or how to go about doing it.
In the case of Collations, this is a very complex and confusing topic, and you can run into both hard-errors (these are an issue but less of an issue since they are obvious and hence easy enough to fix) and into "odd" behavior where it is hard to explain why things are acting the way that they are (why some items are filtered, or not filtered, outside of expectations, OR why sorting is acting outside of expectations). And sadly, there seems to be a rather large amount of misinformation floating around which furthers the mass confusion. I am actually working on a project to greatly increase the general knowledge of Collations and encodings, etc and hopefully counteract the misinformation and myths, but not yet ready to release it (when done I will update this with a link to it).
For Collation, you need to use what makes the most sense for the business case. The notion of not mixing Collations in a table or database is a default approach, but if you look at the Collations used for the various columns of the system catalog views, you will notice a variety of Collations being used. So I agree with the main quote in the question that IF the Collations are going to be different, it should be intentional, but there is nothing inherently wrong with it.
Regarding this from the question (emphasis added):
While configuring the Octopus Deploy Server, the setup fails with a FATAL error during the initialization of the OctopusServer-instance. The article related to the error-message does not explain why this is a requirement
I checked the linked documentation page and it does indeed explain why it is a requirement. I have copied the pertinent info from that documentation below:
You must ensure you also change the collation of all objects in the Octopus Database, otherwise errors can occur when modifying the database during Octopus version upgrades. New objects created will use the updated collation, and when attempting to (for example) perform SQL joins between these and existing objects using the original collation, collation mis-match errors may occur.
They are saying that their code, in the Octopus database, has JOINs between string columns and could likely have new code introduced in a future upgrade that has additional JOINs on new string columns. New columns, either via CREATE TABLE
or ALTER TABLE ... ADD
, will be assigned the default Collation of the database if the COLLATE
keyword was not specified for the new string column(s). And JOINs between string columns that do not have the same Collation will generate a Collation mismatch error. They also seem to be allowing the user to choose their own Collation (possibly to accommodate different locales) since they say at the top that the only requirement is that the Collation be case-insensitive. And since the Collation of the database that their code lives in isn't guaranteed to always be the same, they can't use the COLLATE
keyword to force the same Collation across all new string columns (well, they technically can, but that requires Dynamic SQL so not easy to deal with when generating update scripts). If they were able to use the COLLATE
keyword, then they could get away with having the Database's default Collation be different than the string columns. That would avoid the hard "Collation mismatch" errors, but would still leave open the possibility of comparison operations involving one of those string columns and a string literal or variable resulting in "odd" behavior as it would use the column's Collation and not the Database's Collation. Of course, that could very well be expected behavior. But since this is a 3rd party app, the behavior should be what they intended rather than a 50 / 50 chance between a) what the user wanted (or didn't object to) and b) what the user considers a bug (and then wastes the vendor's support time on a wild goose chase and/or blogs about how their software is buggy).
On a short sentence: COLLATION defines sorting and comparing.
So, collation determines the rules SQL Server uses to compare and sort character data. These rules are language/locale aware and may also be sensitive to case, accent, Kana and width. Collation suffixes identify dictionary rule (in)sensitivity: _CS (case sensitive), _CI (case insensitive), _AS (accent sensitive), _AI (accent insensitive) and _KS (Kana sensitive). Binary collations, identified by suffixes _BIN (binary) and _BIN2 (binary-code point), are sensitive in all regards.
Different collations will certainly demand workarounds to avoid "cannot resolve collation conflict" errors and can kill performance due to the known non-sargable expressions. Dealing with different collations can be a nightmare (have been there) so that's why the recommendation to pick one and stick with it.
More references:
- Collation Hell
- Questions About SQL Server Collations You Were Too Shy to Ask
- What does collation mean?