Can SQL Server SQL_Latin1_General_CP1_CI_AS be safely converted to Latin1_General_CI_AS?
Here is a more complete answer:
https://www.olcot.co.uk/revised-difference-between-collation-sql_latin1_general_cp1_ci_as-and-latin1_general_ci_as/
The key difference between these collations is in how they apply character expansion rules. Certain Latin characters may be expanded into multiple characters. The SQL_xxxx collations may ignore these character expansions when working with non-unicode text, but apply them for unicode text. As a result: joins, sorts, and comparisons may return different results when using one collation versus the other.
Example:
Under Latin1_General_CI_AS
these two statements return the same set of records, as ß
is expanded to ss
.
SELECT * FROM MyTable3 WHERE Comments = 'strasse'
SELECT * FROM MyTable3 WHERE Comments = 'straße'
When using SQL_Latin1_General_CP1_CI_AS
the above statements return different records, since the ß
is treated as a different character than ss
.
If you are going to change the Collation of a Database, then there is definitely stuff you should know about so that you can plan accordingly:
Regarding data-loss potential:
NVARCHAR
fields are all Unicode, which is a single character set, so there can't be any data loss for these fields (this also covers XML fields which are also stored as UTF-16 Little Endian). Meta-data fields that store the object / column / index / etc names are allNVARCHAR
so no need to worry about those.VARCHAR
fields having different Collations but the same Code Page between the differing Collations will not be a problem since the Code Page is the character set.VARCHAR
fields having different Collations and moving to a different Code Page (when changing Collations) can have data loss if any of the characters being used are not represented in the new Code Page. HOWEVER, this is only an issue when physically changing the Collation of a particular field (described below) and would not happen upon changing the default Collation of a database.
Local variables and string literals get their Collation from the Database default. Changing the database default will change the Collation used for both local variables and string literals. But changing the Database's default Collation does not change the Collation used for existing string columns in the tables in that Database. This generally should not cause any problems when comparing or concatenating a column with a literal and/or variable since the literals and variables will take on the Collation of the column due to Collation Precedence. The only potential problem would be Code Page conversions that might occur for characters of values between 128 - 255 that are not available in the Code Page used by the Collation of the column.
If you are expecting a predicate / comparison / sort / concatenation / etc for a column to behave differently upon changing the Database's default Collation, then you will need to explicitly change that column's Collation using the following command:
ALTER TABLE [{table_name}] ALTER COLUMN [{column_name}] {same_datatype} {same_NULL_or_NOT NULL_setting} COLLATE {name_of_Database_default_Collation};
Be sure to specify the exact same datatype and
NULL
/NOT NULL
setting that are currently being used, else they can revert to the default if not already being the default value. After that, if there are any indexes on any of the string columns that just had their Collation changed, then you need to rebuild those indexes.Changing the Database's default Collation will change the Collation of certain database-specific meta-data, such as the
name
field in bothsys.objects
,sys.columns
,sys.indexes
, etc. Filtering these system Views against local variables or string literals won't be a problem since the Collation will be changing on both sides. But, if you JOIN any of the local system Views to temporary tables on string fields, and the Database-level Collation between the local database andtempdb
doesn't match, then you will get the "Collation mismatch" error. This is discussed below along with the remedy.One difference between these two Collations is in how they sort certain characters for
VARCHAR
data (this does not affectNVARCHAR
data). The non-EBCDICSQL_
Collations use what is called "String Sort" forVARCHAR
data, while all other Collations, and evenNVARCHAR
data for the non-EBCDICSQL_
Collations, use what is called "Word Sort". The difference is that in "Word Sort", the dash-
and apostrophe'
(and maybe a few other characters?) are given a very low weight and are essentially ignored unless there are no other differences in the strings. To see this behavior in action, run the following:DECLARE @Test TABLE (Col1 VARCHAR(10) NOT NULL); INSERT INTO @Test VALUES ('aa'); INSERT INTO @Test VALUES ('ac'); INSERT INTO @Test VALUES ('ah'); INSERT INTO @Test VALUES ('am'); INSERT INTO @Test VALUES ('aka'); INSERT INTO @Test VALUES ('akc'); INSERT INTO @Test VALUES ('ar'); INSERT INTO @Test VALUES ('a-f'); INSERT INTO @Test VALUES ('a_e'); INSERT INTO @Test VALUES ('a''kb'); SELECT * FROM @Test ORDER BY [Col1] COLLATE SQL_Latin1_General_CP1_CI_AS; -- "String Sort" puts all punctuation ahead of letters SELECT * FROM @Test ORDER BY [Col1] COLLATE Latin1_General_100_CI_AS; -- "Word Sort" mostly ignores dash and apostrophe
Returns:
String Sort ----------- a'kb a-f a_e aa ac ah aka akc am ar
and:
Word Sort --------- a_e aa ac a-f ah aka a'kb akc am ar
While you will "lose" the "String Sort" behavior, I'm not sure that I would call that a "feature". It is a behavior that has been deemed undesirable (as evidenced by the fact that it wasn't brought forward into any of the Windows collations). However, it is a definite difference of behavior between the two collations (again, just for non-EBCDIC
VARCHAR
data), and you might have code and/or customer expectations based upon the "String Sort" behavior. This requires testing your code and possibly researching to see if this change in behavior might have any negative impact on users.Another difference between
SQL_Latin1_General_CP1_CI_AS
andLatin1_General_100_CI_AS
is the ability to do Expansions onVARCHAR
data (NVARCHAR
data can already do these for mostSQL_
Collations), such as handlingæ
as if it wereae
:IF ('æ' COLLATE SQL_Latin1_General_CP1_CI_AS = 'ae' COLLATE SQL_Latin1_General_CP1_CI_AS) BEGIN PRINT 'SQL_Latin1_General_CP1_CI_AS'; END; IF ('æ' COLLATE Latin1_General_100_CI_AS = 'ae' COLLATE Latin1_General_100_CI_AS) BEGIN PRINT 'Latin1_General_100_CI_AS'; END;
Returns:
Latin1_General_100_CI_AS
The only thing you are "losing" here is not being able to do these expansions. Generally speaking, this is another benefit of moving to a Windows Collation. However, just like with the "String Sort" to "Word Sort" move, the same caution applies: it is a definite difference of behavior between the two collations (again, just for
VARCHAR
data), and you might have code and/or customer expectations based upon not having these mappings. This requires testing your code and possibly researching to see if this change in behavior might have any negative impact on users.(first noted in @Zarepheth's answer and expanded on here)
Another difference (that is also a benefit of moving to a Windows Collation) is that filtering a
VARCHAR
column that is indexed onNVARCHAR
literal / variable / column you will no longer invalidate the index on theVARCHAR
column. This is due to the Windows Collations using the same Unicode sorting and comparison rules for bothVARCHAR
andNVARCHAR
data. Because the sort order is the same between the two types, when theVARCHAR
data gets converted intoNVARCHAR
(explicitly or implicitly due to datatype precedence), the order of items in the index is still valid. For more details on this behavior, please see my post: Impact on Indexes When Mixing VARCHAR and NVARCHAR Types.The server-level Collation is used to set the Collation of the system databases, which includes
[model]
. The[model]
database is used as a template to create new databases, which includes[tempdb]
upon each server startup. So, if the Database's default collation does not match the instance's default Collation and you join local tables to temporary tables on string fields, then you will get the Collation-mismatch error. Fortunately there is a somewhat easy way to correct for collation differences between the database that is "current" whenCREATE #TempTable
is executed and[tempdb]
. When creating temporary tables, declare a collation (on string columns) using theCOLLATE
clause and use either a specific collation (if you know that the DB will always be using that collation), orDATABASE_DEFAULT
(if you don't always know the collation of the DB where this code will execute):CREATE TABLE #Temp (Col1 NVARCHAR(40) COLLATE DATABASE_DEFAULT);
This is not necessary for table variables since they get their default Collation from the "current" database. However, if you have both table variables and temporary tables and join them on string fields, then you will need to use
COLLATE {specific_collation}
orCOLLATE DATABASE_DEFAULT
as shown directly above.The server-level collation also controls local variable names,
CURSOR
variable names, andGOTO
labels. While none of these would be impacted by the specific change being dealt with in this Question, it is at least something to be aware of.It is best to use the most recent version of the desired collation, if multiple versions are available. Starting in SQL Server 2005, a "90" series of collations was introduced, and SQL Server 2008 introduced a "100" series of collations. You can find these collations by using the following queries:
SELECT * FROM sys.fn_helpcollations() WHERE [name] LIKE N'%[_]90[_]%'; -- 476 SELECT * FROM sys.fn_helpcollations() WHERE [name] LIKE N'%[_]100[_]%'; -- 2686
ALSO, while the question asks about case-insensitive Collations, it should be noted that if someone else is looking to make a similar change but is using case-sensitive Collations, then another difference between SQL Server Collations and Windows Collations, for
VARCHAR
data only, is which case sorts first. Meaning, if you have bothA
anda
, theSQL_
Collations will sortA
beforea
, while the non-SQL_
Collations (and theSQL_
Collations when dealing withNVARCHAR
data) will sorta
beforeA
.
For a lot more info and details on changing the Collation of a Database or of the entire Instance, please see my post:
Changing the Collation of the Instance, the Databases, and All Columns in All User Databases: What Could Possibly Go Wrong?
For more info on working with strings and collations, please visit: Collations Info
There is more info on this MSDN forum:
http://social.msdn.microsoft.com/Forums/en-US/sqlgetstarted/thread/196b4586-1338-434d-ba8c-49fa3c9bdeeb/
Which states:
You should see little difference if the collation is SQL_Latin1_General_CP1_CI_AS or Latin1_General_CI_AS, but both have instances where they are faster or slower than the other.
Latin1_General_CI_AS :- Latin1-General, case-insensitive, accent- sensitive, kanatype-insensitive, width-insensitive
SQL_Latin1_General_CP1_CI_AS:- Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive for Unicode Data, SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data
Therefore in my opinion you shouldn't see a difference, especially if your data is only a-z0-9