Trim whitespace (spaces, tabs, newlines)
For anyone using SQL Server 2017 or newer
you can use the TRIM built-in function. For example:
DECLARE @Test NVARCHAR(4000);
SET @Test = N'
' + NCHAR(0x09) + N' ' + NCHAR(0x09) + N' this
' + NCHAR(0x09) + NCHAR(0x09) + N' content' + NCHAR(0x09) + NCHAR(0x09) + N'
' + NCHAR(0x09) + N' ' + NCHAR(0x09) + NCHAR(0x09) + N' ';
SELECT N'~'
+ TRIM(NCHAR(0x09) + NCHAR(0x20) + NCHAR(0x0D) + NCHAR(0x0A) FROM @Test)
+ N'~';
Please note that the default behavior of TRIM
is to remove only spaces, so in order to also remove the tabs and newlines (CR + LFs), you need to specify the characters FROM
clause.
Also, I used NCHAR(0x09)
for the tab characters in the @Test
variable so that the example code can be copied-and-pasted and retain the correct characters. Otherwise, tabs get converted into spaces when this page is rendered.
For anyone using SQL Server 2016 or older
You can create a function, either as a SQLCLR Scalar UDF or a T-SQL Inline TVF (iTVF). The T-SQL Inline TVF would be as follows:
CREATE
--ALTER
FUNCTION dbo.TrimChars(@OriginalString NVARCHAR(4000), @CharsToTrim NVARCHAR(50))
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN
WITH cte AS
(
SELECT PATINDEX(N'%[^' + @CharsToTrim + N']%', @OriginalString) AS [FirstChar],
PATINDEX(N'%[^' + @CharsToTrim + N']%', REVERSE(@OriginalString)) AS [LastChar],
LEN(@OriginalString + N'~') - 1 AS [ActualLength]
)
SELECT cte.[ActualLength],
[FirstChar],
((cte.[ActualLength] - [LastChar]) + 1) AS [LastChar],
SUBSTRING(@OriginalString, [FirstChar],
((cte.[ActualLength] - [LastChar]) - [FirstChar] + 2)) AS [FixedString]
FROM cte;
GO
And running it as follows:
DECLARE @Test NVARCHAR(4000);
SET @Test = N'
' + NCHAR(0x09) + N' ' + NCHAR(0x09) + N' this
' + NCHAR(0x09) + NCHAR(0x09) + N' content' + NCHAR(0x09) + NCHAR(0x09) + N'
' + NCHAR(0x09) + N' ' + NCHAR(0x09) + NCHAR(0x09) + N' ';
SELECT N'~' + tc.[FixedString] + N'~' AS [proof]
FROM dbo.TrimChars(@Test, NCHAR(0x09) + NCHAR(0x20) + NCHAR(0x0D) + NCHAR(0x0A)) tc;
Returns:
proof
----
~this
content~
And you can use that in an UPDATE
using CROSS APPLY
:
UPDATE tbl
SET tbl.[Column] = itvf.[FixedString]
FROM SchemaName.TableName tbl
CROSS APPLY dbo.TrimChars(tbl.[Column],
NCHAR(0x09) + NCHAR(0x20) + NCHAR(0x0D) + NCHAR(0x0A)) itvf
As mentioned at the beginning, this is also really easy via SQLCLR since .NET includes a Trim()
method that does exactly the operation you are wanting. You can either code your own to call SqlString.Value.Trim()
, or you can just install the Free version of the SQL# library (which I created, but this function is in the Free version) and use either String_Trim (which does just white space) or String_TrimChars where you pass in the characters to trim from both sides (just like the iTVF shown above).
DECLARE @Test NVARCHAR(4000);
SET @Test = N'
' + NCHAR(0x09) + N' ' + NCHAR(0x09) + N' this
' + NCHAR(0x09) + NCHAR(0x09) + N' content' + NCHAR(0x09) + NCHAR(0x09) + N'
' + NCHAR(0x09) + N' ' + NCHAR(0x09) + NCHAR(0x09) + N' ';
SELECT N'~' + SQL#.String_Trim(@Test) + N'~' AS [proof];
And it returns the exact same string as shown above in the iTVF example output. But being a scalar UDF, you would use it as follows in an UPDATE
:
UPDATE tbl
SET tbl.[Column] = SQL#.String_Trim(itvf.[Column])
FROM SchemaName.TableName tbl
Either one of the above should be efficient for using across millions of rows. Inline TVFs are optimizible unlike Multi-statement TVFs and T-SQL scalar UDFs. And, SQLCLR Scalar UDFs have the potential to be used in parallel plans, as long as they are marked as IsDeterministic=true
and do not set either type of DataAccess to Read
(the default for both User and System data access is None
), and both of those conditions are true for both of the SQLCLR functions noted above.
You might want to consider using a TVF (table-valued-function) to remove the offending characters from the start and end of your data.
Create a table to hold test data:
IF COALESCE(OBJECT_ID('dbo.TrimTest'), 0) <> 0
BEGIN
DROP TABLE dbo.TrimTest;
END
CREATE TABLE dbo.TrimTest
(
SampleData VARCHAR(50) NOT NULL
);
INSERT INTO dbo.TrimTest (SampleData)
SELECT CHAR(13) + CHAR(10) + CHAR(9) + 'this is ' + CHAR(13) + CHAR(10) + ' a test' + CHAR(13) + CHAR(10);
GO
Create the TVF:
IF COALESCE(OBJECT_ID('dbo.StripCrLfTab'), 0) <> 0
BEGIN
DROP FUNCTION dbo.StripCrLfTab;
END
GO
CREATE FUNCTION dbo.StripCrLfTab
(
@val NVARCHAR(1000)
)
RETURNS @Results TABLE
(
TrimmedVal NVARCHAR(1000) NULL
)
AS
BEGIN
DECLARE @TrimmedVal NVARCHAR(1000);
SET @TrimmedVal = CASE WHEN RIGHT(@val, 1) = CHAR(13) OR RIGHT(@val, 1) = CHAR(10) OR RIGHT(@val, 1) = CHAR(9)
THEN LEFT(
CASE WHEN LEFT(@val, 1) = CHAR(13) OR LEFT(@val, 1) = CHAR(10) OR LEFT(@val, 1) = CHAR(9)
THEN RIGHT(@val, LEN(@val) - 1)
ELSE @val
END
, LEN(@val) -1 )
ELSE
CASE WHEN LEFT(@val, 1) = CHAR(13) OR LEFT(@val, 1) = CHAR(10) OR LEFT(@val, 1) = CHAR(9)
THEN RIGHT(@val, LEN(@val) - 1)
ELSE @val
END
END;
IF @TrimmedVal LIKE (CHAR(13) + '%')
OR @TrimmedVal LIKE (CHAR(10) + '%')
OR @TrimmedVal LIKE (CHAR(9) + '%')
OR @TrimmedVal LIKE ('%' + CHAR(13))
OR @TrimmedVal LIKE ('%' + CHAR(10))
OR @TrimmedVal LIKE ('%' + CHAR(9))
SELECT @TrimmedVal = tv.TrimmedVal
FROM dbo.StripCrLfTab(@TrimmedVal) tv;
INSERT INTO @Results (TrimmedVal)
VALUES (@TrimmedVal);
RETURN;
END;
GO
Run the TVF to show the results:
SELECT tt.SampleData
, stt.TrimmedVal
FROM dbo.TrimTest tt
CROSS APPLY dbo.StripCrLfTab(tt.SampleData) stt;
Results:
The TVF calls itself recursively until there are no remaining offending characters at the start and end of the string passed into the function. This is unlikely to perform well over a large number of rows, but would probably work ok if you are using this to fix data as it is inserted into the database.
You could use this in an update statement:
UPDATE dbo.TrimTest
SET TrimTest.SampleData = stt.TrimmedVal
FROM dbo.TrimTest tt
CROSS APPLY dbo.StripCrLfTab(tt.SampleData) stt;
SELECT *
FROM dbo.TrimTest;
Results (as text):