What is the fastest way to convert many nullable ints 1:1 to a binary string?
On my machine (SQL Server 2017) the following C# SQLCLR function runs about 30% faster than the binary(5)
idea, 35% faster than CONCAT_WS
, and in half the time of the self-answer.
It requires UNSAFE
permission and uses pointers. The implementation is very specifically tied to the test data.
For testing purposes, the easiest way to get this unsafe assembly working is to set the database to TRUSTWORTHY
and disable the clr strict security configuration option if necessary.
Compiled code
For convenience the CREATE ASSEMBLY
compiled bits are at https://gist.github.com/SQLKiwi/72d01b661c74485900e7ebcfdc63ab8e
T-SQL Function Stub
CREATE FUNCTION dbo.NullableIntsToBinary
(
@Col01 int, @Col02 int, @Col03 int, @Col04 int, @Col05 int, @Col06 int, @Col07 int, @Col08 int,
@Col09 int, @Col10 int, @Col11 int, @Col12 int, @Col13 int, @Col14 int, @Col15 int, @Col16 int,
@Col17 int, @Col18 int, @Col19 int, @Col20 int, @Col21 int, @Col22 int, @Col23 int, @Col24 int,
@Col25 int, @Col26 int, @Col27 int, @Col28 int, @Col29 int, @Col30 int, @Col31 int, @Col32 int
)
RETURNS binary(132)
WITH EXECUTE AS CALLER
AS EXTERNAL NAME Obbish.UserDefinedFunctions.NullableIntsToBinary;
Source code
The C# source is at https://gist.github.com/SQLKiwi/64f320fe7fd802a68a3a644aa8b8af9f
If you compile this for yourself, you must use a Class Library (.dll) as the target project type and check the Allow Unsafe Code build option.
Combined solution
Since you ultimately want to compute the SpookyHash of the binary data returned above, you can call SpookyHash within the CLR function and return the 16-byte hash.
An example implementation based on a table with a mixture of column data types is at https://gist.github.com/SQLKiwi/6f82582a4ad1920c372fac118ec82460. This includes an unsafe inlined version of the Spooky Hash algorithm derived from Jon Hanna's SpookilySharp and the original public domain C source code by Bob Jenkins.
An INT
column has four bytes of allowed values which exactly matches the size of a BINARY(4)
. In other words, every possible value of a BINARY(4) is matched to a possible value of an INT
column. So unless there's a value that isn't allowed in the INT
column, there is no safe replacement for a NULL. Whether or not a column is NULL has to be encoded separately. It simply cannot fit within a BINARY(4)
.
One way to do this is with a NULL bitmap. Consider the following code:
CAST(
CASE WHEN COL1 IS NOT NULL THEN 0 ELSE 1 END |
CASE WHEN COL2 IS NOT NULL THEN 0 ELSE 2 END |
CASE WHEN COL3 IS NOT NULL THEN 0 ELSE 4 END |
CASE WHEN COL4 IS NOT NULL THEN 0 ELSE 8 END |
CASE WHEN COL5 IS NOT NULL THEN 0 ELSE 16 END |
CASE WHEN COL6 IS NOT NULL THEN 0 ELSE 32 END |
CASE WHEN COL7 IS NOT NULL THEN 0 ELSE 64 END |
CASE WHEN COL8 IS NOT NULL THEN 0 ELSE 128 END
AS BINARY(1))
Whether or not eight columns are NULL fits in a single byte. These expressions can be compared between rows to check that all of the same columns are NULL or not NULL. With that additional information it then becomes safe to replace a NULL column value with anything that isn't NULL. I found CAST(ISNULL(COL1, 0) AS BINARY(4))
to be the fastest, although other variations such as ISNULL(CAST(COL1 AS VARBINARY(4)), 0x)
are possible.
It's hard to prove anything definitely, but I found the following details to be the fastest:
- Using 0 for NOT NULL in the bitmap since I know most column values won't be NULL
- Using bitwise or instead of addition for the bitmap
- Checking if the column value is NULL as opposed to the converted binary value
On my machine the benchmark takes about 27.5 CPU seconds. Unfortunately, the NULL bitmap step takes about one third of that time. It would be nice if there was a faster way to do this.
Here's the full solution:
SELECT
CAST(ISNULL(COL1, 0) AS BINARY(4)) +
CAST(ISNULL(COL2, 0) AS BINARY(4)) +
CAST(ISNULL(COL3, 0) AS BINARY(4)) +
CAST(ISNULL(COL4, 0) AS BINARY(4)) +
CAST(ISNULL(COL5, 0) AS BINARY(4)) +
CAST(ISNULL(COL6, 0) AS BINARY(4)) +
CAST(ISNULL(COL7, 0) AS BINARY(4)) +
CAST(ISNULL(COL8, 0) AS BINARY(4)) +
CAST(ISNULL(COL9, 0) AS BINARY(4)) +
CAST(ISNULL(COL10, 0) AS BINARY(4)) +
CAST(ISNULL(COL11, 0) AS BINARY(4)) +
CAST(ISNULL(COL12, 0) AS BINARY(4)) +
CAST(ISNULL(COL13, 0) AS BINARY(4)) +
CAST(ISNULL(COL14, 0) AS BINARY(4)) +
CAST(ISNULL(COL15, 0) AS BINARY(4)) +
CAST(ISNULL(COL16, 0) AS BINARY(4)) +
CAST(ISNULL(COL17, 0) AS BINARY(4)) +
CAST(ISNULL(COL18, 0) AS BINARY(4)) +
CAST(ISNULL(COL19, 0) AS BINARY(4)) +
CAST(ISNULL(COL20, 0) AS BINARY(4)) +
CAST(ISNULL(COL21, 0) AS BINARY(4)) +
CAST(ISNULL(COL22, 0) AS BINARY(4)) +
CAST(ISNULL(COL23, 0) AS BINARY(4)) +
CAST(ISNULL(COL24, 0) AS BINARY(4)) +
CAST(ISNULL(COL25, 0) AS BINARY(4)) +
CAST(ISNULL(COL26, 0) AS BINARY(4)) +
CAST(ISNULL(COL27, 0) AS BINARY(4)) +
CAST(ISNULL(COL28, 0) AS BINARY(4)) +
CAST(ISNULL(COL29, 0) AS BINARY(4)) +
CAST(ISNULL(COL30, 0) AS BINARY(4)) +
CAST(ISNULL(COL31, 0) AS BINARY(4)) +
CAST(ISNULL(COL32, 0) AS BINARY(4)) +
CAST(
CASE WHEN COL1 IS NOT NULL THEN 0 ELSE 1 END |
CASE WHEN COL2 IS NOT NULL THEN 0 ELSE 2 END |
CASE WHEN COL3 IS NOT NULL THEN 0 ELSE 4 END |
CASE WHEN COL4 IS NOT NULL THEN 0 ELSE 8 END |
CASE WHEN COL5 IS NOT NULL THEN 0 ELSE 16 END |
CASE WHEN COL6 IS NOT NULL THEN 0 ELSE 32 END |
CASE WHEN COL7 IS NOT NULL THEN 0 ELSE 64 END |
CASE WHEN COL8 IS NOT NULL THEN 0 ELSE 128 END
AS BINARY(1)) +
CAST(
CASE WHEN COL9 IS NOT NULL THEN 0 ELSE 1 END |
CASE WHEN COL10 IS NOT NULL THEN 0 ELSE 2 END |
CASE WHEN COL11 IS NOT NULL THEN 0 ELSE 4 END |
CASE WHEN COL12 IS NOT NULL THEN 0 ELSE 8 END |
CASE WHEN COL13 IS NOT NULL THEN 0 ELSE 16 END |
CASE WHEN COL14 IS NOT NULL THEN 0 ELSE 32 END |
CASE WHEN COL15 IS NOT NULL THEN 0 ELSE 64 END |
CASE WHEN COL16 IS NOT NULL THEN 0 ELSE 128 END
AS BINARY(1)) +
CAST(
CASE WHEN COL17 IS NOT NULL THEN 0 ELSE 1 END |
CASE WHEN COL18 IS NOT NULL THEN 0 ELSE 2 END |
CASE WHEN COL19 IS NOT NULL THEN 0 ELSE 4 END |
CASE WHEN COL20 IS NOT NULL THEN 0 ELSE 8 END |
CASE WHEN COL21 IS NOT NULL THEN 0 ELSE 16 END |
CASE WHEN COL22 IS NOT NULL THEN 0 ELSE 32 END |
CASE WHEN COL23 IS NOT NULL THEN 0 ELSE 64 END |
CASE WHEN COL24 IS NOT NULL THEN 0 ELSE 128 END
AS BINARY(1)) +
CAST(
CASE WHEN COL25 IS NOT NULL THEN 0 ELSE 1 END |
CASE WHEN COL26 IS NOT NULL THEN 0 ELSE 2 END |
CASE WHEN COL27 IS NOT NULL THEN 0 ELSE 4 END |
CASE WHEN COL28 IS NOT NULL THEN 0 ELSE 8 END |
CASE WHEN COL29 IS NOT NULL THEN 0 ELSE 16 END |
CASE WHEN COL30 IS NOT NULL THEN 0 ELSE 32 END |
CASE WHEN COL31 IS NOT NULL THEN 0 ELSE 64 END |
CASE WHEN COL32 IS NOT NULL THEN 0 ELSE 128 END
AS BINARY(1))
FROM dbo.TABLE_OF_32_INTS
OPTION (MAXDOP 1);
What about using BINARY(5)
and converting NULLs to something out of range for INTs:
SELECT @dummy =
ISNULL(CAST(COL1 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL2 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL3 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL4 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL5 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL6 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL7 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL8 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL9 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL10 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL11 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL12 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL13 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL14 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL15 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL16 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL17 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL18 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL19 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL20 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL21 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL22 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL23 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL24 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL25 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL26 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL27 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL28 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL29 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL30 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL31 AS BINARY(5)), 0x0100000000) +
ISNULL(CAST(COL32 AS BINARY(5)), 0x0100000000)
FROM dbo.TABLE_OF_32_INTS
OPTION (MAXDOP 1);