Why can't object names start with a number?
Firstly, you need to distinguish between numbers (numeric literals), strings (string literals) and identifiers. '4aii'
is a string literal, which can be a value of some "thing", but it does not identify (name) a thing. 4aii
or [4aii]
would be identifiers (if it were allowed).
The query parser needs to understand the meaning of a token that it is looking at. By allowing names to begin with digits, you by extension allow them to consist of digits exclusively. Then, given select 12345 from mytable
, how would you (and the parser) know if 12345
is an integer literal or a name of a column?
However, if you allow identifiers to begin with letters only (or underscore characters), you can unambiguously say if you're looking at an identifier (abc123
) or a string literal ('abc123'
) -- the latter is enclosed in quotation marks.
Square brackets in SQL Server, backticks (`) in MySQL, and double quotes in ANSI SQL-compliant engines, signify identifiers, and you use them for when your identifiers cannot be readily distinguished from other tokens: start with a digit, have spaces or other special characters in them etc. Hence, [4aii]
or "4aii"
clearly tell the parser that it's dealing with an identifier.
A little dbfiddle demo.
A string's a string, amirite?
Yes and No: a string is a string, but object / item names are not strings. So while that statement is true, it is also not relevant to the behavior that you are seeing.
Ignoring the conceptual reasoning for the specific rules, the technical answer to "why one works and not the other" is that SQL Server follows (with minimal customization), the Unicode Standard's guidelines for identifiers. The Unicode documentation can be found here:
Unicode® Standard Annex #31: UNICODE IDENTIFIER AND PATTERN SYNTAX
Identifiers that are not enclosed in either [...]
or "..."
are "regular" identifiers, while those that are enclosed are "delimited" identifiers. Regular identifiers are names that are valid in all contexts (i.e. these are the rules for naming things in this language, software, etc). Delimited identifiers are everything else: names that are not valid and should not work, however, they are given an exemption if you wrap them in either of those delimiters. Most identifiers can be delimited; it is only GOTO
labels and variables (including table variables) / parameters that cannot be delimited. The distinction seems to be that identifiers that exist purely for use in the T-SQL language (i.e. not a name that will ever be stored in a data file or log file as meta-data) cannot be delimited (much as you would expect in any language).
Now, the SQL Server documentation isn't exactly complete / correct, but it is correct about the classification of what is a valid "identifier" character (both starting and continuing) coming from the Unicode 3.2. If you want the actual list of rules for both regular and delimited identifiers, I have them documented here:
Completely Complete List of Rules for T-SQL Identifiers
To see the research that proves the relationship between the Unicode 3.2 categorizations and what SQL Server accepts for regular identifiers, please visit:
- The Uni-Code: The Search for the True List of Valid Characters for T-SQL Regular Identifiers, Part 1
- The Uni-Code: The Search for the True List of Valid Characters for T-SQL Regular Identifiers, Part 2
Addressing concerns noted in comments on this answer:
- Yes, even allowing for non-delimited identifiers to start with
_
,#
, and@
is accounted for in the Unicode spec. Section 1.2 addresses customizations to the base rules, and even provides four example customizations:_
,#
,@
, and$
. Those 4 four "potential" customizations are the exact same 4 that SQL Server uses. Hence, SQL Server allowing@Variable
and#TempTable
does not point away from this Unicode document as being the source of the rules. - As noted above, the SQL Server documentation does state that the categorizations used are from version 3.2 of the Unicode Character Database, and they are currently on version 10. You cannot use the current definitions of Ident_*, as found on the Unicode website, as indicating valid / invalid characters. Characters are added to
Ident_Start
andIdent_Continue
in each new version of the Unicode Standard. The only way to see the correct set of characters matching these properties is to download Unicode Version 3.2. - Both of the above two points are dealt with in the two blog posts noted directly above (named "The Uni-Code: The Search for the True List of Valid Characters for T-SQL Regular Identifiers"). Please read those two posts before dismissing this answer as incorrect. There is a lot of nuance behind what is actually going on here which I address in those two posts, showing step-by-step how to match up the list of valid characters.
ALSO, with regards to the question as stated in the title, it depends on how loosely you define "number". Meaning, if you follow along with the research steps as shown in the two posts noted directly above, such that you have created a table to hold the Unicode Character Database v3.2 and a few additional properties, you can get a list of 52 non-letters (mostly "numbers") that are valid characters for starting an identifier via the following query:
SELECT ucd.*
FROM [v3-2].UnicodeCharacterDatabase ucd
WHERE ucd.[IDStart] = 1
AND ucd.[GeneralCategory] NOT LIKE 'L%';
Choosing a few of those characters to test, we can see that they do indeed work:
USE [tempdb];
CREATE TABLE dbo.Ⅳaii ([Col1] INT); -- ROMAN NUMERAL FOUR (U+2163)
CREATE TABLE dbo.ↂaii ([Col1] INT); -- ROMAN NUMERAL TEN THOUSAND (U+2182)
CREATE TABLE dbo.〤aii ([Col1] INT); -- HANGZHOU NUMERAL FOUR (U+3024)
And, just to show that they are "numbers" in more than just their names, the following query proves that they are assigned a numeric value (as shown in the NumericValue
column of the [v3-2].UnicodeCharacterDatabase
table:
SELECT 1 WHERE N'〤' LIKE N'[3-5]'; -- HANGZHOU NUMERAL FOUR (U+3024)
-- 1
However, they are not numbers that can be used in numeric operations:
SELECT 〤 + 0;
/*
Msg 207, Level 16, State 1, Line 23
Invalid column name '〤'.
*/
Regarding the issue of parsing and needing to be able to determine if 3e2
is a number or identifier: while this is a consideration, and possibly why numbers are excluded from the "Ident_start" Unicode general category, it is not a universal, and not necessarily why SQL Server excludes them. Three points to consider:
- While
3e2
by itself is ambiguous, if it were qualified with at least a schema name then it would not be:dbo.3e2
- The name
4aii
is not really ambiguous at all. Internal parsing would be able to identify that easily enough as not being a potential number MySQL / MariaDB do not have this restriction. They allow for non-delimited identifiers such as
4aii
and3e
, but not3e2
or300
. I was able to successfully execute the following in MySQL:create table 4aii (3e int);
So again, the reason that you cannot do this in SQL Server is because SQL Server adheres to the Unicode Standard's recommendation for identifiers. Why those characters were chosen by the Unicode Consortium is not specifically stated, but seems to be at least "best practice". Still, as proven with MySQL, it is possible to parse identifiers that start with a number.
What you're observing are the implementation's lexer rules. It's a part of a process called lexical analysis which is a fancy way of saying "making sense of things." Ideally, this would adhere to the rules given in the SQL Spec (<identifier>
). These rules are all published by Microsoft as the Rules for Regular Identifiers. If you wish to use irregular identifiers you have to quote them or "delimit" them from other tokens (Tsql's []
or double-quotes ""
) which eliminates any possibilities of ambiguous syntax.
A string's a string, amirite?
No, take for example this.
"No, take for example this."
That's one sentence. But, more importantly that's 5 words. You know it's five words because the whitespace is significant. You'll have to know it's five words if you're going to parse out the subjects, objects, and voice to make sense of it as an instruction.