Why is the regex to match 1 to 10 written as [1-9]|10 and not [1-10]?

Sometimes a good drawing is worth 1000 words...

Here are the three patterns from your question and the way a regex flavour would understand them:

[1-9]|10

Regular expression image

[1-10]

Regular expression image

[1-(10)]

Invalid regexp!

This regex is invalid because the range opened by 1- is closed by ( rather than by another digit. Since ( comes before 1 in character order, the range runs backwards, which most flavours reject.

A range is normally bounded by digits on both sides or by letters on both sides.
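As a quick check of the three patterns, here is a minimal sketch using Python's re module (one flavour among many; others may report the error differently). The helper name compiles is made up for this illustration.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles("[1-9]|10"))   # accepted
print(compiles("[1-10]"))     # accepted, but it is not a numeric range
print(compiles("[1-(10)]"))   # rejected: the range 1-( runs backwards
```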

Images generated with Debuggex


That is because regexes work with characters, not with numbers. [1-9] is equivalent to (?:1|2|3|4|5|6|7|8|9) while [1-10] would be (?:1|0) (because it's the range 1–1 and the digit 0).

Simply put, ranges in character classes always refer to contiguous ranges of characters, whatever they look like. Even when both endpoints are digits, the range is a range of characters, not a numeric one.
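To see which characters each class really contains, one option is to test every digit against it. This is only a sketch in Python; class_members is a hypothetical helper, not part of any library.

```python
import re
import string

def class_members(pattern: str) -> str:
    """Return the digits that a single-character pattern matches (hypothetical helper)."""
    return "".join(ch for ch in string.digits if re.fullmatch(pattern, ch))

print(class_members("[1-9]"))   # 123456789
print(class_members("[1-10]"))  # 01
```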


[1-9]|10

In this:

  • [1-9] accepts any character from 1 through 9;
  • | performs an "or" operation;
  • 10 matches the literal two-character sequence 10.

[1-10]

This accepts:

  • any single character between 1 and 1 (that is, just 1),
  • or 0.
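The difference shows up when both patterns are run against the numbers 0 through 11. The sketch below assumes Python's re.fullmatch, which requires the whole string to match; the function name accepted and the upper bound are illustrative choices.

```python
import re

def accepted(pattern: str, upper: int = 12) -> list:
    """List the integers in range(upper) whose decimal form fully matches the pattern."""
    return [n for n in range(upper) if re.fullmatch(pattern, str(n))]

print(accepted(r"[1-9]|10"))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(accepted(r"[1-10]"))    # [0, 1]
```

Note that the whole-string match matters here: with an unanchored search, [1-9]|10 would also find the 5 inside 50.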
