Why doesn't [01-12] range work as expected?

You seem to have misunderstood how character classes definition works in regex.

To match any of the strings 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, or 12, something like this works:

0[1-9]|1[0-2]

References

  • regular-expressions.info/Character Classes
    • Numeric Ranges (have many examples on matching strings interpreted as numeric ranges)

Explanation

A character class, by itself, attempts to match one and exactly one character from the input string. [01-12] actually defines [012], a character class that matches one character from the input against any of the 3 characters 0, 1, or 2.

The - range definition goes from 1 to 1, which includes just 1. On the other hand, something like [1-9] includes 1, 2, 3, 4, 5, 6, 7, 8, 9.

Beginners often make the mistakes of defining things like [this|that]. This doesn't "work". This character definition defines [this|a], i.e. it matches one character from the input against any of 6 characters in t, h, i, s, | or a. More than likely (this|that) is what is intended.

References

  • regular-expressions.info/Brackets for Grouping and Alternation with the vertical bar

How ranges are defined

So it's obvious now that a pattern like between [24-48] hours doesn't "work". The character class in this case is equivalent to [248].

That is, - in a character class definition doesn't define numeric range in the pattern. Regex engines doesn't really "understand" numbers in the pattern, with the exception of finite repetition syntax (e.g. a{3,5} matches between 3 and 5 a).

Range definition instead uses ASCII/Unicode encoding of the characters to define ranges. The character 0 is encoded in ASCII as decimal 48; 9 is 57. Thus, the character definition [0-9] includes all character whose values are between decimal 48 and 57 in the encoding. Rather sensibly, by design these are the characters 0, 1, ..., 9.

See also


Another example: A to Z

Let's take a look at another common character class definition [a-zA-Z]

In ASCII:

  • A = 65, Z = 90
  • a = 97, z = 122

This means that:

  • [a-zA-Z] and [A-Za-z] are equivalent
  • In most flavors, [a-Z] is likely to be an illegal character range
    • because a (97) is "greater than" than Z (90)
  • [A-z] is legal, but also includes these six characters:
    • [ (91), \ (92), ] (93), ^ (94), _ (95), ` (96)

Related questions

  • is the regex [a-Z] valid and if yes then is it the same as [a-zA-Z]

A character class in regular expressions, denoted by the [...] syntax, specifies the rules to match a single character in the input. As such, everything you write between the brackets specify how to match a single character.

Your pattern, [01-12] is thus broken down as follows:

  • 0 - match the single digit 0
  • or, 1-1, match a single digit in the range of 1 through 1
  • or, 2, match a single digit 2

So basically all you're matching is 0, 1 or 2.

In order to do the matching you want, matching two digits, ranging from 01-12 as numbers, you need to think about how they will look as text.

You have:

  • 01-09 (ie. first digit is 0, second digit is 1-9)
  • 10-12 (ie. first digit is 1, second digit is 0-2)

You will then have to write a regular expression for that, which can look like this:

  +-- a 0 followed by 1-9
  |
  |      +-- a 1 followed by 0-2
  |      |
<-+--> <-+-->
0[1-9]|1[0-2]
      ^
      |
      +-- vertical bar, this roughly means "OR" in this context

Note that trying to combine them in order to get a shorter expression will fail, by giving false positive matches for invalid input.

For instance, the pattern [0-1][0-9] would basically match the numbers 00-19, which is a bit more than what you want.

I tried finding a definite source for more information about character classes, but for now all I can give you is this Google Query for Regex Character Classes. Hopefully you'll be able to find some more information there to help you.


This also works:

^([1-9]|[0-1][0-2])$

[1-9] matches single digits between 1 and 9

[0-1][0-2] matches double digits between 10 and 12

There are some good examples here

Tags:

Regex