Regex: 5 digits in increasing order

Wrong tool for the job. Just iterate through the characters one by one and check it. How you would do that depends on which language you're using.

Here is how to check using C:

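#include <stdio.h>
#include <string.h>

/* Convert a digit character to its integer value. */
#define CHR2INT(c) ((c) - '0')

int main(void)
{
    const char *str = "12345";
    int i, res = (strlen(str) == 5);

    /* Every character must be a digit, and each digit must be
       strictly greater than the one before it. */
    for (i = 0; res && i < 5; ++i) {
        res = str[i] >= '0' && str[i] <= '9';
        if (res && i > 0)
            res = CHR2INT(str[i - 1]) < CHR2INT(str[i]);
    }

    printf("%d\n", res);

    return 0;
}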

It is obviously longer than a regex solution, but a regex solution is unlikely to be as fast: the loop makes a single pass over five characters and has no pattern to compile.


polygenelubricants's suggestion is a great one, but it can be improved by using a simpler lookahead constraint: the body of the RE already ensures that every character is a digit, so the lookahead only needs to check the length. To see why this matters, look at this log of an interactive Tcl session:

% set RE1 "^(?=\\d{5}$)1?2?3?4?5?6?7?8?9?0?$"
^(?=\d{5}$)1?2?3?4?5?6?7?8?9?0?$
% set RE2 "^(?=.{5}$)1?2?3?4?5?6?7?8?9?0?$"
^(?=.{5}$)1?2?3?4?5?6?7?8?9?0?$
% time {regexp $RE1 24579} 100000
32.80587355 microseconds per iteration
% time {regexp $RE2 24579} 100000
22.598555649999998 microseconds per iteration

As you can see, the version of the RE with .{5}$ as the lookahead constraint takes about 30% less time per iteration, at least in the Tcl RE engine. (Note that the log above omits some lines where I made sure the compiled forms of both regular expressions were already cached, though I'd expect RE2 to compile a little faster anyway.) If you're using a different RE engine (e.g., PCRE or Perl), you should measure again to get your own performance figures.
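C's POSIX <regex.h> (extended syntax) has no lookahead at all, but the observation above suggests a workaround: because the body 1?2?3?4?5?6?7?8?9?0? can only ever match digits, the lookahead reduces to a plain length check. Here is a minimal sketch, assuming a POSIX system; the function name increasing5 is hypothetical, and no timing claims are made for it:

```c
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: emulates ^(?=.{5}$)1?2?3?4?5?6?7?8?9?0?$ without
   lookahead. The length test plays the role of (?=.{5}$); the POSIX ERE
   below is the body of the original pattern. */
bool increasing5(const char *s)
{
    static const char *body = "^1?2?3?4?5?6?7?8?9?0?$";
    regex_t re;
    bool ok;

    /* Cheap length check first, like the simpler lookahead. */
    if (strlen(s) != 5)
        return false;

    /* For brevity the pattern is compiled on every call; real code
       would compile it once and reuse it. */
    if (regcomp(&re, body, REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    ok = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

With this helper, "24579" and "13570" are accepted, while "54321", "1234" and "12a45" are rejected.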


You can try (as seen on rubular.com)

^(?=\d{5}$)1?2?3?4?5?6?7?8?9?0?$

Explanation

  • ^ and $ are the beginning and end of string anchors respectively
  • \d{5} is the digit character class \d repeated exactly {5} times
  • (?=...) is a positive lookahead
  • ? on each digit makes each optional

How it works

  • First we use a lookahead to assert that, anchored at the beginning of the string, \d{5} runs all the way to the end of the string
  • Now that we know that we have 5 digits, we simply match the digits in the order we want, but making each digit optional
    • The assertion ensures that we have the correct number of digits
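The two steps above can be traced by hand. This is a minimal C sketch (the name matches_pattern is hypothetical) that mirrors ^(?=\d{5}$)1?2?3?4?5?6?7?8?9?0?$ literally: first the lookahead check, then a single pass over the digits in the order the pattern lists them, each consuming at most one character. It only illustrates the logic; it is not how a regex engine works internally.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper mirroring ^(?=\d{5}$)1?2?3?4?5?6?7?8?9?0?$ */
bool matches_pattern(const char *s)
{
    const char *order = "1234567890"; /* digit order used by the pattern */
    size_t i;

    /* Step 1, the lookahead (?=\d{5}$): exactly five digits. */
    if (strlen(s) != 5)
        return false;
    for (i = 0; i < 5; ++i)
        if (s[i] < '0' || s[i] > '9')
            return false;

    /* Step 2, the body 1?2?...0?: walk the ordered digits once; each
       one may consume at most one character of s. */
    for (; *order != '\0'; ++order)
        if (*s == *order)
            ++s;

    /* The final $: the whole string must have been consumed. */
    return *s == '\0';
}
```

Greedy consumption is safe here because every optional token is a distinct literal, so no backtracking is ever needed.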

regular-expressions.info

  • Anchors, Character Classes, Finite Repetition, Lookarounds, and Optional

Generalizing the technique

Let's say that we need to match strings that consist of:

  • between 1 and 3 vowels [aeiou]
  • and the vowels must appear in order

Then the pattern is (as seen on rubular.com):

^(?=[aeiou]{1,3}$)a?e?i?o?u?$

Again, the way it works is that:

  • Anchored at the beginning of the string, we first assert (?=[aeiou]{1,3}$)
    • This guarantees both the correct alphabet and the correct length
  • Then we test for each letter, in order, making each optional, until the end of the string
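The same recipe can be written out for any ordered alphabet and length range. Below is a sketch with a hypothetical function ordered_once: alphabet is the character order (e.g. "aeiou"), and min/max are the allowed lengths, so ordered_once(s, "aeiou", 1, 3) corresponds to ^(?=[aeiou]{1,3}$)a?e?i?o?u?$. It assumes the characters of alphabet are distinct.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: emulates ^(?=[alphabet]{min,max}$) followed by one
   optional token per character of `alphabet`, in order, then $. */
bool ordered_once(const char *s, const char *alphabet, size_t min, size_t max)
{
    size_t len = strlen(s);

    /* Lookahead: correct length, and every character is in the alphabet. */
    if (len < min || len > max || strspn(s, alphabet) != len)
        return false;

    /* Body: each alphabet character may be consumed at most once, in order. */
    for (; *alphabet != '\0'; ++alphabet)
        if (*s == *alphabet)
            ++s;

    /* $: nothing may be left over. */
    return *s == '\0';
}
```

For example, "aei" and "aou" pass, while "ea" (wrong order), "aa" (repeated vowel) and "aeio" (too long) fail; ordered_once(s, "1234567890", 5, 5) gives the 5-digit check again.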

Allowing repetition

If each digit can repeat, e.g. 11223 is a match, then:

  • instead of ? (zero-or-one) on each digit,
  • we use * (zero-or-more repetition)

That is, the pattern is (as seen on rubular.com):

^(?=\d{5}$)1*2*3*4*5*6*7*8*9*0*$
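Written out by hand, switching from ? to * means each digit in the ordered list may consume a whole run of equal characters instead of at most one. A minimal sketch, with a hypothetical function name ordered_repeat, for ^(?=\d{5}$)1*2*3*4*5*6*7*8*9*0*$:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper mirroring ^(?=\d{5}$)1*2*3*4*5*6*7*8*9*0*$ */
bool ordered_repeat(const char *s)
{
    const char *order = "1234567890";

    /* Lookahead (?=\d{5}$): exactly five characters, all digits. */
    if (strlen(s) != 5 || strspn(s, "0123456789") != 5)
        return false;

    /* Body: each digit, in order, consumes zero or more equal characters. */
    for (; *order != '\0'; ++order)
        while (*s == *order)
            ++s;

    /* $: the whole string must have been consumed. */
    return *s == '\0';
}
```

So "11223" and "99900" are accepted, while "11213" (digits out of order) and "1122" (too short) are not.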
