MySQL strip non-numeric characters to compare

Updated 2022: as per Marlom's answer, you can now use REGEX_REPLACE - which will perform even better than my historical answer here.


Most upvoted answer above isn't the fastest.
Full kudos to them for giving a working proposal to bounce off!

This is an improved version:

DELIMITER ;;
DROP FUNCTION IF EXISTS `STRIP_NON_DIGIT`;;

CREATE DEFINER=`root`@`localhost` FUNCTION `STRIP_NON_DIGIT`(input VARCHAR(255)) RETURNS VARCHAR(255) CHARSET utf8
READS SQL DATA
BEGIN
   DECLARE output    VARCHAR(255) DEFAULT '';
   DECLARE iterator  INT          DEFAULT 1;
   DECLARE lastDigit INT          DEFAULT 1;
   DECLARE len       INT;
   
   SET len = LENGTH(input) + 1;
   WHILE iterator < len DO
      -- skip past all digits
      SET lastDigit = iterator;
      WHILE ORD(SUBSTRING(input, iterator, 1)) BETWEEN 48 AND 57 AND iterator < len DO
         SET iterator = iterator + 1;
      END WHILE;

      IF iterator != lastDigit THEN
         SET output = CONCAT(output, SUBSTRING(input, lastDigit, iterator - lastDigit));
      END IF;

      WHILE ORD(SUBSTRING(input, iterator, 1)) NOT BETWEEN 48 AND 57 AND iterator < len DO
         SET iterator = iterator + 1;
      END WHILE;
   END WHILE;
   
   RETURN output;
END;;

Testing 5000 times on a test server:

-- original
Execution Time : 7.389 sec
Execution Time : 7.257 sec
Execution Time : 7.506 sec

-- ORD between not string IN
Execution Time : 4.031 sec

-- With less substrings
Execution Time : 3.243 sec
Execution Time : 3.415 sec
Execution Time : 2.848 sec

I realise that this is an ancient topic but upon googling this problem I couldn't find a simple solution (I saw the venerable agents but think this is a simpler solution) so here's a function I wrote, seems to work quite well.

DROP FUNCTION IF EXISTS STRIP_NON_DIGIT;
DELIMITER $$
CREATE FUNCTION STRIP_NON_DIGIT(input VARCHAR(255))
   RETURNS VARCHAR(255)
BEGIN
   DECLARE output   VARCHAR(255) DEFAULT '';
   DECLARE iterator INT          DEFAULT 1;
   WHILE iterator < (LENGTH(input) + 1) DO
      IF SUBSTRING(input, iterator, 1) IN ( '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' ) THEN
         SET output = CONCAT(output, SUBSTRING(input, iterator, 1));
      END IF;
      SET iterator = iterator + 1;
   END WHILE;
   RETURN output;
END
$$

While it's not pretty and it shows results that don't match, this helps:

SELECT * FROM foo WHERE bar LIKE = '%1%2%3%4%5%'

I would still like to find a better solution similar to the item in the original question.


You can easily do what you want with REGEXP_REPLACE (compatible with MySQL 8+ and MariaDB 10.0.5+)

REGEXP_REPLACE(expr, pat, repl[, pos[, occurrence[, match_type]]])

Replaces occurrences in the string expr that match the regular expression specified by the pattern pat with the replacement string repl, and returns the resulting string. If expr, pat, or repl is NULL, the return value is NULL.

Go to REGEXP_REPLACE doc: MySQL or MariaDB

Try it:

SELECT REGEXP_REPLACE('123asd12333', '[a-zA-Z]+', '');

Output:

12312333

Tags:

Mysql

Regex