Removing non numeric characters from a string
Loop over your string, char by char and only include digits:
new_string = ''.join(ch for ch in your_string if ch.isdigit())
Or use a regex on your string (if at some point you wanted to treat non-contiguous groups separately)...
import re
s = 'sd67637 8'
new_string = ''.join(re.findall(r'\d+', s))
# 676378
Then just print
them out:
print(old_string, '=', new_string)
I would not use RegEx for this. It is a lot slower!
Instead let's just use a simple for
loop.
TLDR;
This function will get the job done fast...
def filter_non_digits(string: str) -> str:
result = ''
for char in string:
if char in '1234567890':
result += char
return result
The Explanation
Let's create a very basic benchmark to test a few different methods that have been proposed. I will test three methods...
- For loop method (my idea).
- List Comprehension method from Jon Clements' answer.
- RegEx method from Moradnejad's answer.
# filters.py
import re
# For loop method
def filter_non_digits_for(string: str) -> str:
result = ''
for char in string:
if char in '1234567890':
result += char
return result
# Comprehension method
def filter_non_digits_comp(s: str) -> str:
return ''.join(ch for ch in s if ch.isdigit())
# RegEx method
def filter_non_digits_re(string: str) -> str:
return re.sub('[^\d]','', string)
Now that we have an implementation of each way of removing digits, let's benchmark each one.
Here is some very basic and rudimentary benchmark code. However, it will do the trick and give us a good comparison of how each method performs.
# tests.py
import time, platform
from filters import filter_non_digits_re,
filter_non_digits_comp,
filter_non_digits_for
def benchmark_func(func):
start = time.time()
# the "_" in the number just makes it more readable
for i in range(100_000):
func('afes098u98sfe')
end = time.time()
return (end-start)/100_000
def bench_all():
print(f'# System ({platform.system()} {platform.machine()})')
print(f'# Python {platform.python_version()}\n')
tests = [
filter_non_digits_re,
filter_non_digits_comp,
filter_non_digits_for,
]
for t in tests:
duration = benchmark_func(t)
ns = round(duration * 1_000_000_000)
print(f'{t.__name__.ljust(30)} {str(ns).rjust(6)} ns/op')
if __name__ == "__main__":
bench_all()
Here is the output from the benchmark code.
# System (Windows AMD64)
# Python 3.9.8
filter_non_digits_re 2920 ns/op
filter_non_digits_comp 1280 ns/op
filter_non_digits_for 660 ns/op
As you can see the filter_non_digits_for()
funciton is more than four times faster than using RegEx, and about twice as fast as the comprehension method. Sometimes simple is best.
The easiest way is with a regexp
import re
a = 'lkdfhisoe78347834 (())&/&745 '
result = re.sub('[^0-9]','', a)
print result
>>> '78347834745'
There is a builtin for this.
string.translate(s, table[, deletechars])
Delete all characters from s that are in deletechars (if present), and then translate the characters using table, which must be a 256-character string giving the translation for each character value, indexed by its ordinal. If table is None, then only the character deletion step is performed.
>>> import string
>>> non_numeric_chars = ''.join(set(string.printable) - set(string.digits))
>>> non_numeric_chars = string.printable[10:] # more effective method. (choose one)
'sd67637 8'.translate(None, non_numeric_chars)
'676378'
Or you could do it with no imports (but there is no reason for this):
>>> chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
>>> 'sd67637 8'.translate(None, chars)
'676378'