How to subtract strings in Python
replace can do something you probably do not want if the second string occurs at several positions in the first:
s1 = 'AJYFAJYF'
s2 = 'AJ'
if s1.startswith(s2):
    s3 = s1.replace(s2, '')
s3
# 'YFYF'
You can pass replace a third argument, count, to indicate that only one replacement should happen:
if s1.startswith(s2):
    s3 = s1.replace(s2, '', 1)
s3
# 'YFAJYF'
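The startswith guard matters here: with count=1, replace removes the first occurrence wherever it is, not necessarily at the start of the string. A minimal sketch, using a hypothetical helper named strip_leading, that shows the difference:

```python
def strip_leading(s, prefix):
    """Remove prefix from s once, but only if s starts with it."""
    if s.startswith(prefix):
        # count=1 limits replace to the first occurrence, which is the leading one here
        return s.replace(prefix, '', 1)
    return s


# Without the startswith guard, count=1 removes a non-leading match:
print('XXAJYF'.replace('AJ', '', 1))    # XXYF
print(strip_leading('XXAJYF', 'AJ'))    # XXAJYF (unchanged)
print(strip_leading('AJYFAJYF', 'AJ'))  # YFAJYF
```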
Or you could use the re module:
import re
if s1.startswith(s2):
    s3 = re.sub('^' + s2, '', s1)
s3
# 'YFAJYF'
The '^' ensures that s2 is substituted only at the start of s1.
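One caveat with building the pattern as '^' + s2: if s2 contains regex metacharacters such as '.' or '*', they are interpreted as pattern syntax rather than literal text. re.escape avoids this. A small sketch; the sample strings are made up for illustration:

```python
import re

s1 = 'A.YF'
s2 = 'A.'

# Unescaped, '.' matches any character, so '^A.' also matches the start of 'AJYF'
print(re.sub('^' + s2, '', 'AJYF'))             # YF   (probably not intended)
# Escaped, the pattern only matches a literal 'A.' prefix
print(re.sub('^' + re.escape(s2), '', 'AJYF'))  # AJYF (no literal 'A.' prefix)
print(re.sub('^' + re.escape(s2), '', s1))      # YF
```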
Yet another approach, suggested in the comments, is to slice off the first len(s2) characters of s1:
if s1.startswith(s2):
    s3 = s1[len(s2):]
s3
# 'YFAJYF'
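On Python 3.9 and later, str has a built-in removeprefix method that performs this same check-and-slice. A minimal sketch of a helper (remove_prefix is a hypothetical name) that uses the built-in when it is available and falls back to slicing otherwise:

```python
import sys


def remove_prefix(s, prefix):
    """Return s without a leading prefix, or s unchanged if it does not start with it."""
    if sys.version_info >= (3, 9):
        # str.removeprefix was added in Python 3.9
        return s.removeprefix(prefix)
    if s.startswith(prefix):
        # Older versions: slice off the first len(prefix) characters
        return s[len(prefix):]
    return s


print(remove_prefix('AJYFAJYF', 'AJ'))  # YFAJYF
print(remove_prefix('GTYF', 'GTY'))     # F
print(remove_prefix('GTYF', 'AJ'))      # GTYF
```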
Some tests using the %timeit magic in IPython (Python 2.7.12, IPython 5.1.0) suggest that this last approach is the fastest, at roughly 88 ns per loop versus about 230 ns for replace and about 1.85 µs for re.sub:
In [1]: s1 = 'AJYFAJYF'
In [2]: s2 = 'AJ'
In [3]: %timeit s3 = s1[len(s2):]
The slowest run took 24.47 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 87.7 ns per loop
In [4]: %timeit s3 = s1[len(s2):]
The slowest run took 32.58 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 87.8 ns per loop
In [5]: %timeit s3 = s1[len(s2):]
The slowest run took 21.81 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 87.4 ns per loop
In [6]: %timeit s3 = s1.replace(s2, '', 1)
The slowest run took 17.64 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 230 ns per loop
In [7]: %timeit s3 = s1.replace(s2, '', 1)
The slowest run took 17.79 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 228 ns per loop
In [8]: %timeit s3 = s1.replace(s2, '', 1)
The slowest run took 16.27 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 234 ns per loop
In [9]: import re
In [10]: %timeit s3 = re.sub('^' + s2, '', s1)
The slowest run took 82.02 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.85 µs per loop
In [11]: %timeit s3 = re.sub('^' + s2, '', s1)
The slowest run took 12.82 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.86 µs per loop
In [12]: %timeit s3 = re.sub('^' + s2, '', s1)
The slowest run took 13.08 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.84 µs per loop
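Outside IPython, a comparable measurement can be sketched with the standard timeit module. Each expression is wrapped in a lambda, which adds call overhead to every timing, so the absolute numbers should come out higher than %timeit's; only the relative order is meant to be compared. No output is shown because it depends on the machine and Python version, and the call count below is an arbitrary choice.

```python
import re
import timeit

s1 = 'AJYFAJYF'
s2 = 'AJ'
NUMBER = 100000  # calls per repeat; arbitrary

candidates = {
    'slice': lambda: s1[len(s2):],
    'replace': lambda: s1.replace(s2, '', 1),
    're.sub': lambda: re.sub('^' + s2, '', s1),  # same pattern as in the session above
}

results = {}
for name, func in candidates.items():
    # Best of 3 repeats; each repeat calls func NUMBER times and returns total seconds
    best = min(timeit.repeat(func, repeat=3, number=NUMBER))
    results[name] = best / NUMBER * 1e9  # nanoseconds per call
    print('%-8s %8.1f ns per call' % (name, results[name]))
```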
If you insist on using the - operator, define a class that overrides the __sub__ dunder method and combine it with one of the solutions above:
class String(object):
    def __init__(self, string):
        self.string = string
    def __sub__(self, other):
        # Remove other's text only if it is a prefix of self
        if self.string.startswith(other.string):
            return self.string[len(other.string):]
        # Otherwise return the text unchanged instead of falling through to None
        return self.string
    def __str__(self):
        return self.string
sub1 = String('AJYF') - String('AJ')
sub2 = String('GTYF') - String('GTY')
print(sub1)
print(sub2)
It prints:
YF
F
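A variation, offered as a sketch rather than a recommendation, is to subclass str so that instances still behave like ordinary strings and the - operator accepts either a plain str or another instance. The class name PrefixStr is hypothetical; like the class above, it only removes a leading match.

```python
class PrefixStr(str):
    """A str that supports '-' as 'remove this prefix if present'."""

    def __sub__(self, other):
        other = str(other)
        if self.startswith(other):
            # Slicing a str subclass returns a plain str, so wrap it to allow chaining
            return PrefixStr(self[len(other):])
        return self


print(PrefixStr('AJYF') - 'AJ')              # YF
print(PrefixStr('GTYF') - PrefixStr('GTY'))  # F
print(PrefixStr('GTYF') - 'G' - 'T')         # YF
print(PrefixStr('GTYF') - 'AJ')              # GTYF
```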
An easy solution is the following; note that it removes every occurrence of string2, not only a leading one:
>>> string1 = 'AJYF'
>>> string2 = 'AJ'
>>> if string2 in string1:
...     string1.replace(string2, '')
...
'YF'
I think what you want is this:
a = 'AJYF'
b = a.replace('AJ', '')
print(b)  # prints YF
a = 'GTYF'
b = a.replace('GTY', '')
print(b)  # prints F