How to subtract strings in Python

replace can give a result you do not want if the second string occurs at several positions, because it removes every occurrence:

s1 = 'AJYFAJYF'
s2 = 'AJ'
if s1.startswith(s2):
    s3 = s1.replace(s2, '')
s3
# 'YFYF'

You can pass a count as the third argument to replace to indicate that you want only one replacement to happen:

if s1.startswith(s2):
    s3 = s1.replace(s2, '', 1)
s3
# 'YFAJYF'

Or you could use the re module:

import re
if s1.startswith(s2):
    # re.escape keeps characters such as '.' or '*' in s2 from being read as regex syntax
    s3 = re.sub('^' + re.escape(s2), '', s1)
s3
# 'YFAJYF'

The '^' ensures that s2 is substituted only at the first position of s1.
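When s2 comes from data rather than a literal, it may contain characters that the regex engine treats specially. A small sketch of why re.escape matters in that case (the input strings here are made-up examples, not from the question):

```python
import re

s2 = 'A.'  # made-up prefix containing the regex metacharacter '.'

# Unescaped: '.' matches any character, so the prefix 'AX' is stripped too.
unescaped = re.sub('^' + s2, '', 'AXJYF')
# Escaped: only a literal 'A.' at the start is removed.
escaped_no_match = re.sub('^' + re.escape(s2), '', 'AXJYF')
escaped_match = re.sub('^' + re.escape(s2), '', 'A.JYFA.JYF')

print(unescaped)         # JYF
print(escaped_no_match)  # AXJYF
print(escaped_match)     # JYFA.JYF
```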

Yet another approach, suggested in the comments, would be to take out the first len(s2) characters from s1:

if s1.startswith(s2):
    s3 = s1[len(s2):]
s3
# 'YFAJYF'
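The snippets above leave s3 unassigned when s1 does not start with s2. A minimal sketch that wraps the slicing approach in a function and returns the input unchanged in that case (remove_prefix is a hypothetical helper name, not a built-in):

```python
def remove_prefix(text, prefix):
    """Return text without prefix if text starts with it, else text unchanged."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


print(remove_prefix('AJYFAJYF', 'AJ'))  # YFAJYF
print(remove_prefix('GTYF', 'AJ'))      # GTYF
```

On Python 3.9 and later, the built-in str.removeprefix method has the same behavior, so 'AJYFAJYF'.removeprefix('AJ') also gives 'YFAJYF'.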

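Some tests using the %timeit magic in IPython (Python 2.7.12, IPython 5.1.0) suggest that this last approach is the fastest: about 88 ns per loop for slicing, against about 230 ns for replace and about 1.85 µs for re.sub: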

In [1]: s1 = 'AJYFAJYF'

In [2]: s2 = 'AJ'

In [3]: %timeit s3 = s1[len(s2):]
The slowest run took 24.47 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 87.7 ns per loop

In [4]: %timeit s3 = s1[len(s2):]
The slowest run took 32.58 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 87.8 ns per loop

In [5]: %timeit s3 = s1[len(s2):]
The slowest run took 21.81 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 87.4 ns per loop

In [6]: %timeit s3 = s1.replace(s2, '', 1)
The slowest run took 17.64 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 230 ns per loop

In [7]: %timeit s3 = s1.replace(s2, '', 1)
The slowest run took 17.79 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 228 ns per loop

In [8]: %timeit s3 = s1.replace(s2, '', 1)
The slowest run took 16.27 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 234 ns per loop

In [9]: import re

In [10]: %timeit s3 = re.sub('^' + s2, '', s1)
The slowest run took 82.02 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.85 µs per loop

In [11]: %timeit s3 = re.sub('^' + s2, '', s1)
The slowest run took 12.82 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.86 µs per loop

In [12]: %timeit s3 = re.sub('^' + s2, '', s1)
The slowest run took 13.08 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.84 µs per loop
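Outside IPython, a similar comparison can be run with the standard timeit module. This is only a sketch: the loop and repeat counts are arbitrary choices, each call goes through a lambda (which adds overhead), and absolute numbers will differ by machine and Python version, so only the relative order is of interest:

```python
import re
import timeit

s1 = 'AJYFAJYF'
s2 = 'AJ'

# The three approaches discussed above, wrapped so timeit can call them.
candidates = {
    'slice': lambda: s1[len(s2):],
    'replace': lambda: s1.replace(s2, '', 1),
    're.sub': lambda: re.sub('^' + re.escape(s2), '', s1),
}

calls = 100000
for name, func in candidates.items():
    # Best of 3 repeats; each repeat times `calls` invocations.
    best = min(timeit.repeat(func, number=calls, repeat=3))
    print('%-8s %.1f ns per call' % (name, best / calls * 1e9))
```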

If you insist on using the '-' operator, then use a class that overrides the __sub__ dunder method, combined with one of the solutions provided above:

class String(object):
    def __init__(self, string):
        self.string = string

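    def __sub__(self, other):
        # Strip other from the start of self; return the string unchanged if it is not a prefix.
        if self.string.startswith(other.string):
            return self.string[len(other.string):]
        return self.string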

    def __str__(self):
        return self.string


sub1 = String('AJYF') - String('AJ')
sub2 = String('GTYF') - String('GTY')
print(sub1)
print(sub2)

It prints:

YF
F
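A variation, as a sketch: subclassing str instead of wrapping it means the result keeps all the usual string methods and can be subtracted from again. SubStr is a made-up name, and returning the string unchanged when there is no matching prefix is an assumption about the desired behavior:

```python
class SubStr(str):
    """str subclass where a - b strips b from the start of a (hypothetical name)."""

    def __sub__(self, other):
        if other and self.startswith(other):
            # Slicing returns a plain str, so wrap it to keep '-' available.
            return SubStr(self[len(other):])
        # Assumption: no matching prefix means there is nothing to subtract.
        return self


result = SubStr('AJYFAJYF') - 'AJ' - 'YF'
print(result)          # AJYF
print(result.lower())  # ajyf
```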

An easy solution is:

>>> string1 = 'AJYF'
>>> string2 = 'AJ'
>>> if string2 in string1:
...     string1.replace(string2, '')
...
'YF'
>>>

I think what you want is this:

a = 'AJYF'
b = a.replace('AJ', '')
print(b)    # produces 'YF'
a = 'GTYF'
b = a.replace('GTY', '')
print(b)    # produces 'F'
