Count letter differences of two strings

Python has the excellent difflib module, which should provide the needed functionality.

Here's sample usage from the documentation:

import difflib  # Works for python >= 2.1

>>> s = difflib.SequenceMatcher(lambda x: x == " ",
...                     "private Thread currentThread;",
...                     "private volatile Thread currentThread;")
>>> for block in s.get_matching_blocks():
...     print "a[%d] and b[%d] match for %d elements" % block
a[0] and b[0] match for 8 elements
a[8] and b[17] match for 21 elements
a[29] and b[38] match for 0 elements
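Applied to the two strings from the question, the same class can also produce a difference count. The following is only a minimal sketch and is not taken from the difflib documentation: the helper name count_diffs is hypothetical, and counting each non-matching span by its longer side is just one reasonable definition of "letter differences". It is written so that it runs on both Python 2.7 and Python 3.

```python
import difflib

def count_diffs(a, b):
    """Count differing letters via difflib opcodes (hypothetical helper)."""
    matcher = difflib.SequenceMatcher(None, a, b)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':
            # A replaced, inserted or deleted span is counted by its longer side.
            total += max(i2 - i1, j2 - j1)
    return total

print(count_diffs('IGADKYFHARGNYDAA', 'KGADKYFHARGNYEAA'))
```

For these two strings the only non-matching spans are the single letters at positions 0 and 13, so the count is 2.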

I think the following simpler example will work for your specific case without too much hassle, and it should not run into compatibility issues with your Python version (please upgrade to 2.7, though):

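a='IGADKYFHARGNYDAA'
b='KGADKYFHARGNYEAA'

# Iterate over the pairs directly; storing them in a dict would
# drop repeated letters and lose the original order.
x=[]
for i,j in zip(a,b):
    if i==j:
        x.append('*')
    else:
        x.append(j)

print x

Outputs: ['K', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', 'E', '*', '*']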

With a few tweaks, you can get what you want. Tell me if it helps :-)

Update

You can also use this:

a='IGADKYFHARGNYDAA'
b='KGADKYFHARGNYEAA'

u=zip(a,b)
for i,j in u:
    if i==j:
        print i,'--',j
    else: 
        print i,'  ',j

Outputs:

I    K
G -- G
A -- A
D -- D
K -- K
Y -- Y
F -- F
H -- H
A -- A
R -- R
G -- G
N -- N
Y -- Y
D    E
A -- A
A -- A

Update 2

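You may modify the code like this to collect the letters of b that differ and count them. The output below was produced with the last letter of b changed to X (b='KGADKYFHARGNYEAX'):

y=[]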
for i,j in u:
    if i==j:
        print i,'--',j
    else: 
        y.append(j)
        print i,'  ',j
        
print '\n', y

print '\n Length = ',len(y)

Outputs:

I    K
G -- G
A -- A
D -- D
K -- K
Y -- Y
F -- F
H -- H
A -- A
R -- R
G -- G
N -- N
Y -- Y
D    E
A -- A
A    X

['K', 'E', 'X']

 Length =  3
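If you also need to know where the strings differ, the loop above can be condensed into a small helper. This is only a sketch: diff_positions is a hypothetical name, not something from the standard library, and it assumes both strings have the same length, because zip stops at the end of the shorter one.

```python
def diff_positions(a, b):
    """Return (index, letter_in_a, letter_in_b) for every mismatch."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

a = 'IGADKYFHARGNYDAA'
b = 'KGADKYFHARGNYEAA'
diffs = diff_positions(a, b)
print(diffs)       # the mismatching positions and letters
print(len(diffs))  # the number of differences
```

The length of the returned list is the difference count, so the separate list y is no longer needed.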

The Theory

Iterate over both strings simultaneously and compare the characters.
Store the result in a new string by appending a | character for a match or a space for a mismatch. Also, increment an integer counter, starting from zero, for each differing character.
Output the result.

Implementation

You can use the built-in zip function or itertools.izip to iterate over both strings simultaneously. The latter performs a little better on huge inputs, because in Python 2 it yields the pairs lazily instead of building the whole list in memory. If the strings are not the same length, iteration covers only the length of the shorter one. In that case, you can pad the rest of the result with the no-match character.

import itertools

def compare(string1, string2, no_match_c=' ', match_c='|'):
    if len(string2) < len(string1):
        string1, string2 = string2, string1
    result = ''
    n_diff = 0
    for c1, c2 in itertools.izip(string1, string2):
        if c1 == c2:
            result += match_c
        else:
            result += no_match_c
            n_diff += 1
    delta = len(string2) - len(string1)
    result += delta * no_match_c
    n_diff += delta
    return (result, n_diff)
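The function above targets Python 2, where itertools.izip exists. On Python 3 that name is gone, so here is a hedged sketch of the same idea for Python 3. The name compare_py3 is hypothetical, and the use of itertools.zip_longest to handle the padding is a swapped-in technique, not the original author's approach.

```python
from itertools import zip_longest  # Python 3 name; Python 2 calls it izip_longest

def compare_py3(string1, string2, no_match_c=' ', match_c='|'):
    """Return (marker_string, number_of_differences) for two strings."""
    marks = []
    n_diff = 0
    # zip_longest pads the shorter string with None, which never equals
    # a character, so the tail of the longer string counts as mismatches.
    for c1, c2 in zip_longest(string1, string2):
        if c1 == c2:
            marks.append(match_c)
        else:
            marks.append(no_match_c)
            n_diff += 1
    return ''.join(marks), n_diff
```

Because the padding happens inside the loop, the swap of the two arguments and the separate delta calculation are not needed in this variant.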

Example

Here's a simple test, with slightly different input than in your example above. Note that I have used an underscore to indicate non-matching characters, to better demonstrate how the resulting string is expanded to the length of the longer string.

def main():
    string1 = 'IGADKYFHARGNYDAA AWOOH'
    string2 = 'KGADKYFHARGNYEAA  W'
    result, n_diff = compare(string1, string2, no_match_c='_')

    print "%d difference(s)." % n_diff  
    print string1
    print result
    print string2

main()

Output:

niklas@saphire:~/Desktop$ python foo.py 
6 difference(s).
IGADKYFHARGNYDAA AWOOH
_||||||||||||_|||_|___
KGADKYFHARGNYEAA  W

def diff_letters(a, b):
    # Assumes len(a) <= len(b); any extra characters in b are ignored.
    return sum(a[i] != b[i] for i in range(len(a)))
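The one-liner above raises an IndexError when b is shorter than a. A length-tolerant variant might look like the sketch below. The name diff_letters_safe is hypothetical, and treating every surplus character of the longer string as one difference is an assumption about what you want to count.

```python
def diff_letters_safe(a, b):
    """Count mismatches position by position, plus any length difference."""
    # zip stops at the shorter string, so no index can go out of range.
    mismatches = sum(x != y for x, y in zip(a, b))
    # Every surplus character in the longer string counts as one difference.
    return mismatches + abs(len(a) - len(b))

print(diff_letters_safe('IGADKYFHARGNYDAA', 'KGADKYFHARGNYEAA'))
```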
