Count letter differences of two strings
Python has the excellent difflib module, which should provide the needed functionality.
Here's sample usage from the documentation:
import difflib  # Works for Python >= 2.1
>>> s = difflib.SequenceMatcher(lambda x: x == " ",
...                             "private Thread currentThread;",
...                             "private volatile Thread currentThread;")
>>> for block in s.get_matching_blocks():
...     print "a[%d] and b[%d] match for %d elements" % block
a[0] and b[0] match for 8 elements
a[8] and b[17] match for 21 elements
a[29] and b[38] match for 0 elements
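To turn this into a count of differing letters, one option is to add up the sizes of the non-equal opcodes. The following is a minimal sketch in Python 3 syntax and is not taken from the difflib documentation; the helper name count_diff_letters is hypothetical, and counting the longer side of each changed span is an assumption about what a "difference" should mean.

```python
import difflib

def count_diff_letters(a, b):
    """Count letters that differ between a and b (hypothetical helper)."""
    matcher = difflib.SequenceMatcher(None, a, b)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':
            # Assumption: a changed span counts as the longer of its two sides.
            total += max(i2 - i1, j2 - j1)
    return total

print(count_diff_letters('IGADKYFHARGNYDAA', 'KGADKYFHARGNYEAA'))  # 2
```

Unlike a position-by-position comparison, this approach also copes with inserted or deleted letters, since difflib realigns the strings around them.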
The following example should work for your specific case without much hassle, and without compatibility issues across Python versions (though please upgrade to 2.7):
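a='IGADKYFHARGNYDAA'
b='KGADKYFHARGNYEAA'
u=zip(a,b)
x=[]
# Iterate over the pairs directly; converting them to a dict would
# collapse repeated letters of a into a single entry.
for i,j in u:
    if i==j:
        x.append('*')
    else:
        x.append(j)
print x
Outputs: ['K', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', 'E', '*', '*']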
With a few tweaks, you can get what you want. Tell me if it helps :-)
Update
You can also use this:
a='IGADKYFHARGNYDAA'
b='KGADKYFHARGNYEAA'
u=zip(a,b)
for i,j in u:
    if i==j:
        print i,'--',j
    else:
        print i,' ',j
Outputs:
I K
G -- G
A -- A
D -- D
K -- K
Y -- Y
F -- F
H -- H
A -- A
R -- R
G -- G
N -- N
Y -- Y
D E
A -- A
A -- A
Update 2
You may modify the code like this:
y=[]
for i,j in u:
    if i==j:
        print i,'--',j
    else:
        # Collect the letters of b that differ from a.
        y.append(j)
        print i,' ',j
print '\n', y
print '\n Length = ',len(y)
Outputs (here with the last letter of b changed to X, i.e. b='KGADKYFHARGNYEAX'):
I K
G -- G
A -- A
D -- D
K -- K
Y -- Y
F -- F
H -- H
A -- A
R -- R
G -- G
N -- N
Y -- Y
D E
A -- A
A X
['K', 'E', 'X']
Length = 3
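If the positions of the differences matter as well, the loop above can be condensed with enumerate. This is only a sketch, written in Python 3 syntax; the function name mismatches is made up for illustration, and it assumes both strings have the same length, since zip stops at the shorter one.

```python
def mismatches(a, b):
    """Return (index, letter_in_a, letter_in_b) for each differing position."""
    # zip stops at the shorter string, so trailing extra letters are ignored.
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

diffs = mismatches('IGADKYFHARGNYDAA', 'KGADKYFHARGNYEAX')
print(diffs)       # [(0, 'I', 'K'), (13, 'D', 'E'), (15, 'A', 'X')]
print(len(diffs))  # 3
```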
The Theory
- Iterate over both strings simultaneously and compare the characters.
- Store the result in a new string by appending either a space or a | character to it, for differing and matching characters respectively. Also, increment an integer counter, starting from zero, for each differing character.
- Output the result.
Implementation
You can use the built-in zip function or itertools.izip to iterate over both strings simultaneously; the latter performs a little better on huge inputs because it yields the pairs lazily instead of building a list first. If the strings differ in length, iteration stops at the end of the shorter one. In that case, you can pad the rest of the result with the no-match character.
import itertools

def compare(string1, string2, no_match_c=' ', match_c='|'):
    # Make string1 the shorter string so the padding below is non-negative.
    if len(string2) < len(string1):
        string1, string2 = string2, string1
    result = ''
    n_diff = 0
    for c1, c2 in itertools.izip(string1, string2):
        if c1 == c2:
            result += match_c
        else:
            result += no_match_c
            n_diff += 1
    # Characters beyond the end of the shorter string count as differences.
    delta = len(string2) - len(string1)
    result += delta * no_match_c
    n_diff += delta
    return (result, n_diff)
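On Python 3, itertools.izip no longer exists and the built-in zip is already lazy. A possible Python 3 variant of the function above is sketched here, using itertools.zip_longest so that the padding falls out of the loop itself. The name compare_py3 is hypothetical, and the sketch assumes plain strings as input, because zip_longest fills the shorter one with None.

```python
import itertools

def compare_py3(string1, string2, no_match_c=' ', match_c='|'):
    """Return (marker_string, n_diff) for two strings of any length."""
    markers = []
    n_diff = 0
    # zip_longest pads the shorter string with None, which never equals
    # a real character, so the overhang is counted as differences.
    for c1, c2 in itertools.zip_longest(string1, string2):
        if c1 == c2:
            markers.append(match_c)
        else:
            markers.append(no_match_c)
            n_diff += 1
    return ''.join(markers), n_diff

result, n_diff = compare_py3('IGADKYFHARGNYDAA', 'KGADKYFHARGNYEAA', no_match_c='_')
print(result)  # _||||||||||||_||
print(n_diff)  # 2
```

Collecting the markers in a list and joining them once avoids repeated string concatenation, which is a design choice of this sketch rather than a requirement.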
Example
Here's a simple test, with slightly different inputs than in your example above. Note that I have used an underscore to indicate non-matching characters, to better demonstrate how the resulting string is padded to the length of the longer string.
def main():
    string1 = 'IGADKYFHARGNYDAA AWOOH'
    string2 = 'KGADKYFHARGNYEAA  W'
    result, n_diff = compare(string1, string2, no_match_c='_')
    print "%d difference(s)." % n_diff
    print string1
    print result
    print string2

main()
Output:
niklas@saphire:~/Desktop$ python foo.py
6 difference(s).
IGADKYFHARGNYDAA AWOOH
_||||||||||||_|||_|___
KGADKYFHARGNYEAA  W
def diff_letters(a,b):
    # Assumes a and b have the same length; raises IndexError if b is shorter.
    return sum(a[i] != b[i] for i in range(len(a)))
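The one-liner above relies on both strings having the same length. A minimal sketch that makes this assumption explicit, in Python 3 syntax, could look as follows; the name hamming_distance and the decision to raise ValueError are illustrative choices, not part of the original answer.

```python
def hamming_distance(a, b):
    """Count positions at which equal-length strings a and b differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    # True counts as 1 when summed, so this adds up the mismatches.
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance('IGADKYFHARGNYDAA', 'KGADKYFHARGNYEAA'))  # 2
```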