Interleave different length lists, elimating duplicates, and preserve order
I suspect that you may be asking for a solution to the shortest common supersequence problem, which I believe is NP-hard in the general case of an arbitrary number of input sequences. I'm not aware of any libraries for solving this problem, so you might have to implement one by hand. Probably the quickest way to get to working code would be to take interjay's answer using difflib and then use reduce
to run it on an arbitrary number of lists (make sure to specify the empty list as the 3rd argument to reduce
).
What you need is basically what any merge utility does: It tries to merge two sequences, while keeping the relative order of each sequence. You can use Python's difflib
module to diff the two sequences, and merge them:
from difflib import SequenceMatcher
def merge_sequences(seq1,seq2):
sm=SequenceMatcher(a=seq1,b=seq2)
res = []
for (op, start1, end1, start2, end2) in sm.get_opcodes():
if op == 'equal' or op=='delete':
#This range appears in both sequences, or only in the first one.
res += seq1[start1:end1]
elif op == 'insert':
#This range appears in only the second sequence.
res += seq2[start2:end2]
elif op == 'replace':
#There are different ranges in each sequence - add both.
res += seq1[start1:end1]
res += seq2[start2:end2]
return res
Example:
>>> keys1 = ['A', 'B', 'C', 'D', 'E', 'H', 'I']
>>> keys2 = ['A', 'B', 'E', 'F', 'G', 'H', 'J', 'K']
>>> merge_sequences(keys1, keys2)
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K']
Note that the answer you expect is not necessarily the only possible one. For example, if we change the order of sequences here, we get another answer which is just as valid:
>>> merge_sequences(keys2, keys1)
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'I']
By using only lists, you can achieve this with few simple for
loops and .copy()
:
def mergeLists(list1, list2):
# Exit if list2 is empty
if not len(list2):
return list1
# Copy the content of list2 into merged list
merged = list2.copy()
# Create a list for storing temporary elements
elements = []
# Create a variable for storing previous element found in both lists
previous = None
# Loop through the elements of list1
for e in list1:
# Append the element to "elements" list if it's not in list2
if e not in merged:
elements.append(e)
# If it is in list2 (is a common element)
else:
# Loop through the stored elements
for x in elements:
# Insert all the stored elements after the previous common element
merged.insert(previous and merged.index(previous) + 1 or 0, x)
# Save new common element to previous
previous = e
# Empty temporary elements
del elements[:]
# If no more common elements were found but there are elements still stored
if len(elements)
# Insert them after the previous match
for e in elements:
merged.insert(previous and merged.index(previous) + 1 or 0, e)
# Return the merged list
return merged
In [1]: keys1 = ["A", "B", "D", "F", "G", "H"]
In [2]: keys2 = ["A", "C", "D", "E", "F", "H"]
In [3]: mergeLists(keys1, keys2)
Out[3]: ["A", "B", "C", "D", "E", "F", "G", "H"]
English is not my first language, and this one is pretty hard to explain, but if you care about the explanation, here's what it does:
- There's a local list called
elements
which can store temporary elements. - There's a local variable called
previous
which stores the previous element that was in both lists. - When ever it finds an element that is NOT in
list2
but is inlist1
, it will append that element toelements
list and continue the loop. - Once it hits an element that is in both lists, it loops through the
elements
list, appending all elements afterprevious
element tolist2
. - The new match is then stored into
previous
andelements
is reset to[]
and the loop continues. - Beginning of the lists and end of the lists are counted as a common element, if first or last element is not a common element in both lists.
This way it will always follow this format:
- Previous common element
- Elements from list1, between two common elements
- Elements in list2, between two common elements
- New common element
So for example:
l1 = ["A", "B", "C", "E"]
l2 = ["A", "D", "E"]
- The revious common element
A
will be first in the merged list. - Elements from
l1
between the previous common elementA
and the new common elementE
will be inserted right afterA
. - Elements from
l2
between the previous common elmeentA
and the new common elmeentE
will be inserted right after the elements froml1
. - The new common element
E
will be last element. Back to step 1 if more common elements found.
["A", "B", "C", "D", "E"]
I would use a Set (cf. python doc), that I'd fill with the elements of the two lists, one aafter the other.
And make a list from the Set when it's done.
Note that there is a contradiction/paradox in your question: you want to preserve order for elements that cannot be compared (only equality because "they are complex strings" as you said).
EDIT: the OP is right noticing that sets don't preserve order of insertion.