Find the number of remove-then-append operations needed to sort a given array

This can be done in O(n log n).

First find the minimum element in the array. Now, find the max element that occurs before this element. Call this max_left. You have to call swap()for all the elements before the min element of the array.

Now, find the longest increasing subsequence to the right of the min element, along with the constraint that you should skip elements whose values are greater than max_left. The required number of swaps is size(array) - size(LIS).

For example consider the array,

7 8 9 1 2 5 11 18

Minimum element in the array is 1. So we find the max before the minimum element.

7 8 9 | 1 2 5 11 18

max_left = 9

Now, find the LIS to the right of min with elements < 9 LIS = 1,2,5

No of swaps = 8 - 3 = 5

In cases where max element is null, ie., min is the first element, find the LIS of the array and required answer is size(array)-size(LIS)

For Example

2 5 4 3

max_left is null. LIS is 2 3

No of swaps = size(array) - size(LIS) = 4 - 2 = 2


This might work in O(nlogn) even if we don't assume array of consecutive values.
If we do - it can be done in O(n). One way of doing it is with O(n) space and O(nlogn) time.
Given array A sort it (O(nlogn)) into a second array B.
now... (arrays are indexed from 1)

swaps = 0
b = 1
for a = 1 to len(A)
  if A[a] == B[b]
    b = b + 1
  else
    swaps = swaps + 1

The problem boils down to finding the longest prefix of the sorted array that appears as a subsequence in the input array. This determines the elements that do not need to be sorted. The remaining elements will need to be deleted one by one, from the smallest to the largest, and appended at the back.

In your example, [3, 1, 2, 4], the already-sorted subsequence is [1, 2]. The optimal solution is to delete the remaning two elements, 3 and 4, and append them at the back. Thus the optimal solution is two "swaps".

Finding the subsequence can be done in O(n logn) time using O(n) extra memory. The following pseudo-code will do it (the code also happens to be valid Python):

l = [1, 2, 4, 3, 99, 98, 7]
s = sorted(l)
si = 0
for item in l:
  if item == s[si]:
    si += 1
print len(l) - si

If, as in your example, the array contains a permutation of integers from 1 to n, the problem can be solved in O(n) time using O(1) memory:

l = [1, 2, 3, 5, 4, 6]
s = 1
for item in l:
  if item == s:
    s += 1
print len(l) - s + 1

More generally, the second method can be used whenever we know the output array a priori and thus don't need to find it through sorting.


Observation: If an element is swapped to the back, its previous position does not matter. No element needs to be swapped more than once.

Observation: The last swap (if any) must move the largest element.

Observation: Before the swap, the array (excluding the last element) must be sorted (by former swaps, or initially)

Sorting algorithm, assuming the values are conecutive: find the longest sorted subsequence of consecutive (by value) elements starting at 1:

3 1 5 2 4

swap all higher elements in turn:

1 5 2 4 3

1 5 2 3 4

1 2 3 4 5

To find the number of swaps in O(n), find the length of the longest sorted subsequence of consecutive elements starting at 1:

  • expected = 1
  • for each element in sequence
    • if element == expected
      • expected += 1
  • return expected-1

then the number of swaps = the length of the input - its longest sorted subsequence.

An alternative solution ( O(n^2) ) if the input is not a permutation of 1..n:

  • swaps = 0
  • loop
    • find the first instance of the largest element and detect if the array is sorted
    • if the array is sorted, return swaps.
    • else remove the found element from the array and increment swaps.

Yet another solution ( O(n log n) ), assuming unique elements:

  • wrap each element in {oldPos, newPos, value}
  • make a shallow copy of the array
  • sort the array by value
  • store the new position of each element
  • run the algorithm for permutations on the newPos' in the (unsorted) copy

If you don't want to copy the input array, sort by oldPos before the last step instead.