Checking if two strings are permutations of each other in Python

Here is a way which is O(n), asymptotically better than the two ways you suggest.

import collections

def same_permutation(a, b):
    d = collections.defaultdict(int)
    for x in a:
        d[x] += 1
    for x in b:
        d[x] -= 1
    return not any(d.itervalues())

## same_permutation([1,2,3],[2,3,1])
#. True

## same_permutation([1,2,3],[2,3,1,1])
#. False

"but the first one is slower when (for example) the first char of a is nowhere in b".

This kind of degenerate-case performance analysis is not a good idea. It's a rat-hole of lost time thinking up all kinds of obscure special cases.

Only do the O-style "overall" analysis.

Overall, the sorts are O( n log( n ) ).

The a.count(char) for char in a solution is O( n 2 ). Each count pass is a full examination of the string.

If some obscure special case happens to be faster -- or slower, that's possibly interesting. But it only matters when you know the frequency of your obscure special cases. When analyzing sort algorithms, it's important to note that a fair number of sorts involve data that's already in the proper order (either by luck or by a clever design), so sort performance on pre-sorted data matters.

In your obscure special case ("the first char of a is nowhere in b") is this frequent enough to matter? If it's just a special case you thought of, set it aside. If it's a fact about your data, then consider it.


heuristically you're probably better to split them off based on string size.

Pseudocode:

returnvalue = false
if len(a) == len(b)
   if len(a) < threshold
      returnvalue = (sorted(a) == sorted(b))
   else
       returnvalue = naminsmethod(a, b)
return returnvalue

If performance is critical, and string size can be large or small then this is what I'd do.

It's pretty common to split things like this based on input size or type. Algorithms have different strengths or weaknesses and it would be foolish to use one where another would be better... In this case Namin's method is O(n), but has a larger constant factor than the O(n log n) sorted method.