What are fast algorithms for finding duplicate elements in a collection and grouping them?

Yes, you can do much better.

  1. Sort them (O(n) with radix sort for simple integers, O(n*log n) in general); duplicates are then guaranteed to be adjacent, so finding them takes a single O(n) pass.

  2. Use a hash table, also O(n) overall. For each item: (a) check whether it's already in the hash table; if so, it's a duplicate; (b) if not, put it in the hash table.

Edit:

The method you're using seems to do O(N^2) comparisons:

for (i = 0; i < length; ++i)          // runs length times
    for (j = i + 1; j < length; ++j)  // runs length-i-1 times
        compare(items[i], items[j])

So for length 5 you do 4+3+2+1 = 10 comparisons; for length 6 you do 15, and so on: exactly (N^2)/2 - N/2, i.e. N*(N-1)/2. N*log(N) is smaller for any reasonably large N.

How big is N in your case?

As far as reducing hash collisions goes, the best way is to get a better hash function :-D. Assuming that is not possible, if you can make a variant (e.g., a different modulus), you may be able to do a nested hash.


1. Sort the array, O(n log n) at worst (mergesort, heapsort, binary tree sort, etc.)

2. Compare neighbors and pull the matches out, O(n)


Keep a hash-table-based structure mapping value to count; if your C++ implementation doesn't offer std::hash_map (not really part of the C++ standard so far!-), use Boost or grab a version off the web. One pass over the collection (i.e., O(N)) builds the value->count mapping; one more pass over the hash table (<= O(N), clearly) identifies values with a count > 1 and emits them appropriately. That's O(N) overall, which is not the case for your suggestion.