Getting first n unique elements from Python list

If your objects are hashable (ints are hashable) you can write utility function using fromkeys method of collections.OrderedDict class (or starting from Python3.7 a plain dict, since they became officially ordered) like

from collections import OrderedDict


def nub(iterable):
    """Returns unique elements preserving order."""
    return OrderedDict.fromkeys(iterable).keys()

and then implementation of iterate can be simplified to

from itertools import islice


def iterate(itr, upper=5):
    return islice(nub(itr), upper)

or if you want always a list as an output

def iterate(itr, upper=5):
    return list(nub(itr))[:upper]

Improvements

As @Chris_Rands mentioned this solution walks through entire collection and we can improve this by writing nub utility in a form of generator like others already did:

def nub(iterable):
    seen = set()
    add_seen = seen.add
    for element in iterable:
        if element in seen:
            continue
        yield element
        add_seen(element)

Here is a Pythonic approach using itertools.takewhile():

In [95]: from itertools import takewhile

In [96]: seen = set()

In [97]: set(takewhile(lambda x: seen.add(x) or len(seen) <= 4, a))
Out[97]: {1, 2, 3, 4}

You can adapt the popular itertools unique_everseen recipe:

def unique_everseen_limit(iterable, limit=5):
    seen = set()
    seen_add = seen.add
    for element in iterable:
        if element not in seen:
            seen_add(element)
            yield element
        if len(seen) == limit:
            break

a = [1,2,2,3,3,4,5,6]

res = list(unique_everseen_limit(a))  # [1, 2, 3, 4, 5]

Alternatively, as suggested by @Chris_Rands, you can use itertools.islice to extract a fixed number of values from a non-limited generator:

from itertools import islice

def unique_everseen(iterable):
    seen = set()
    seen_add = seen.add
    for element in iterable:
        if element not in seen:
            seen_add(element)
            yield element

res = list(islice(unique_everseen(a), 5))  # [1, 2, 3, 4, 5]

Note the unique_everseen recipe is available in 3rd party libraries via more_itertools.unique_everseen or toolz.unique, so you could use:

from itertools import islice
from more_itertools import unique_everseen
from toolz import unique

res = list(islice(unique_everseen(a), 5))  # [1, 2, 3, 4, 5]
res = list(islice(unique(a), 5))           # [1, 2, 3, 4, 5]

I would use a set to remember what was seen and return from the generator when you have seen enough:

a = [1, 2, 2, 3, 3, 4, 5, 6]
    
def get_unique_N(iterable, N):
    """Yields (in order) the first N unique elements of iterable. 
    Might yield less if data too short."""
    seen = set()
    for e in iterable:
        if e in seen:
            continue
        seen.add(e)
        yield e
        if len(seen) == N:
            return
            
k = get_unique_N([1, 2, 2, 3, 3, 4, 5, 6], 4)
print(list(k))

Output:

[1, 2, 3, 4]

According to PEP-479 you should return from generators, not raise StopIteration - thanks to @khelwood & @iBug for that piece of comment - one never learns out.

With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration

Your solution using elif element not in itr[:index] and count<upper: uses O(k) lookups - with k being the length of the slice - using a set reduces this to O(1) lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff - what is better is application/data dependend.

Consider [1, 2, 3, 4, 4, 4, 4, 5] vs [1] * 1000 + [2] * 1000 + [3] * 1000 + [4] * 1000 + [5] * 1000 + [6]:

For 6 uniques (in longer list):

you would have lookups of O(1)+O(2)+...+O(5001)
mine would have 5001*O(1) lookup + memory for set( {1, 2, 3, 4, 5, 6})

Getting first n unique elements from Python list

Improvements

Tags:

Python

Unique

Python 3.X

Generator

Set

Related

Recent Posts