How to find the longest list in a list?
This will return the longest list in the list values:
max(values, key=len)
- The answer from blhsing is great for finding the first, longest sub-list, and it's fast.
- For a list of 1M lists, varying in length from 1-15, it takes 29.6 ms to return the first list with the maximum length.
values = [['a','a'], ['a','b','b'], ['a','b','b','a'], ['a','b','c','a']]
max(values, key=len)
[out]:
['a', 'b', 'b', 'a']
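The output above shows only one of the two 4-element lists, because max returns the first item it encounters with the largest key. A minimal plain-Python sketch of that tie behaviour, using the same values list; the names first_longest, longest_len and all_longest are made up for illustration:

```python
values = [['a', 'a'], ['a', 'b', 'b'], ['a', 'b', 'b', 'a'], ['a', 'b', 'c', 'a']]

# max() keeps the first item it encounters with the largest key,
# so the second 4-element list is not returned
first_longest = max(values, key=len)
print(first_longest)

# one possible way to collect every list that shares the maximum length
longest_len = len(first_longest)
all_longest = [v for v in values if len(v) == longest_len]
print(all_longest)
```

The list comprehension makes a second pass over values, so this is only an illustration of the difference between "first longest" and "all longest", not a benchmarked alternative.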
- This pandas solution isn't a competitor with the accepted answer for speed in returning the first, longest list.
- There are a lot of people using pandas for analysis, so this is a valid question from that perspective.
- This solution is for returning all sub-lists of the max list length, or of a specified length.
- df.len.max() can be substituted with an int, to return lists of a specified length.
- This solution takes advantage of pandas: Boolean Indexing.
- This solution is slower, but it's returning a different result
- The lists have to be loaded in pandas
- The 'len' column is created
- The Boolean mask is used to return all the matching lists
- For a list of 1M lists, varying in length from 1-15, it takes 682 ms to return all the lists with the maximum (or specified) length.
- It should be noted that max(df.lists, key=len) can be used on a pandas.Series to find the first, longest list.
import pandas as pd
# convert the list of lists to a dataframe
df = pd.DataFrame({'lists': values})
# display(df)
lists
0 [a, a]
1 [a, b, b]
2 [a, b, b, a]
3 [a, b, c, a]
# create column for the length of each list
df['len'] = df.lists.map(len)
lists len
0 [a, a] 2
1 [a, b, b] 3
2 [a, b, b, a] 4
3 [a, b, c, a] 4
# select lists with max(len)
max_len = df[df.len == df.len.max()] # or [df.len == some int] for a specific length
# display(max_len)
lists len
2 [a, b, b, a] 4
3 [a, b, c, a] 4
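The same Boolean mask works with a fixed length in place of df.len.max(). A short sketch, assuming the same values list as above; target_len is a hypothetical name and 3 is an arbitrary choice for illustration:

```python
import pandas as pd

values = [['a', 'a'], ['a', 'b', 'b'], ['a', 'b', 'b', 'a'], ['a', 'b', 'c', 'a']]

# same setup as above: one column of lists, one column of their lengths
df = pd.DataFrame({'lists': values})
df['len'] = df['lists'].map(len)

# hypothetical target length, used instead of df['len'].max()
target_len = 3

# Boolean mask keeps only the rows whose list has exactly target_len items
matches = df[df['len'] == target_len]
print(matches['lists'].tolist())
```

With this data, only the row at index 1 satisfies the mask.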
%timeit
import pandas as pd
import random
import string
# 1M sub-list of 1-15 characters
l = [random.sample(string.ascii_letters, random.randint(1, 15)) for _ in range(10**6)]
%timeit max(l, key=len)
29.6 ms ± 1.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# function to do all the pandas stuff for testing
def get_max_len(l):
    df = pd.DataFrame({'lists': l})
    df['len'] = df.lists.map(len)
    return df[df.len == df.len.max()]
%timeit get_max_len(l)
682 ms ± 14.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
This will return the length of the longest list:
max(map(len, values))
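A small sketch of that one-liner on the example data. One thing to be aware of: max raises ValueError on an empty iterable, and the default keyword is one way to handle that case. The choice of 0 as the fallback is an assumption here, not something from the answers above:

```python
values = [['a', 'a'], ['a', 'b', 'b'], ['a', 'b', 'b', 'a'], ['a', 'b', 'c', 'a']]

# length of the longest sub-list
longest_len = max(map(len, values))
print(longest_len)

# with no sub-lists at all, max() would raise ValueError;
# default=0 is an assumed fallback for this illustration
empty_len = max(map(len, []), default=0)
print(empty_len)
```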