Julia's most efficient way to choose longest array in array of arrays?
@Colin has provided a compact, convenient answer. However, if speed matters (op asked for most efficient way) this should be close to optimum
function findlongest(A)
idx = 0
len = 0
@inbounds for i in 1:length(A)
l = length(A[i])
l > len && (idx = i; len=l)
end
return A[idx]
end
Note that this implementation would (presumably) be a really bad idea in Python :)
Quick benchmark:
julia> using BenchmarkTools
julia> A = [[1,2], [1,2,3,4,5,6], [1,2,3]]
3-element Array{Array{Int64,1},1}:
[1, 2]
[1, 2, 3, 4, 5, 6]
[1, 2, 3]
julia> @btime findlongest(A);
26.880 ns (0 allocations: 0 bytes)
julia> @btime A[indmax(length.(A))];
9.813 μs (25 allocations: 1.14 KiB)
That's a ~365 times speedup for this example.
EDIT: Better benchmark (suggested in the comments)
julia> @btime findlongest($A);
9.813 ns (0 allocations: 0 bytes)
julia> @btime $A[indmax(length.($A))];
41.813 ns (1 allocation: 112 bytes)
The $
signs avoid setup allocations and times. Speedup ~4.
Quick explanation
- for loops are fast in Julia, so why not use them
- avoid allocation (
length.(A)
allocates a new array of integers) a && b
is shortcut for "if a then b"@inbounds
avoids bound checks forA[i]
UPDATE: For v1+ you'll need to replace indmax
in this answer with argmax
.
EDIT: Note, it is also worth checking out the other answer by @crstnbr
Consider the following example code:
julia> A = [[1,2], [1,2,3,4,5,6], [1,2,3]]
3-element Array{Array{Int64,1},1}:
[1, 2]
[1, 2, 3, 4, 5, 6]
[1, 2, 3]
julia> length(A)
3
julia> length.(A)
3-element Array{Int64,1}:
2
6
3
julia> indmax(length.(A))
2
julia> A[indmax(length.(A))]
6-element Array{Int64,1}:
1
2
3
4
5
6
The first call to length
gets the length of the outer vector in A
, which is not what we want. In the second call, I use the broadcasting operator .
so that I instead get the length of each of the inner vectors. In the indmax
line, I'm finding the index of largest value in length.(A)
, ie the index of the longest inner vector. If you instead want to return the longest inner vector, you can just index into A
using the result of the indmax
line.