Extract duplicate objects from a List in Java 8
If you can implement equals and hashCode on Person, you can use groupingBy with a counting downstream collector to get the distinct elements that have been duplicated.
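// Assumes static imports of java.util.stream.Collectors.* and java.util.function.Function.identity
List<Person> duplicates = personList.stream()
    .collect(groupingBy(identity(), counting()))
    .entrySet().stream()
    .filter(n -> n.getValue() > 1)
    .map(n -> n.getKey())
    .collect(toList());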
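For reference, here is a minimal self-contained sketch of the approach above. The Person class shown (with only id and firstName, and equality based on both) and the names DuplicateFinder and findDuplicates are assumptions made for illustration, not taken from the question. Note that the intermediate map is a HashMap, so the order of the returned duplicates is not guaranteed when there is more than one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical Person with value-based equality on id and firstName.
class Person {
    private final int id;
    private final String firstName;

    Person(int id, String firstName) {
        this.id = id;
        this.firstName = firstName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person other = (Person) o;
        return id == other.id && Objects.equals(firstName, other.firstName);
    }

    @Override
    public int hashCode() {
        // Must be consistent with equals, otherwise groupingBy cannot merge equal keys.
        return Objects.hash(id, firstName);
    }

    @Override
    public String toString() {
        return id + ":" + firstName;
    }
}

class DuplicateFinder {
    // Returns one representative of every element that occurs more than once.
    static <T> List<T> findDuplicates(List<T> items) {
        return items.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .map(e -> e.getKey())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person(1, "Ann"), new Person(2, "Bob"),
                new Person(1, "Ann"), new Person(3, "Cy"));
        System.out.println(findDuplicates(people)); // prints [1:Ann]
    }
}
```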
If you would rather keep every repeated occurrence, you can expand each entry back out using Collections.nCopies. This approach ensures that repeated elements are grouped together in the result.
// Additionally assumes a static import of java.util.Collections.nCopies
List<Person> duplicates = personList.stream()
    .collect(groupingBy(identity(), counting()))
    .entrySet().stream()
    .filter(n -> n.getValue() > 1)
    .flatMap(n -> nCopies(n.getValue().intValue(), n.getKey()).stream())
    .collect(toList());
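A small runnable sketch of the nCopies variant follows. It uses plain strings so that it stays short; the class and method names (RepeatedElements, repeated) are made up for the example. As an optional variation that is not part of the answer above, it passes LinkedHashMap::new to groupingBy so that the groups come out in order of first appearance rather than in HashMap order. Keep in mind that nCopies repeats the map key, so the result holds the same instance several times rather than the original, merely equal, objects.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class RepeatedElements {
    // Returns every occurrence of each repeated element, grouped together,
    // with groups ordered by first appearance in the input.
    static <T> List<T> repeated(List<T> items) {
        return items.stream()
                .collect(Collectors.groupingBy(
                        Function.identity(), LinkedHashMap::new, Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() > 1)
                // Expand each (element, count) entry back into count copies.
                .flatMap(e -> Collections.nCopies(e.getValue().intValue(), e.getKey()).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "a", "c", "b", "a");
        System.out.println(repeated(names)); // prints [a, a, a, b, b]
    }
}
```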
List<Person> duplicates = personList.stream()
    .collect(Collectors.groupingBy(Person::getId))
    .entrySet().stream()
    .filter(e -> e.getValue().size() > 1)
    .flatMap(e -> e.getValue().stream())
    .collect(Collectors.toList());
That should give you a List of Person objects whose id has been duplicated.
To identify duplicates, no method I know of is better suited than Collectors.groupingBy(). It lets you group the list into a map based on a condition of your choice.
Your condition is a combination of id and firstName. Let's extract this part into its own method in Person:
String uniqueAttributes() {
    return id + firstName;
}
The getDuplicates() method is now quite straightforward:
public static List<Person> getDuplicates(final List<Person> personList) {
    return getDuplicatesMap(personList).values().stream()
        .filter(duplicates -> duplicates.size() > 1)
        .flatMap(Collection::stream)
        .collect(Collectors.toList());
}

private static Map<String, List<Person>> getDuplicatesMap(List<Person> personList) {
    return personList.stream().collect(Collectors.groupingBy(Person::uniqueAttributes));
}
- The first line calls another method, getDuplicatesMap(), to create the map as explained above.
- It then streams over the values of the map, which are lists of persons.
- It filters out everything except lists with a size greater than 1, i.e. it finds the duplicates.
- Finally, flatMap() is used to flatten the stream of lists into one single stream of persons, which is then collected into a list.
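The steps above can be put together into a runnable sketch. The Person class here is hypothetical (the lastName field exists only to tell instances apart in the output), and the wrapper class PersonDuplicates is invented for the example. One deliberate deviation, labeled as such: the key is built as a List of the two fields instead of a concatenated String, because concatenation can in principle make different pairs collide (id 11 with "Bob" and id 1 with "1Bob" both give "11Bob"). Whether that matters depends on your data.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical Person; only id and firstName take part in the duplicate check.
class Person {
    private final int id;
    private final String firstName;
    private final String lastName;

    Person(int id, String firstName, String lastName) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
    }

    String getLastName() {
        return lastName;
    }

    // Composite key as a List (an assumption of this sketch): lists compare
    // element by element, so (11, "Bob") and (1, "1Bob") stay distinct.
    List<Object> uniqueAttributes() {
        return Arrays.<Object>asList(id, firstName);
    }
}

class PersonDuplicates {
    // Returns every Person whose (id, firstName) pair occurs more than once.
    static List<Person> getDuplicates(List<Person> personList) {
        Map<List<Object>, List<Person>> byKey = personList.stream()
                .collect(Collectors.groupingBy(Person::uniqueAttributes));
        return byKey.values().stream()
                .filter(group -> group.size() > 1) // keep only groups with duplicates
                .flatMap(Collection::stream)       // flatten the groups into one stream
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person(1, "Ann", "Lee"), new Person(1, "Ann", "Kim"),
                new Person(11, "Bob", "Ray"), new Person(1, "1Bob", "Poe"));
        List<String> lastNames = getDuplicates(people).stream()
                .map(Person::getLastName)
                .collect(Collectors.toList());
        System.out.println(lastNames); // prints [Lee, Kim]
    }
}
```

Unlike the counting approach, this keeps the original instances, so fields that are not part of the key (such as lastName here) are preserved in the result.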
An alternative, if you truly consider persons equal when they have the same id and firstName, is to go with the solution by Jonathan Johx and implement an equals() method (together with a matching hashCode()).