Splitting List into sublists along elements
The solution is to use Stream.collect
. To create a Collector using its builder pattern is already given as solution. The alternative is the other overloaded collect
being a tiny bit more primitive.
List<String> strings = Arrays.asList("a", "b", null, "c", null, "d", "e");
List<List<String>> groups = strings.stream()
.collect(() -> {
List<List<String>> list = new ArrayList<>();
list.add(new ArrayList<>());
return list;
},
(list, s) -> {
if (s == null) {
list.add(new ArrayList<>());
} else {
list.get(list.size() - 1).add(s);
}
},
(list1, list2) -> {
// Simple merging of partial sublists would
// introduce a false level-break at the beginning.
list1.get(list1.size() - 1).addAll(list2.remove(0));
list1.addAll(list2);
});
As one sees, I make a list of string lists, where there always is at least one last (empty) string list.
- The first function creates a starting list of string lists. It specifies the result (typed) object.
- The second function is called to process each element. It is an action on the partial result and an element.
- The third is not really used, it comes into play on parallelising the processing, when partial results must be combined.
A solution with an accumulator:
As @StuartMarks points out, the combiner does not fullfill the contract for parallelism.
Due to the comment of @ArnaudDenoyelle a version using reduce
.
List<List<String>> groups = strings.stream()
.reduce(new ArrayList<List<String>>(),
(list, s) -> {
if (list.isEmpty()) {
list.add(new ArrayList<>());
}
if (s == null) {
list.add(new ArrayList<>());
} else {
list.get(list.size() - 1).add(s);
}
return list;
},
(list1, list2) -> {
list1.addAll(list2);
return list1;
});
- The first parameter is the accumulated object.
- The second function accumulates.
- The third is the aforementioned combiner.
Although there are several answers already, and an accepted answer, there are still a couple points missing from this topic. First, the consensus seems to be that solving this problem using streams is merely an exercise, and that the conventional for-loop approach is preferable. Second, the answers given thus far have overlooked an approach using array or vector-style techniques that I think improves the streams solution considerably.
First, here's a conventional solution, for purposes of discussion and analysis:
static List<List<String>> splitConventional(List<String> input) {
List<List<String>> result = new ArrayList<>();
int prev = 0;
for (int cur = 0; cur < input.size(); cur++) {
if (input.get(cur) == null) {
result.add(input.subList(prev, cur));
prev = cur + 1;
}
}
result.add(input.subList(prev, input.size()));
return result;
}
This is mostly straightforward but there's a bit of subtlety. One point is that a pending sublist from prev
to cur
is always open. When we encounter null
we close it, add it to the result list, and advance prev
. After the loop we close the sublist unconditionally.
Another observation is that this is a loop over indexes, not over the values themselves, thus we use an arithmetic for-loop instead of the enhanced "for-each" loop. But it suggests that we can stream using the indexes to generate subranges instead of streaming over values and putting the logic into the collector (as was done by Joop Eggen's proposed solution).
Once we've realized that, we can see that each position of null
in the input is the delimiter for a sublist: it's the right end of the sublist to the left, and it (plus one) is the left end of the sublist to the right. If we can handle the edge cases, it leads to an approach where we find the indexes at which null
elements occur, map them to sublists, and collect the sublists.
The resulting code is as follows:
static List<List<String>> splitStream(List<String> input) {
int[] indexes = Stream.of(IntStream.of(-1),
IntStream.range(0, input.size())
.filter(i -> input.get(i) == null),
IntStream.of(input.size()))
.flatMapToInt(s -> s)
.toArray();
return IntStream.range(0, indexes.length-1)
.mapToObj(i -> input.subList(indexes[i]+1, indexes[i+1]))
.collect(toList());
}
Getting the indexes at which null
occurs is pretty easy. The stumbling block is adding -1
at the left and size
at the right end. I've opted to use Stream.of
to do the appending and then flatMapToInt
to flatten them out. (I tried several other approaches but this one seemed like the cleanest.)
It's a bit more convenient to use arrays for the indexes here. First, the notation for accessing an array is nicer than for a List: indexes[i]
vs. indexes.get(i)
. Second, using an array avoids boxing.
At this point, each index value in the array (except for the last) is one less than the beginning position of a sublist. The index to its immediate right is the end of the sublist. We simply stream over the array and map each pair of indexes into a sublist and collect the output.
Discussion
The streams approach is slightly shorter than the for-loop version, but it's denser. The for-loop version is familiar, because we do this stuff in Java all the time, but if you're not already aware of what this loop is supposed to be doing, it's not obvious. You might have to simulate a few loop executions before you figure out what prev
is doing and why the open sublist has to be closed after the end of the loop. (I initially forgot to have it, but I caught this in testing.)
The streams approach is, I think, easier to conceptualize what's going on: get a list (or an array) that indicates the boundaries between sublists. That's an easy streams two-liner. The difficulty, as I mentioned above, is finding a way to tack the edge values onto the ends. If there were a better syntax for doing this, e.g.,
// Java plus pidgin Scala
int[] indexes =
[-1] ++ IntStream.range(0, input.size())
.filter(i -> input.get(i) == null) ++ [input.size()];
it would make things a lot less cluttered. (What we really need is array or list comprehension.) Once you have the indexes, it's a simple matter to map them into actual sublists and collect them into the result list.
And of course this is safe when run in parallel.
UPDATE 2016-02-06
Here's a nicer way to create the array of sublist indexes. It's based on the same principles, but it adjusts the index range and adds some conditions to the filter to avoid having to concatenate and flatmap the indexes.
static List<List<String>> splitStream(List<String> input) {
int sz = input.size();
int[] indexes =
IntStream.rangeClosed(-1, sz)
.filter(i -> i == -1 || i == sz || input.get(i) == null)
.toArray();
return IntStream.range(0, indexes.length-1)
.mapToObj(i -> input.subList(indexes[i]+1, indexes[i+1]))
.collect(toList());
}
UPDATE 2016-11-23
I co-presented a talk with Brian Goetz at Devoxx Antwerp 2016, "Thinking In Parallel" (video) that featured this problem and my solutions. The problem presented there is a slight variation that splits on "#" instead of null, but it's otherwise the same. In the talk, I mentioned that I had a bunch of unit tests for this problem. I've appended them below, as a standalone program, along with my loop and streams implementations. An interesting exercise for readers is to run solutions proposed in other answers against the test cases I've provided here, and to see which ones fail and why. (The other solutions will have to be adapted to split based on a predicate instead of splitting on null.)
import java.util.*;
import java.util.function.*;
import java.util.stream.*;
import static java.util.Arrays.asList;
public class ListSplitting {
static final Map<List<String>, List<List<String>>> TESTCASES = new LinkedHashMap<>();
static {
TESTCASES.put(asList(),
asList(asList()));
TESTCASES.put(asList("a", "b", "c"),
asList(asList("a", "b", "c")));
TESTCASES.put(asList("a", "b", "#", "c", "#", "d", "e"),
asList(asList("a", "b"), asList("c"), asList("d", "e")));
TESTCASES.put(asList("#"),
asList(asList(), asList()));
TESTCASES.put(asList("#", "a", "b"),
asList(asList(), asList("a", "b")));
TESTCASES.put(asList("a", "b", "#"),
asList(asList("a", "b"), asList()));
TESTCASES.put(asList("#"),
asList(asList(), asList()));
TESTCASES.put(asList("a", "#", "b"),
asList(asList("a"), asList("b")));
TESTCASES.put(asList("a", "#", "#", "b"),
asList(asList("a"), asList(), asList("b")));
TESTCASES.put(asList("a", "#", "#", "#", "b"),
asList(asList("a"), asList(), asList(), asList("b")));
}
static final Predicate<String> TESTPRED = "#"::equals;
static void testAll(BiFunction<List<String>, Predicate<String>, List<List<String>>> f) {
TESTCASES.forEach((input, expected) -> {
List<List<String>> actual = f.apply(input, TESTPRED);
System.out.println(input + " => " + expected);
if (!expected.equals(actual)) {
System.out.println(" ERROR: actual was " + actual);
}
});
}
static <T> List<List<T>> splitStream(List<T> input, Predicate<? super T> pred) {
int[] edges = IntStream.range(-1, input.size()+1)
.filter(i -> i == -1 || i == input.size() ||
pred.test(input.get(i)))
.toArray();
return IntStream.range(0, edges.length-1)
.mapToObj(k -> input.subList(edges[k]+1, edges[k+1]))
.collect(Collectors.toList());
}
static <T> List<List<T>> splitLoop(List<T> input, Predicate<? super T> pred) {
List<List<T>> result = new ArrayList<>();
int start = 0;
for (int cur = 0; cur < input.size(); cur++) {
if (pred.test(input.get(cur))) {
result.add(input.subList(start, cur));
start = cur + 1;
}
}
result.add(input.subList(start, input.size()));
return result;
}
public static void main(String[] args) {
System.out.println("===== Loop =====");
testAll(ListSplitting::splitLoop);
System.out.println("===== Stream =====");
testAll(ListSplitting::splitStream);
}
}
The only solution I come up with for the moment is by implementing your own custom collector.
Before reading the solution, I want to add a few notes about this. I took this question more as a programming exercise, I'm not sure if it can be done with a parallel stream.
So you have to be aware that it'll silently break if the pipeline is run in parallel.
This is not a desirable behavior and should be avoided. This is why I throw an exception in the combiner part (instead of (l1, l2) -> {l1.addAll(l2); return l1;}
), as it's used in parallel when combining the two lists, so that you have an exception instead of a wrong result.
Also this is not very efficient due to list copying (although it uses a native method to copy the underlying array).
So here's the collector implementation:
private static Collector<String, List<List<String>>, List<List<String>>> splitBySeparator(Predicate<String> sep) {
final List<String> current = new ArrayList<>();
return Collector.of(() -> new ArrayList<List<String>>(),
(l, elem) -> {
if (sep.test(elem)) {
l.add(new ArrayList<>(current));
current.clear();
}
else {
current.add(elem);
}
},
(l1, l2) -> {
throw new RuntimeException("Should not run this in parallel");
},
l -> {
if (current.size() != 0) {
l.add(current);
return l;
}
);
}
and how to use it:
List<List<String>> ll = list.stream().collect(splitBySeparator(Objects::isNull));
Output:
[[a, b], [c], [d, e]]
As the answer of Joop Eggen is out, it appears that it can be done in parallel (give him credit for that!). With that it reduces the custom collector implementation to:
private static Collector<String, List<List<String>>, List<List<String>>> splitBySeparator(Predicate<String> sep) {
return Collector.of(() -> new ArrayList<List<String>>(Arrays.asList(new ArrayList<>())),
(l, elem) -> {if(sep.test(elem)){l.add(new ArrayList<>());} else l.get(l.size()-1).add(elem);},
(l1, l2) -> {l1.get(l1.size() - 1).addAll(l2.remove(0)); l1.addAll(l2); return l1;});
}
which let the paragraph about parallelism a bit obsolete, however I let it as it can be a good reminder.
Note that the Stream API is not always a substitute. There are tasks that are easier and more suitable using the streams and there are tasks that are not. In your case, you could also create a utility method for that:
private static <T> List<List<T>> splitBySeparator(List<T> list, Predicate<? super T> predicate) {
final List<List<T>> finalList = new ArrayList<>();
int fromIndex = 0;
int toIndex = 0;
for(T elem : list) {
if(predicate.test(elem)) {
finalList.add(list.subList(fromIndex, toIndex));
fromIndex = toIndex + 1;
}
toIndex++;
}
if(fromIndex != toIndex) {
finalList.add(list.subList(fromIndex, toIndex));
}
return finalList;
}
and call it like List<List<String>> list = splitBySeparator(originalList, Objects::isNull);
.
It can be improved for checking edge-cases.