How to split a byte array around a byte sequence in Java?
I modified L. Blanc's answer to handle delimiters at the very beginning and at the very end. I also renamed it to 'split'.
private List<byte[]> split(byte[] array, byte[] delimiter)
{
    List<byte[]> byteArrays = new LinkedList<byte[]>();
    if (delimiter.length == 0)
    {
        return byteArrays;
    }
    int begin = 0;
    outer: for (int i = 0; i < array.length - delimiter.length + 1; i++)
    {
        for (int j = 0; j < delimiter.length; j++)
        {
            if (array[i + j] != delimiter[j])
            {
                continue outer;
            }
        }
        // If delimiter is at the beginning then there will not be any data.
        if (begin != i)
            byteArrays.add(Arrays.copyOfRange(array, begin, i));
        begin = i + delimiter.length;
        // Resume after the delimiter (the loop's i++ adds the 1 back); otherwise an
        // overlapping match such as {1,1} in {1,1,1} would make copyOfRange throw.
        i = begin - 1;
    }
    // delimiter at the very end with no data following?
    if (begin != array.length)
        byteArrays.add(Arrays.copyOfRange(array, begin, array.length));
    return byteArrays;
}
Here is a straightforward solution.
Unlike avgvstvs's approach, it handles delimiters of arbitrary length. The top answer is also good, but its author hasn't fixed the issue pointed out by Eitan Perkal. That issue is avoided here using the approach Perkal suggests.
public static List<byte[]> tokens(byte[] array, byte[] delimiter) {
    List<byte[]> byteArrays = new LinkedList<>();
    if (delimiter.length == 0) {
        return byteArrays;
    }
    int begin = 0;
    outer:
    for (int i = 0; i < array.length - delimiter.length + 1; i++) {
        for (int j = 0; j < delimiter.length; j++) {
            if (array[i + j] != delimiter[j]) {
                continue outer;
            }
        }
        // Empty tokens (leading, trailing, or between adjacent delimiters) are kept.
        byteArrays.add(Arrays.copyOfRange(array, begin, i));
        begin = i + delimiter.length;
        // Resume after the delimiter (the loop's i++ adds the 1 back) so that
        // overlapping matches cannot produce begin > i.
        i = begin - 1;
    }
    byteArrays.add(Arrays.copyOfRange(array, begin, array.length));
    return byteArrays;
}
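To make the difference between the two variants above concrete, here is a small self-contained sketch. The class name `SplitDemo` and the `keepEmpty` flag are hypothetical, introduced only so both behaviours fit in one method: with `keepEmpty` set to `false` empty pieces are dropped, as in the first `split`; with `true` they are kept, as in `tokens`. The sketch also skips past each matched delimiter, which is an assumption about the desired behaviour for delimiters that can overlap themselves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitDemo {

    // keepEmpty is a hypothetical flag, not part of either answer above.
    public static List<byte[]> split(byte[] array, byte[] delimiter, boolean keepEmpty) {
        List<byte[]> parts = new ArrayList<>();
        if (delimiter.length == 0) {
            return parts;
        }
        int begin = 0;
        outer:
        for (int i = 0; i < array.length - delimiter.length + 1; i++) {
            for (int j = 0; j < delimiter.length; j++) {
                if (array[i + j] != delimiter[j]) {
                    continue outer;
                }
            }
            if (keepEmpty || begin != i) {
                parts.add(Arrays.copyOfRange(array, begin, i));
            }
            begin = i + delimiter.length;
            i = begin - 1; // continue right after the delimiter
        }
        if (keepEmpty || begin != array.length) {
            parts.add(Arrays.copyOfRange(array, begin, array.length));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Delimiter 0 appears at the start, doubled in the middle, and at the end.
        byte[] data = {0, 1, 2, 0, 0, 3, 0};
        byte[] delim = {0};
        System.out.println(split(data, delim, false).size()); // prints 2: {1,2} and {3}
        System.out.println(split(data, delim, true).size());  // prints 5: {}, {1,2}, {}, {3}, {}
    }
}
```

Which behaviour is right depends on the format being parsed: keeping empty pieces preserves field positions, while dropping them is convenient when delimiters are only separators between records.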
Note that you can reliably convert from byte[] to String and back, with a one-to-one mapping of chars to bytes, if you use the encoding "ISO-8859-1" (available in Java as StandardCharsets.ISO_8859_1).
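A minimal sketch of that round trip, assuming the goal is to reuse `String.split` on the decoded text. The class name `Latin1Split` is hypothetical. `Pattern.quote` makes the delimiter literal, and the limit of `-1` keeps trailing empty pieces. An empty delimiter is not handled specially here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class Latin1Split {

    public static List<byte[]> split(byte[] array, byte[] delimiter) {
        // ISO-8859-1 maps every byte 0x00..0xFF to exactly one char U+0000..U+00FF,
        // so decoding and re-encoding loses nothing.
        String text = new String(array, StandardCharsets.ISO_8859_1);
        String delim = new String(delimiter, StandardCharsets.ISO_8859_1);

        List<byte[]> parts = new ArrayList<>();
        // Quote the delimiter so regex metacharacters in it are matched literally.
        for (String piece : text.split(Pattern.quote(delim), -1)) {
            parts.add(piece.getBytes(StandardCharsets.ISO_8859_1));
        }
        return parts;
    }
}
```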
However, it's still an ugly solution.
I think you'll need to roll your own.
I suggest solving it in two stages:
- Work out how to find the indexes of each occurrence of the separator. Google for "Knuth-Morris-Pratt" for an efficient algorithm, although a more naive algorithm will be fine for short delimiters.
- Each time you find an index, use Arrays.copyOfRange() to get the piece you need and add it to your output list.
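The two stages above could be sketched as follows, here with a Knuth-Morris-Pratt search for stage one. The names `KmpSplit` and `indexesOf` are hypothetical, and the sketch assumes occurrences should not overlap and that empty pieces are kept; an empty pattern yields the whole input as a single piece.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KmpSplit {

    // Stage 1: start indexes of the non-overlapping occurrences of pattern in input.
    public static List<Integer> indexesOf(byte[] input, byte[] pattern) {
        List<Integer> hits = new ArrayList<>();
        if (pattern.length == 0) {
            return hits;
        }
        // failure[k] = length of the longest proper prefix of pattern[0..k]
        // that is also a suffix of it.
        int[] failure = new int[pattern.length];
        for (int k = 1, len = 0; k < pattern.length; k++) {
            while (len > 0 && pattern[k] != pattern[len]) {
                len = failure[len - 1];
            }
            if (pattern[k] == pattern[len]) {
                len++;
            }
            failure[k] = len;
        }
        int matched = 0; // number of pattern bytes currently matched
        for (int i = 0; i < input.length; i++) {
            while (matched > 0 && input[i] != pattern[matched]) {
                matched = failure[matched - 1]; // fall back without re-reading input
            }
            if (input[i] == pattern[matched]) {
                matched++;
            }
            if (matched == pattern.length) {
                hits.add(i - pattern.length + 1);
                matched = 0; // restart so occurrences never overlap
            }
        }
        return hits;
    }

    // Stage 2: copy out the pieces between consecutive occurrences.
    public static List<byte[]> split(byte[] input, byte[] pattern) {
        List<byte[]> parts = new ArrayList<>();
        int begin = 0;
        for (int index : indexesOf(input, pattern)) {
            parts.add(Arrays.copyOfRange(input, begin, index));
            begin = index + pattern.length;
        }
        parts.add(Arrays.copyOfRange(input, begin, input.length));
        return parts;
    }
}
```

Keeping the search separate from the copying means the naive scan and KMP are interchangeable: only `indexesOf` changes.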
Here it is using a naive pattern-finding algorithm. KMP becomes worthwhile when the delimiters are long: it avoids backtracking over the input, yet still finds a delimiter that is embedded in a sequence which mismatches near its end.
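// Requires: import java.util.Arrays; import java.util.LinkedList; import java.util.List;

public static boolean isMatch(byte[] pattern, byte[] input, int pos) {
    // Not enough bytes left for a full match; also avoids reading past the end.
    if (pos + pattern.length > input.length) {
        return false;
    }
    for (int i = 0; i < pattern.length; i++) {
        if (pattern[i] != input[pos + i]) {
            return false;
        }
    }
    return true;
}

public static List<byte[]> split(byte[] pattern, byte[] input) {
    List<byte[]> l = new LinkedList<byte[]>();
    int blockStart = 0;
    // An empty pattern is never searched for; isMatch would succeed at every position.
    for (int i = 0; pattern.length > 0 && i < input.length; i++) {
        if (isMatch(pattern, input, i)) {
            l.add(Arrays.copyOfRange(input, blockStart, i));
            blockStart = i + pattern.length;
            // Resume at blockStart: the loop's i++ adds the 1 back, so a
            // delimiter starting right after this one is not skipped.
            i = blockStart - 1;
        }
    }
    l.add(Arrays.copyOfRange(input, blockStart, input.length));
    return l;
}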