Java - Best way to grab ALL Strings between two Strings? (regex?)
You can construct the regex to do this for you:
// pattern1 and pattern2 are String objects
String regexString = Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2);
This will treat pattern1 and pattern2 as literal text, and the text in between the patterns is captured in the first capturing group. You can remove Pattern.quote() if you want pattern1 and pattern2 to be interpreted as regex, but I don't guarantee anything if you do that.
You can customize how the match occurs by adding flags to the regexString.
- If you want Unicode-aware case-insensitive matching, then add (?iu) at the beginning of regexString, or supply the Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE flags to the Pattern.compile method.
- If you want to capture the content even if the two delimiting strings appear across lines, then add (?s) before (.*?), i.e. "(?s)(.*?)", or supply the Pattern.DOTALL flag to the Pattern.compile method.
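As a rough sketch of the flag options above, the following passes the flags to Pattern.compile instead of embedding them in the regex. The class name FlagsSketch, the helper names build and firstBetween, and the sample input are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagsSketch {
    // Builds the delimiter regex with case-insensitive, Unicode-aware,
    // dot-matches-newline behaviour supplied as compile flags.
    static Pattern build(String pattern1, String pattern2) {
        String regexString = Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2);
        return Pattern.compile(regexString,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.DOTALL);
    }

    // Returns the first captured text between the delimiters, or null if there is no match.
    static String firstBetween(String text, String pattern1, String pattern2) {
        Matcher m = build(pattern1, pattern2).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Delimiters differ in case from the input, and the content spans two lines.
        System.out.println(firstBetween("BEGIN line one\nline two end", "begin", "END"));
    }
}
```

Without Pattern.DOTALL the dot would stop at the line break, and without the case-insensitivity flags "begin" would not match "BEGIN", so this sample input would produce no match at all.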
Then compile the regex, obtain a Matcher object, iterate through the matches and save them into a List (or any Collection, it's up to you).
Pattern pattern = Pattern.compile(regexString);
// text contains the full text that you want to extract data from
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
String textInBetween = matcher.group(1); // Since (.*?) is capturing group 1
// You can insert match into a List/Collection here
}
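For completeness, here is one way the loop above could be wrapped into a reusable method that returns a List. This is only a sketch: the class name BetweenExtractor, the method name allBetween, and the sample input are illustrative and not part of the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenExtractor {
    // Collects every (non-overlapping) piece of text found between pattern1 and pattern2.
    static List<String> allBetween(String text, String pattern1, String pattern2) {
        Pattern pattern = Pattern.compile(
                Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2));
        Matcher matcher = pattern.matcher(text);
        List<String> results = new ArrayList<>();
        while (matcher.find()) {
            results.add(matcher.group(1)); // group 1 is the lazy (.*?) capture
        }
        return results;
    }

    public static void main(String[] args) {
        // "[" and "]" are regex metacharacters; Pattern.quote makes them literal.
        System.out.println(allBetween("a[x]b[y]c", "[", "]")); // prints [x, y]
    }
}
```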
Testing code:
String pattern1 = "hgb";
String pattern2 = "|";
String text = "sdfjsdkhfkjsdf hgb sdjfkhsdkfsdf |sdfjksdhfjksd sdf sdkjfhsdkf | sdkjfh hgb sdkjfdshfks|";
Pattern p = Pattern.compile(Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2));
Matcher m = p.matcher(text);
while (m.find()) {
System.out.println(m.group(1));
}
Do note that if you search for the text between foo and bar in the input foo text foo text bar text bar with the method above, you will get one match, which is " text foo text ".
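The behaviour just described can be checked with a small sketch. The second method is not part of the approach above: it is one possible variation, assuming you would rather get the text after the nearest foo, and it uses a negative lookahead so the capture cannot run across another start delimiter. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedDelimiterSketch {
    // The approach from above: lazy capture between the two literal delimiters.
    static List<String> lazy(String text, String start, String end) {
        return collect(Pattern.quote(start) + "(.*?)" + Pattern.quote(end), text);
    }

    // Variation (an assumption, not the original method): the lookahead (?!start)
    // stops the capture from containing another occurrence of the start delimiter.
    static List<String> innermost(String text, String start, String end) {
        String s = Pattern.quote(start);
        return collect(s + "((?:(?!" + s + ").)*?)" + Pattern.quote(end), text);
    }

    private static List<String> collect(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "foo text foo text bar text bar";
        System.out.println(lazy(input, "foo", "bar"));      // prints [ text foo text ]
        System.out.println(innermost(input, "foo", "bar")); // prints [ text ]
    }
}
```

Which of the two results is "right" depends on what the delimiters mean in your data, so treat the lookahead version as an option rather than a fix.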
Here's a one-liner that does it all:
List<String> strings = Arrays.asList( input.replaceAll("^.*?pattern1", "")
.split("pattern2.*?(pattern1|$)"));
The breakdown is:
- Remove everything up to and including the first pattern1 (required to not end up with an empty string as the first term)
- Split on everything from pattern2 up to the next pattern1 (or the end of input), matched with a non-greedy .*?
- Use the utility method Arrays.asList() to generate a List<String>
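The one-liner embeds the delimiters directly in the regex, so it only works as written when they contain no regex metacharacters. A hedged sketch of the same replaceAll/split idea with Pattern.quote around arbitrary delimiters might look like this; the class name SplitSketch and the method name between are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SplitSketch {
    // Same idea as the one-liner, but with the delimiters quoted so they are treated literally.
    static List<String> between(String input, String pattern1, String pattern2) {
        String p1 = Pattern.quote(pattern1);
        String p2 = Pattern.quote(pattern2);
        return Arrays.asList(
                input.replaceAll("^.*?" + p1, "")      // drop everything through the first pattern1
                     .split(p2 + ".*?(" + p1 + "|$)")); // split on pattern2 ... next pattern1 (or end)
    }

    public static void main(String[] args) {
        System.out.println(between("x [a] y [b] z", "[", "]")); // prints [a, b]
    }
}
```

Note that if pattern1 never occurs in the input, nothing is removed or split and the whole input comes back as a single element, so this approach assumes at least one delimited section exists.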
Here's some test code:
public static void main( String[] args ) {
String input = "abcabc pattern1foopattern2 abcdefg pattern1barpattern2 morestuff";
List<String> strings = Arrays.asList( input.replaceAll("^.*?pattern1", "").split("pattern2.*?(pattern1|$)"));
System.out.println( strings);
}
Output:
[foo, bar]
Try this:
String str = "its a string with pattern1 aleatory pattern2 things between pattern1 and pattern2 and sometimes pattern1 pattern2 nothing";
Matcher m = Pattern.compile(
Pattern.quote("pattern1")
+ "(.*?)"
+ Pattern.quote("pattern2")
).matcher(str);
while(m.find()){
String match = m.group(1);
System.out.println(">"+match+"<");
//here you insert 'match' into the list
}
It prints:
> aleatory <
> and <
> <