C++ - Split string by regex
#include <regex>
std::regex rgx("\\s+");
std::sregex_token_iterator iter(string_to_split.begin(),
string_to_split.end(),
rgx,
-1);
std::sregex_token_iterator end;
for ( ; iter != end; ++iter)
std::cout << *iter << '\n';
The -1
is the key here: when the iterator is constructed the iterator points at the text that precedes the match and after each increment the iterator points at the text that followed the previous match.
If you don't have C++11, the same thing should work with TR1 or (possibly with slight modification) with Boost.
To expand on the answer by @Pete Becker I provide an example of resplit function that can be used to split text using regexp:
#include <regex>
std::vector<std::string> resplit(const std::string &s, const std::regex &sep_regex = std::regex{"\\s+"}) {
std::sregex_token_iterator iter(s.begin(), s.end(), sep_regex, -1);
std::sregex_token_iterator end;
return {iter, end};
}
This works as follows:
string s1 = "first second third ";
vector<string> v22 = resplit(s1);
for (const auto & e: v22) {
cout <<"Token:" << e << endl;
}
//Token:first
//Token:second
//Token:third
string s222 = "first|second:third,forth";
vector<string> v222 = resplit(s222, "[|:,]");
for (const auto & e: v222) {
cout <<"Token:" << e << endl;
}
//Token:first
//Token:second
//Token:third
//Token:forth
You don't need to use regular expressions if you just want to split a string by multiple spaces. Writing your own regex library is overkill for something that simple.
The answer you linked to in your comments, Split a string in C++?, can easily be changed so that it doesn't include any empty elements if there are multiple spaces.
std::vector<std::string> &split(const std::string &s, char delim,std::vector<std::string> &elems) {
std::stringstream ss(s);
std::string item;
while (std::getline(ss, item, delim)) {
if (item.length() > 0) {
elems.push_back(item);
}
}
return elems;
}
std::vector<std::string> split(const std::string &s, char delim) {
std::vector<std::string> elems;
split(s, delim, elems);
return elems;
}
By checking that item.length() > 0
before pushing item
on to the elems
vector you will no longer get extra elements if your input contains multiple delimiters (spaces in your case)