How can I find a subsequence in a &[u8] slice?
I don't think the standard library contains a function for this. Some libcs have memmem, but at the moment the libc crate does not wrap it. You can use the twoway crate, however, and rust-bio implements some pattern matching algorithms, too. All of those should be faster than haystack.windows(..).position(..), which compares the needle against every window and so takes O(n * m) time in the worst case (n = haystack length, m = needle length).
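Algorithms like these typically avoid the naive approach's rescanning by preprocessing the needle first. As a std-only illustration of that idea, here is a minimal Knuth-Morris-Pratt sketch. It is not the Two-Way algorithm that twoway uses, and `kmp_find` is a hypothetical name, not an API of any of the crates mentioned:

```rust
/// Knuth-Morris-Pratt search (hypothetical helper): index of the first
/// occurrence of `needle` in `haystack`, in O(n + m) time.
fn kmp_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    // failure[i] = length of the longest proper prefix of needle[..=i]
    // that is also a suffix of needle[..=i].
    let mut failure = vec![0usize; needle.len()];
    let mut k = 0;
    for i in 1..needle.len() {
        while k > 0 && needle[i] != needle[k] {
            k = failure[k - 1];
        }
        if needle[i] == needle[k] {
            k += 1;
        }
        failure[i] = k;
    }
    // Scan the haystack once; on a mismatch, fall back within the needle
    // instead of moving backwards in the haystack.
    let mut matched = 0;
    for (i, &b) in haystack.iter().enumerate() {
        while matched > 0 && b != needle[matched] {
            matched = failure[matched - 1];
        }
        if b == needle[matched] {
            matched += 1;
        }
        if matched == needle.len() {
            return Some(i + 1 - needle.len());
        }
    }
    None
}
```

KMP guarantees linear time; the crates above may still be faster in practice, so treat this as an explanation of the idea rather than a drop-in replacement.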
I found the memmem crate useful for this task:
use memmem::{Searcher, TwoWaySearcher};

fn main() {
    // The needle is preprocessed once in `new`; `search_in` can then be called on many haystacks.
    let search = TwoWaySearcher::new(b"dog");
    assert_eq!(
        search.search_in(b"The quick brown fox jumped over the lazy dog."),
        Some(41)
    );
}
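The split between `TwoWaySearcher::new` and `search_in` lets the needle be preprocessed once and the searcher reused. To show that build-once, search-many shape without depending on the crate, here is a hypothetical `HorspoolSearcher` built on the simpler Boyer-Moore-Horspool algorithm (a swapped-in technique, not what memmem implements):

```rust
/// Hypothetical reusable searcher (Boyer-Moore-Horspool, not Two-Way).
/// The shift table is built once in `new` and reused by every `search_in`.
struct HorspoolSearcher<'n> {
    needle: &'n [u8],
    // shift[b]: how far the window may jump when `b` is its last byte.
    shift: [usize; 256],
}

impl<'n> HorspoolSearcher<'n> {
    fn new(needle: &'n [u8]) -> Self {
        let m = needle.len();
        // Bytes absent from the needle let the window skip its full length.
        let mut shift = [m; 256];
        // Every needle byte except the last records its distance from the end;
        // later occurrences overwrite earlier ones with smaller shifts.
        for (i, &b) in needle.iter().take(m.saturating_sub(1)).enumerate() {
            shift[b as usize] = m - 1 - i;
        }
        HorspoolSearcher { needle, shift }
    }

    fn search_in(&self, haystack: &[u8]) -> Option<usize> {
        let m = self.needle.len();
        if m == 0 {
            return Some(0);
        }
        let mut pos = 0;
        while pos + m <= haystack.len() {
            if &haystack[pos..pos + m] == self.needle {
                return Some(pos);
            }
            // Jump based on the last byte of the current window.
            pos += self.shift[haystack[pos + m - 1] as usize];
        }
        None
    }
}
```

The table costs 256 words per searcher, which is why it pays off mainly when the same needle is searched in several haystacks.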
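How about a regex on bytes? regex::bytes::Regex matches directly against &[u8], and the haystack does not need to be valid UTF-8, which makes it very flexible. See this Rust playground demo: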
use regex::bytes::Regex;

fn main() {
    // See https://doc.rust-lang.org/regex/regex/bytes/
    let re = Regex::new(r"say [^,]*").unwrap();
    let text = b"say foo, say bar, say baz";
    // Collect the start offset of every non-overlapping match.
    let starts: Vec<usize> = re.find_iter(text).map(|m| m.start()).collect();
    assert_eq!(starts, vec![0, 9, 18]);
}
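If the pattern is a fixed byte string rather than a regex, the same list of start offsets can be collected with std alone. A minimal sketch, assuming non-overlapping matches are wanted (regex's match iterators never overlap either) and using a hypothetical helper name `find_all`:

```rust
/// Start offsets of all non-overlapping occurrences of `needle` in `haystack`
/// (hypothetical helper). An empty needle returns no matches here, a deliberate
/// simplification that avoids looping forever on zero-length matches.
fn find_all(haystack: &[u8], needle: &[u8]) -> Vec<usize> {
    let mut starts = Vec::new();
    if needle.is_empty() {
        return starts;
    }
    let mut pos = 0;
    while let Some(offset) = haystack[pos..]
        .windows(needle.len())
        .position(|w| w == needle)
    {
        starts.push(pos + offset);
        // Resume just past the match so occurrences do not overlap.
        pos += offset + needle.len();
    }
    starts
}
```

For overlapping matches, advance `pos` by `offset + 1` instead.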
Here's a simple implementation based on the windows iterator.
fn find_subsequence(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // windows(0) panics, so an empty needle is treated as matching at index 0.
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|window| window == needle)
}

fn main() {
    assert_eq!(find_subsequence(b"qwertyuiop", b"tyu"), Some(4));
    assert_eq!(find_subsequence(b"qwertyuiop", b"asd"), None);
}
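Because `windows` is a double-ended, exact-size iterator, the same idea finds the last occurrence by searching from the back with `rposition`. A small sketch (the name `rfind_subsequence` is hypothetical); the index it returns still counts from the front:

```rust
/// Index of the last occurrence of `needle` in `haystack` (hypothetical helper).
fn rfind_subsequence(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        // windows(0) panics; mirror str::rfind(""), which returns the length.
        return Some(haystack.len());
    }
    haystack.windows(needle.len()).rposition(|window| window == needle)
}
```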
The find_subsequence function can also be made generic:
fn find_subsequence<T: PartialEq>(haystack: &[T], needle: &[T]) -> Option<usize> {
    // windows(0) panics, so an empty needle is treated as matching at index 0.
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|window| window == needle)
}
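With a `T: PartialEq` bound the search works on any element type. For example, searching the decoded `char`s of a string gives a character index rather than a byte offset. `char_index_of` is a hypothetical helper, and the generic function is repeated so the snippet stands alone:

```rust
fn find_subsequence<T: PartialEq>(haystack: &[T], needle: &[T]) -> Option<usize> {
    // windows(0) panics, so an empty needle is treated as matching at index 0.
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|window| window == needle)
}

/// Char index (not byte offset) of `needle` in `text`, found by running the
/// generic search over the decoded chars.
fn char_index_of(text: &str, needle: &str) -> Option<usize> {
    let hay: Vec<char> = text.chars().collect();
    let pat: Vec<char> = needle.chars().collect();
    find_subsequence(&hay, &pat)
}
```

In "héllo wörld", "wö" starts at char index 6 but at byte offset 7, since "é" takes two bytes in UTF-8.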