How can I case fold a string in Rust?

For my use case, I've found the caseless crate to be most useful.

As far as I know, this is the only library which supports normalization. This is important when you want e.g. "㎒" (U+3392 SQUARE MHZ) and "mhz" to match. See Chapter 3 - Default Caseless Matching in the Unicode Standard for details on how this works.

Here's some example code that matches a string case-insensitively:

extern crate caseless;
use caseless::Caseless;

let a = "100 ㎒";
let b = "100 mhz";

// These strings don't match with just case folding,
// but do match after compatibility (NFKD) normalization
assert!(!caseless::default_caseless_match_str(a, b));
assert!(caseless::compatibility_caseless_match_str(a, b));

To get the case folded string directly, you can use the default_case_fold_str function:

let s = "Twilight Sparkle ちゃん";
assert_eq!(caseless::default_case_fold_str(s), "twilight sparkle ちゃん");

Caseless doesn't expose a corresponding function that normalizes as well, but you can write one using the unicode-normalization crate:

extern crate unicode_normalization;
use caseless::Caseless;
use unicode_normalization::UnicodeNormalization;

fn compatibility_case_fold(s: &str) -> String {
    s.nfd().default_case_fold().nfkd().default_case_fold().nfkd().collect()
}

let a = "100 ㎒";
assert_eq!(compatibility_case_fold(a), "100 mhz");

Note that multiple rounds of normalization and case folding are needed for a correct result.

(Thanks to BurntSushi5 for pointing me to this library.)

Tags:

Unicode

Rust