Is it safe to use `strstr` to search for multibyte UTF-8 characters in a string?

Edit
The OP's updated question asks, "can such false positive exist in an UTF-8 context?" The answer is no. UTF-8 is designed so that lead bytes and continuation bytes occupy disjoint ranges, which makes it immune to the partial match of a character described in the original answer below. Provided both strings are valid UTF-8, a match can only start on a character boundary, so it is safe to use `strstr` with UTF-8 encoded multibyte characters.
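As a rough illustration of that property, here is a minimal sketch. The helper names `utf8_find_offset` and `utf8_is_char_start` are made up for this example and are not part of any library. It assumes the strings are valid UTF-8 and writes the bytes out explicitly (`ï` is `0xC3 0xAF`, `é` is `0xC3 0xA9`) so the result does not depend on the source file's encoding.

```c
#include <string.h>

/* Hypothetical helper: byte offset of the first strstr match, or -1 if none. */
long utf8_find_offset(const char *haystack, const char *needle)
{
    const char *hit = strstr(haystack, needle);
    return hit ? (long)(hit - haystack) : -1;
}

/* Hypothetical helper: nonzero if the byte at p begins a UTF-8 character,
 * i.e. it is not a continuation byte of the form 10xxxxxx. */
int utf8_is_char_start(const char *p)
{
    return ((unsigned char)*p & 0xC0) != 0x80;
}
```

Calling `utf8_find_offset("na\xC3\xAFve caf\xC3\xA9", "\xC3\xA9")` finds `é` at its own first byte and does not stop at `ï`, even though both characters share the lead byte `0xC3`. A search for an ASCII needle such as `"@"` cannot land inside a multibyte character either, because every byte of a multibyte UTF-8 character is `0x80` or above.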

Original Answer
No, `strstr` is not generally suitable for strings containing multibyte characters.

If you search for a string that contains no multibyte characters inside a string that does contain them, `strstr` may give a false positive. For example, with Shift-JIS encoding in a Japanese locale, `strstr("掘something", "@some")` may report a match, because the trailing byte of 掘 has the same value as `@`.

+---------+----+----+----+
|   c1    | c2 | c3 | c4 |  <--- string
+---------+----+----+----+

     +----+----+----+
     | c5 | c2 | c3 |  <--- string to search
     +----+----+----+

If the trailing part of c1 (accidentally) matches c5, you may get an incorrect result. I would suggest using Unicode with a Unicode substring function, or a multibyte substring function (`_mbsstr`, for example).
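The false positive can be reproduced with raw bytes, without depending on locale or source encoding. The sketch below assumes that 掘 encodes as `0x8C 0x40` in Shift-JIS and that lead bytes fall in the ranges `0x81-0x9F` and `0xE0-0xFC`. `sjis_strstr` is a hypothetical, simplified character-aware search written for illustration. It is not how `_mbsstr` is implemented.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: Shift-JIS lead bytes are 0x81-0x9F and 0xE0-0xFC. */
static int sjis_is_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical character-aware search: compares only at character
 * boundaries, so a trail byte is never mistaken for a new character. */
const char *sjis_strstr(const char *haystack, const char *needle)
{
    size_t n = strlen(needle);
    const char *p = haystack;

    while (*p != '\0') {
        if (strncmp(p, needle, n) == 0)
            return p;
        /* Step over a double-byte character as a unit, else one byte. */
        if (sjis_is_lead((unsigned char)*p) && p[1] != '\0')
            p += 2;
        else
            p += 1;
    }
    return n == 0 ? haystack : NULL;
}
```

With `const char *h = "\x8C\x40" "something";`, plain `strstr(h, "@some")` returns `h + 1`, which points into the middle of 掘, while `sjis_strstr(h, "@some")` returns `NULL`. Searching for `"some"` still succeeds at `h + 2` with either function.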
