Get first character of UTF-8 string
PHP strings don't understand multibyte characters by default; they are plain byte sequences. Array-like indexing chops off the first byte, and if that byte happens not to be in the ASCII range you get this result.
Use the mb_substr function instead.
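To make the difference concrete, here is a minimal sketch comparing the two approaches. It assumes the mbstring extension is loaded and that the script file itself is saved as UTF-8; the variable names are only for illustration.

```php
<?php
// Hypothetical sample value; any non-ASCII UTF-8 string behaves the same way.
$word = "воскресенье";

// Array-style indexing returns a single byte, here only the first half of "в".
$firstByte = $word[0];

// mb_substr() counts characters in the given encoding instead of bytes.
$firstChar = mb_substr($word, 0, 1, 'UTF-8');

echo bin2hex($firstByte), "\n"; // d0   (an incomplete UTF-8 sequence)
echo bin2hex($firstChar), "\n"; // d0b2 (the complete character "в")
echo strlen($word), " bytes, ", mb_strlen($word, 'UTF-8'), " characters\n";
```

Printing the lone byte to a UTF-8 page is what typically shows up as a question mark or replacement character.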
As mentioned in other answers, PHP's plain string functions don't understand multibyte characters (as you get with UTF-8, for example) when you take a substring.
What the other answers don't mention is that you should tell mb_substr which encoding to use.
So, for example, I use this:
mb_substr( "Sunday", 0, 1,'UTF8'); // Returns S
mb_substr( "воскресенье", 0, 1,'UTF8'); // Returns в
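As a rough illustration of why the encoding hint matters, the sketch below passes the same string once with the correct encoding and once with a single-byte encoding. It assumes mbstring is available and the script is saved as UTF-8; ISO-8859-1 is just one example of a wrong hint.

```php
<?php
$word = "воскресенье";

// Correct hint: the first character is the two-byte sequence for "в".
$asUtf8 = mb_substr($word, 0, 1, 'UTF-8');

// Wrong hint: ISO-8859-1 treats every byte as a character,
// so only the first byte comes back.
$asLatin1 = mb_substr($word, 0, 1, 'ISO-8859-1');

echo bin2hex($asUtf8), "\n";   // d0b2
echo bin2hex($asLatin1), "\n"; // d0
```

If the hint is left out, mb_substr falls back to the internal encoding, which may or may not be UTF-8 depending on the configuration.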
There are several things you need to consider:
- Check that data in the DB is being stored as UTF-8
- Check that the client connection to the DB is in UTF-8 (for example, with MySQL see: http://www.php.net/manual/en/mysqli.character-set-name.php)
- Make sure that the page has its content-type set to UTF-8; you can use
header('Content-Type: text/html; charset=utf-8');
- Try setting the internal encoding, using
mb_internal_encoding("UTF-8");
- Use mb_substr instead of array index notation:
$first_char = mb_substr($title, 0, 1);
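The PHP side of that checklist could be sketched roughly as follows. The firstChar() helper and the sample $title value are hypothetical, and the validity check is an extra precaution rather than something the steps above require.

```php
<?php
// Set the default once so mb_* calls can omit the encoding argument.
mb_internal_encoding('UTF-8');

// firstChar() is a hypothetical helper, not part of mbstring.
function firstChar(string $text): string
{
    // Reject input that is not valid UTF-8 (for example, data that came
    // through a misconfigured database connection).
    if (!mb_check_encoding($text, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return mb_substr($text, 0, 1);
}

$title = "воскресенье"; // hypothetical value, e.g. read from the DB
$first_char = firstChar($title);
echo $first_char, "\n";
```

Failing early on invalid input tends to make a database or connection charset problem easier to spot than a garbled character on the page.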
You need to use PHP's multibyte string functions to properly handle Unicode strings:
http://www.php.net/manual/en/ref.mbstring.php
http://www.php.net/manual/en/function.mb-substr.php
You'll also need to specify the character encoding in the <head> of your HTML:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
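or, if the page really is encoded as UTF-16 (the declaration has to match the actual encoding of the document):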
<meta http-equiv="Content-Type" content="text/html; charset=UTF-16" />