Detect file encoding in PHP

This is my solution which worked like a charm:

//check string strict for encoding out of list of supported encodings
$enc = mb_detect_encoding($str, mb_list_encodings(), true);

if ($enc===false){
    //could not detect encoding
}
else if ($enc!=="UTF-8"){
    $str = mb_convert_encoding($str, "UTF-8", $enc);
}
else {
    //UTF-8 detected
}

Try using the mb_detect_encoding function. This function will examine your string and attempt to "guess" what its encoding is. You can then convert it as desired. As brulak suggested, however, you're probably better off converting to UTF-8 rather than from, to preserve the data you're transmitting.


To make sure that the output is UTF-8, no matter what kind of input it was, I use this check:

if(!mb_check_encoding($output, 'UTF-8')
    OR !($output === mb_convert_encoding(mb_convert_encoding($output, 'UTF-32', 'UTF-8' ), 'UTF-8', 'UTF-32'))) {

    $output = mb_convert_encoding($content, 'UTF-8', 'pass'); 
}

// $output is now safely converted to UTF-8!

mb_detect_encoding function should be your last choice. That could return the wrong encoding. Linux command file -i /path/myfile.txt is working great. In PHP you could use:

function _detectFileEncoding($filepath) {
    // VALIDATE $filepath !!!
    $output = array();
    exec('file -i ' . $filepath, $output);
    if (isset($output[0])){
        $ex = explode('charset=', $output[0]);
        return isset($ex[1]) ? $ex[1] : null;
    }
    return null;
}