Can I read Chinese characters with ReadList correctly?

Following the comments above, I think I've found the answer. As m_goldberg and librik said, ReadList doesn't do any character-encoding conversion (each byte of the file comes back as a separate character), and maybe that's one of the reasons it's fast.

However, that doesn't mean we can't make use of ReadList. Following the advice from mfvonh, I found that Import internally reads a.txt with ReadList first and then converts the result to the right encoding with ToCharacterCode and FromCharacterCode, after a lot of checks that I don't understand very well and that seem redundant here. So why not skip those checks and do the conversion directly?

Export["a.txt", "这乱码问题该怎么解决呢\n***\n1234\n这样解决呀"];
FromCharacterCode[ToCharacterCode[ReadList["a.txt", Record](*,"ISOLatin1"*)], 
                  "UTF8"] // AbsoluteTiming
Import["a.txt"]; // AbsoluteTiming
{0.0010000, {"这乱码问题该怎么解决呢", "***", "1234", "这样解决呀"}}
{0.0440000, Null}

I'm not sure whether this will fail in more complicated cases.


I think the most straightforward method is to read the file as raw bytes and then interpret those bytes as UTF-8 text. This is what it would look like:

FromCharacterCode[BinaryReadList["a.txt"], "UTF8"]

It should be slightly faster than the other suggestions, since it avoids any unnecessary conversions. Be aware that the result is a single string, so you need to break it into lines yourself, e.g. with StringSplit, if you want them.
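
For completeness, here is a minimal sketch of the line-splitting step. It assumes a.txt is the UTF-8 file written by the Export call earlier in the thread and that its lines end in either "\n" or "\r\n"; the variable names bytes, text and lines are just for illustration.

```mathematica
(* Assumption: a.txt is UTF-8 encoded, as written by the Export call above. *)
bytes = BinaryReadList["a.txt"];          (* raw bytes as integers 0..255 *)
text = FromCharacterCode[bytes, "UTF8"];  (* decode the byte sequence as UTF-8 *)

(* Split on Windows ("\r\n") or Unix ("\n") line endings.
   That the file uses one of these two is an assumption. *)
lines = StringSplit[text, {"\r\n", "\n"}]
```

Decoding the whole file once and splitting afterwards keeps the conversion to a single FromCharacterCode call; whether that matters for speed on large files is something I have not measured.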