Displaying UTF-8 encoded Chinese characters in R

Not a bug, more a misunderstanding of the underlying type system conversions (the character type and the factor type) when constructing a data.frame.

You could start first with data <-read.csv("mydata.csv", encoding="UTF-8", stringsAsFactors=FALSE) which will make your Chinese characters to be of the character type and so by printing them out you should see waht you are expecting.

@nograpes: similarly x=c('中華民族');x; y <- data.frame(x, stringsAsFactors=FALSE) and everything should be ok.


In my case, the utf-8 encoding does not work in my r. But the Gb* encoding works.The utf8 wroks in ubuntu. First you need to figure out the default encoding in your OS. And encode it as it is. Excel can not encode it as utf8 properly even it claims that it save as utf8.

(1) Download 'Open Sheet' software.

(2) Open it properly. You can scroll the encoding method until you see the Chinese character displayed in the preview windows.

(3) Save it as utf-8(if you want utf-8). (UTF-8 is not solution to every problem, you HAVE TO know the default encoding in your system first)

Tags:

Utf 8

Locale

R