UTF-8 file output in R
For anyone coming upon this question later, see the stringi package (https://cran.r-project.org/web/packages/stringi/index.html). It includes numerous functions that enable consistent, cross-platform UTF-8 string support in R. Most relevant to this thread, the stri_read_lines(), stri_read_raw(), and stri_write_lines() functions can consistently read and write UTF-8, even on Windows.
The problem is due to R's special behaviour on Windows (it re-encodes text to the system's default code page in some write functions; I do not know the specifics, but the behaviour is well known).
To write UTF-8 encoded text on Windows, use the useBytes=TRUE argument of writeLines(), and read the file back with readLines(..., encoding="UTF-8"):
txt <- "在"
writeLines(txt, "test.txt", useBytes=TRUE)   # write the raw UTF-8 bytes, bypassing re-encoding
readLines("test.txt", encoding="UTF-8")      # mark the result as UTF-8 when reading it back
[1] "在"
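As a quick sanity check (a small sketch following the example above), you can confirm that the string read back is marked as UTF-8:
x <- readLines("test.txt", encoding = "UTF-8")
Encoding(x)   # should print "UTF-8" for the non-ASCII string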
For much more detail, see this really well written article by Kevin Ushey: http://kevinushey.github.io/blog/2018/02/21/string-encoding-and-r/
This saves UTF-8 strings to a text file:
kLogFileName <- "parser.log"

log <- function(msg = "") {
  # Append the message to the log file, converting it to UTF-8 first
  con <- file(kLogFileName, "a")
  tryCatch({
    cat(iconv(msg, to = "UTF-8"), file = con, sep = "\n")
  },
  finally = {
    close(con)  # make sure the connection is closed even if cat() fails
  })
}
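A hypothetical call (the message text is just an example) appends the string to parser.log encoded as UTF-8:
log("在")   # appends "在" to parser.log as UTF-8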