Invalid multibyte string in read.csv
The readr package from the tidyverse might help.
You can set the encoding via the locale argument of the read_csv() function, using the locale() function and its encoding argument:
read_csv(file = "http://www.mof.go.jp/international_policy/reference/itn_transactions_in_securities/week.csv",
skip = 14,
locale = locale(encoding = "latin1"))
Encoding sets the encoding of a character string. It doesn't set the encoding of the file represented by the character string, which is what you want.
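To make the distinction concrete, here is a minimal sketch in base R. The temporary file, its contents and the variable names are made up for illustration and are not from the question's data. It shows fileEncoding decoding a file's bytes on read, while Encoding only relabels a string already in memory.

```r
# Hypothetical sample data: the bytes of a two-line CSV in Latin-1,
# where the accented e in the second line is the single byte 0xE9.
bytes <- c(charToRaw("name,value\ncaf"), as.raw(0xE9), charToRaw(",1\n"))
path <- tempfile(fileext = ".csv")
writeBin(bytes, path)

# fileEncoding tells read.csv how to decode the file's bytes while reading.
x <- read.csv(path, fileEncoding = "latin1", stringsAsFactors = FALSE)

# Encoding<- only declares how the bytes of an in-memory string
# should be interpreted; it never touches a file.
s <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xE9)))  # same four bytes, no file involved
Encoding(s) <- "latin1"
code_points <- utf8ToInt(enc2utf8(s))  # code points after converting to UTF-8
```

Calling Encoding on the URL string itself would only change the label on the URL text, which is why it has no effect on how the downloaded file is parsed.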
This worked for me, after trying "UTF-8":
x <- read.csv(url, header=FALSE, stringsAsFactors=FALSE, fileEncoding="latin1")
And you may want to skip the first 16 lines, and read in the headers separately. Either way, there's still quite a bit of cleaning up to do.
x <- read.csv(url, header=FALSE, stringsAsFactors=FALSE,
fileEncoding="latin1", skip=16)
# get started with the clean-up
x[,1] <- gsub("\u0081|`", "", x[,1]) # get rid of odd characters
x[,-1] <- as.data.frame(lapply(x[,-1], # strip thousands separators, convert to numbers
function(d) type.convert(gsub(",", "", d, fixed=TRUE), as.is=TRUE)))
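The number-conversion step can be tried in isolation on a small made-up data frame. The column names and values below are hypothetical stand-ins for the downloaded table, so treat this as a sketch of the idea rather than the exact clean-up the real file needs.

```r
# Hypothetical stand-in for the imported table: numbers arrive as text
# with commas as thousands separators.
x <- data.frame(label = c("a", "b"),
                v1 = c("1,234", "5,678"),
                v2 = c("-12", "3,000.5"),
                stringsAsFactors = FALSE)

# Remove the commas, then let type.convert pick integer or double per column.
# as.is = TRUE keeps any non-numeric leftovers as character, not factor.
x[, -1] <- lapply(x[, -1], function(d) {
  type.convert(gsub(",", "", d, fixed = TRUE), as.is = TRUE)
})

str(x)
```

A column that still contains stray non-numeric text stays character after this step, which is a quick way to spot cells that need more cleaning.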
You may have encountered this issue because of an incompatible system locale. Try setting the system locale with this code:
Sys.setlocale("LC_ALL", "C")