How to handle example data in an R package that has UTF-8 marked strings

In case it's useful to anyone in the future, the resolution I found is this:

The UTF-8 marked strings were in the dataset because tweets sometimes include emoji.

The advice I was given is that there isn't a straightforward way to get rid of the NOTE from R CMD check without removing all of the UTF-8 marked strings.
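A minimal sketch of how to spot which strings are UTF-8 marked, using base R's Encoding(). The `tweets` vector is a hypothetical stand-in for the real data. ASCII-only strings are never marked, so they report "unknown", while strings containing emoji report "UTF-8".

```r
# Hypothetical example vector; "\U0001F600" is the grinning-face emoji.
tweets <- c("plain ascii tweet", "tweet with emoji \U0001F600")

# Make sure non-ASCII elements carry the UTF-8 mark.
# ASCII-only elements are never marked, so they stay "unknown".
Encoding(tweets) <- "UTF-8"

# Encoding() reports the declared encoding of each element.
enc <- Encoding(tweets)
print(enc)

# Number of marked strings in this vector (the kind of string the check NOTE counts).
n_marked <- sum(enc == "UTF-8")
print(n_marked)
```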

To do this, I used the command:

nonUTF <- iconv(df$TroubleVector, from="UTF-8", to="ASCII")

on the vector that contained emoji and other non-ASCII text. iconv() returns NA for any element that cannot be converted to ASCII, which includes every UTF-8 marked string. I used those NAs to subset the dataset, and now I get a clean build.
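The subsetting step could look roughly like this. It is a sketch, not the exact code I ran: the data frame and its `id` column are hypothetical, and only `TroubleVector` and the iconv() call come from the steps above.

```r
# Hypothetical data frame standing in for the package's tweet dataset.
df <- data.frame(
  id = 1:3,
  TroubleVector = c("no emoji here", "emoji \U0001F600", "also plain"),
  stringsAsFactors = FALSE
)

# With the default sub = NA, iconv() returns NA for any element
# that cannot be represented in ASCII.
nonUTF <- iconv(df$TroubleVector, from = "UTF-8", to = "ASCII")

# Keep only the rows whose text converted cleanly.
df_clean <- df[!is.na(nonUTF), , drop = FALSE]
print(df_clean)
```

This drops whole rows. If you would rather keep them, passing sub = "" to iconv() removes only the non-convertible bytes, and you could store the converted vector in place of the original column instead.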

Tags:

UTF-8

R

Twitter