Difference between char and character objects
The two R types char and character correspond, on the internal C side, to CHARSXP and STRSXP respectively. At the R level, one always deals with character objects; a single string, like:
y <- "My name is hasnain"
is actually a character object of length 1. Internally, each element of a character vector is a char, but R doesn't provide (AFAIK) a direct way to extract, create and/or use a char.
Although you can't create a char/CHARSXP object with pure R, it's straightforward to get one through the R/C interface using the mkChar function, which takes a standard C string and turns it into a CHARSXP. For instance, one can create a char.c file:
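#include <R.h>
#include <Rinternals.h>

/* Return a bare CHARSXP (not wrapped in a character vector) to R.
   PROTECT is not strictly needed here, since nothing else is allocated
   before returning, but it does no harm. */
SEXP returnCHAR(void) {
  SEXP ret = PROTECT(mkChar("Hello World!"));
  UNPROTECT(1);
  return ret;
}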
After compiling it with R CMD SHLIB char.c, from the R side:
dyn.load(paste0("char", .Platform$dynlib.ext)) # "char.so" on Linux, "char.dll" on Windows
x <- .Call("returnCHAR")
x
# <CHARSXP: "Hello World!">
typeof(x)
#[1] "char"
length(x)
#[1] 12
Besides typeof and length, I didn't find many other R functions that act on char objects. Even as.character doesn't work! I could neither extract a char from a standard character vector nor insert this char into an existing character vector (assignment doesn't work).
The c function coerces its result to a list if one of its arguments is a char:
c(1,"a",x)
#[[1]]
#[1] 1
#
#[[2]]
#[1] "a"
#
#[[3]]
#<CHARSXP: "Hello World!">
We can make use of .Internal(inspect()) (warning: inspect is an internal, non-exported function, so it might change in future releases; don't rely on it) to get a glimpse of the internal structure of an object. As far as I know, char/CHARSXP objects are shared between string vectors to save memory. For instance:
let <- letters[1:2]
.Internal(inspect(let))
#@1aff2a8 16 STRSXP g0c2 [NAM(1)] (len=2, tl=0)
# @1368c60 09 CHARSXP g0c1 [MARK,gp=0x61] [ASCII] [cached] "a"
# @16dc7c0 09 CHARSXP g0c1 [MARK,gp=0x60] [ASCII] [cached] "b"
mya <- "a"
.Internal(inspect(mya))
#@3068710 16 STRSXP g0c1 [NAM(3)] (len=1, tl=0)
# @1368c60 09 CHARSXP g0c1 [MARK,gp=0x61] [ASCII] [cached] "a"
From the above output, we note two things:
- STRSXP objects are vectors of CHARSXP objects, as mentioned above;
- strings are stored in a "global pool": the "a" string is stored at the same address even though it was created independently in two different objects (the [cached] flag in the inspect output marks CHARSXPs held in R's global CHARSXP cache).