How to deal with spaces in column names?
You asked "Is there a better general approach to dealing with the problem of spaces (and other characters) in variable names" and yes there are a few:
- Just don't use them; things will break, as you experienced here.
- Use the make.names() function to create safe names; R uses it too to create identifiers (e.g. by replacing spaces and other invalid characters with dots).
- If you must, protect the unsafe identifiers with backticks.
Example for the last two points:
R> myvec <- list("foo"=3.14, "some bar"=2.22)
R> myvec$`some bar` * 2
[1] 4.44
R> make.names(names(myvec))
[1] "foo" "some.bar"
R>
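The same two options carry over to data frame columns. The following is a minimal sketch, assuming a made-up data frame `df` (not from the question) that is built with check.names = FALSE so that the space in the column name survives:

```r
# Hypothetical data; check.names = FALSE keeps the space in the column name
df <- data.frame("some bar" = c(1.5, 2.5), foo = c(10, 20), check.names = FALSE)

# Option 1: keep the unsafe name and quote it with backticks on every use
doubled <- df$`some bar` * 2

# Option 2: rename once with make.names() and use plain identifiers afterwards
safe <- df
names(safe) <- make.names(names(safe))
safe_names <- names(safe)
```

Renaming once up front is usually less error-prone than remembering the backticks everywhere the column is used.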
This is a "bug" in the package ggplot2 that comes from the fact that the function as.data.frame() in the internal ggplot2 function quoted_df converts the names to syntactically valid names. These syntactically valid names cannot be found in the original data frame, hence the error.
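The name conversion itself is easy to reproduce in base R. This sketch does not call any ggplot2 internals; it only shows, on a made-up list `cols`, that as.data.frame() with its default arguments rewrites a name containing a space, so the original name can no longer be looked up:

```r
# Hypothetical list standing in for the quoted aesthetics
cols <- list("Age Group" = c("over 80", "under 80"), n = c(70, 65))

# Default as.data.frame() makes the names syntactically valid
converted <- as.data.frame(cols)
new_names <- names(converted)

# The original name is gone, so any lookup by "Age Group" fails
still_there <- "Age Group" %in% new_names
```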
As a reminder: syntactically valid names consist of letters, numbers, and the dot or underscore characters, and start with a letter or a dot (but the dot cannot be followed by a number).
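One quick way to test a name against this rule is to compare it with what make.names() would turn it into: a name that comes back unchanged is already valid. The candidate names below are illustrative only; note that this check also flags reserved words such as "if", which make.names() alters as well.

```r
# Illustrative candidates, some taken from the labels used in this answer
candidates <- c("AgeGroup", ".hidden", "% on OAC", "2fast", ".2way",
                "Number of Practices")

# A name is syntactically valid if make.names() leaves it untouched
is_valid <- candidates == make.names(candidates)

# Only the valid names remain
valid_names <- candidates[is_valid]
```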
There's a reason for that. There's also a reason why ggplot allows you to set labels using labs, e.g. using the following dummy dataset with valid names:
X <- data.frame(
  PonOAC = rep(c("a", "b", "c", "d"), 2),
  AgeGroup = rep(c("over 80", "under 80"), each = 4),
  NumberofPractices = rpois(8, 70)
)
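You can use labs at the end to make this code work:
library(ggplot2)
# stat = "identity" plots the y values as given; recent ggplot2 versions
# reject a y aesthetic with the default counting stat of geom_bar()
ggplot(X, aes(x = PonOAC, y = NumberofPractices, fill = AgeGroup)) +
  geom_bar(stat = "identity") +
  facet_grid(AgeGroup ~ .) +
  labs(x = "% on OAC", y = "Number of Practices", fill = "Age Group")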
This produces a bar chart faceted by age group, with the readable labels on the axes and the legend.