How to loop through columns, check if a particular value exists in any of the columns, mutate a new column and enter 1 if it exists, 0 if not?
this was my solution- turning the data from wide to long and back again
s <- reshape2::melt(sample, id.vars = "ID", value.name = "Q")
s$variable <- 1
s <- subset(s, complete.cases(s))
s <- reshape(s, idvar = "ID", timevar = "Q", direction = "wide")
s <- apply(s, 2, function(x) ifelse(is.na(x), 0, x))
1) Assuming DF
is as shown reproducibly in the Note at the end use sapply
to create an matrix of indicators and then cbind
it to the ID
column. Finally make the names nicer. No packages are used.
ques <- function(i) paste0("Q", 1:25) %in% unlist(DF[i, -1])
DFout <- cbind(DF[1], +t(sapply(1:nrow(DF), ques)))
names(DFout)[-1] <- paste0("Q", names(DFout[-1]))
The first 5 columns are:
> DFout[1:5]
ID Q1 Q2 Q3 Q4
1 100 1 0 0 0
2 101 0 0 1 1
3 102 0 1 1 1
4 103 0 0 0 0
5 104 0 0 0 1
6 105 1 1 1 1
2) Another possibility is to convert the input to long form and then use xtabs
to create a table from it.
library(dplyr)
library(tidyr)
tab <- DF %>%
gather(key, Question, -ID) %>%
filter(nzchar(Question)) %>%
mutate(Question = factor(Question, paste0("Q", 1:25))) %>%
xtabs(~ ID + Question, .)
giving this table. We show the first 5 columns:
> tab[, 1:5]
Question
ID Q1 Q2 Q3 Q4
100 1 0 0 0
101 0 0 1 1
102 0 1 1 1
104 0 0 0 1
105 1 1 1 1
If it is important that the result be a data frame then add:
library(tibble)
tab %>%
as.data.frame.matrix %>%
rownames_to_column(var = "ID")
Note
sample <- "rows ID Col1 Col2 Col3 Col4
1 100 Q1
2 101 Q3 Q4
3 102 Q2 Q3 Q4
4 103
5 104 Q4
6 105 Q1 Q2 Q3 Q4"
DF <- read.table(text = sample, header = TRUE, fill = TRUE, as.is = TRUE,
strip.white = TRUE)[-1]
@G.Grothendieck provides a really nice solution. Here's a variation on that answer that will produce a value for every question in the test, even if it was answered correctly by every student. Admittedly, it's a little less elegant. Also note that I construct the data with missing values instead of empty strings, so the filter is a little different
dat <- data.frame(ID = c(100:105),
Col1 = c("Q1", "Q3", "Q2", NA, "Q4", "Q1"),
Col2 = c(NA, "Q4", "Q3", NA, NA, "Q2"),
Col3 = c(NA, NA, "Q4", NA, NA, "Q3"),
Col4 = c(NA, NA, NA, NA, NA, "Q4"),
stringsAsFactors = FALSE)
dat %>%
gather(key = col, val = wrong, -ID) %>%
select(-col) %>%
mutate(tmp = 1) %>%
complete(wrong = paste0("Q", 1:25)) %>%
filter(!is.na(wrong)) %>%
spread(wrong, tmp, fill = 0) %>%
select(ID, paste0("Q", 1:25)) %>%
filter(!is.na(ID)) %>%
data.frame