How to loop through columns, check if a particular value exists in any of the columns, mutate a new column and enter 1 if it exists, 0 if not?

this was my solution- turning the data from wide to long and back again

s <- reshape2::melt(sample, id.vars = "ID", value.name = "Q")
s$variable <- 1
s <- subset(s, complete.cases(s))
s <- reshape(s, idvar = "ID", timevar = "Q", direction = "wide")
s <- apply(s, 2, function(x) ifelse(is.na(x), 0, x))

1) Assuming DF is as shown reproducibly in the Note at the end use sapply to create an matrix of indicators and then cbind it to the ID column. Finally make the names nicer. No packages are used.

ques <- function(i) paste0("Q", 1:25) %in% unlist(DF[i, -1])
DFout <- cbind(DF[1], +t(sapply(1:nrow(DF), ques)))
names(DFout)[-1] <- paste0("Q", names(DFout[-1]))

The first 5 columns are:

> DFout[1:5]

   ID Q1 Q2 Q3 Q4
1 100  1  0  0  0
2 101  0  0  1  1
3 102  0  1  1  1
4 103  0  0  0  0
5 104  0  0  0  1
6 105  1  1  1  1

2) Another possibility is to convert the input to long form and then use xtabs to create a table from it.

library(dplyr)
library(tidyr)

tab <- DF %>% 
  gather(key, Question, -ID) %>%
  filter(nzchar(Question)) %>%
  mutate(Question = factor(Question, paste0("Q", 1:25))) %>%
  xtabs(~ ID + Question, .)

giving this table. We show the first 5 columns:

> tab[, 1:5]

     Question
ID    Q1 Q2 Q3 Q4
  100  1  0  0  0
  101  0  0  1  1
  102  0  1  1  1
  104  0  0  0  1
  105  1  1  1  1

If it is important that the result be a data frame then add:

library(tibble)

tab %>% 
  as.data.frame.matrix %>% 
  rownames_to_column(var = "ID")

Note

sample <- "rows ID   Col1  Col2  Col3  Col4
1          100   Q1     
2          101   Q3    Q4
3          102   Q2    Q3    Q4   
4          103   
5          104   Q4
6          105   Q1    Q2    Q3    Q4"
DF <- read.table(text = sample, header = TRUE, fill = TRUE, as.is = TRUE,
  strip.white = TRUE)[-1]

@G.Grothendieck provides a really nice solution. Here's a variation on that answer that will produce a value for every question in the test, even if it was answered correctly by every student. Admittedly, it's a little less elegant. Also note that I construct the data with missing values instead of empty strings, so the filter is a little different

dat <- data.frame(ID = c(100:105), 
                  Col1 = c("Q1", "Q3", "Q2", NA, "Q4", "Q1"), 
                  Col2 = c(NA, "Q4", "Q3", NA, NA, "Q2"), 
                  Col3 = c(NA, NA, "Q4", NA, NA, "Q3"), 
                  Col4 = c(NA, NA, NA, NA, NA, "Q4"), 
                  stringsAsFactors = FALSE)

dat %>%
  gather(key = col, val = wrong, -ID) %>%
  select(-col) %>%
  mutate(tmp = 1) %>%
  complete(wrong = paste0("Q", 1:25)) %>%
  filter(!is.na(wrong)) %>%
  spread(wrong, tmp, fill = 0) %>%
  select(ID, paste0("Q", 1:25)) %>%
  filter(!is.na(ID)) %>%
  data.frame

How to loop through columns, check if a particular value exists in any of the columns, mutate a new column and enter 1 if it exists, 0 if not?

Note

Tags:

R

Dataframe

Dplyr

Data Cleaning

Related

Recent Posts