Tidy data.frame with repeated column names

With the tidyverse we can do:

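library(tidyverse)
toy %>%
  # make the repeated names unique (the first A, B, C and ID keep their bare names)
  repair_names(sep="_") %>%
  # split each name at "_": the first piece names the value column, the second becomes a temporary id
  pivot_longer(-(1:3),names_to = c(".value","id"), names_sep="_") %>%
  # drop the temporary id column
  select(-id)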

#> # A tibble: 15 x 7
#>    file_path               Condition Trial.Num     A     B     C ID   
#>    <fct>                   <fct>         <int> <int> <int> <int> <fct>
#>  1 root/some.extension     Baseline          1     2     3     5 car  
#>  2 root/some.extension     Baseline          1     2     1     7 bike 
#>  3 root/some.extension     Baseline          1     4     9     0 plane
#>  4 root/thing.extension    Baseline          2     3     6    45 car  
#>  5 root/thing.extension    Baseline          2     5     4     4 bike 
#>  6 root/thing.extension    Baseline          2     9     5     4 plane
#>  7 root/else.extension     Baseline          3     4     4     6 car  
#>  8 root/else.extension     Baseline          3     7     5     4 bike 
#>  9 root/else.extension     Baseline          3    68     7    56 plane
#> 10 root/uniquely.extension Treatment         1     5     3     7 car  
#> 11 root/uniquely.extension Treatment         1     1     7    37 bike 
#> 12 root/uniquely.extension Treatment         1     9     8     7 plane
#> 13 root/defined.extension  Treatment         2     6     7     3 car  
#> 14 root/defined.extension  Treatment         2     4     6     8 bike 
#> 15 root/defined.extension  Treatment         2     9     0     8 plane
#> Warning message:
#> Expected 2 pieces. Missing pieces filled with `NA` in 4 rows [1, 2, 3, 4].
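
The warning appears because the first A, B, C and ID columns carry no "_" suffix, so names_sep finds only one piece for those four names. A minimal sketch that avoids this by suffixing every repeated column before pivoting. The `toy` data frame below is rebuilt from the printed result above, so it is an assumption about the original input, and the column name `set` is my own choice:

```r
library(dplyr)
library(tidyr)

# Stand-in for `toy`, reconstructed from the printed output (assumed input)
toy <- data.frame(
  file_path = paste0("root/", c("some", "thing", "else", "uniquely", "defined"), ".extension"),
  Condition = c("Baseline", "Baseline", "Baseline", "Treatment", "Treatment"),
  Trial.Num = c(1L, 2L, 3L, 1L, 2L),
  A = c(2L, 3L, 4L, 5L, 6L), B = c(3L, 6L, 4L, 3L, 7L), C = c(5L, 45L, 6L, 7L, 3L), ID = "car",
  A = c(2L, 5L, 7L, 1L, 4L), B = c(1L, 4L, 5L, 7L, 6L), C = c(7L, 4L, 4L, 37L, 8L), ID = "bike",
  A = c(4L, 9L, 68L, 9L, 9L), B = c(9L, 5L, 7L, 8L, 0L), C = c(0L, 4L, 56L, 7L, 8L), ID = "plane",
  check.names = FALSE  # keep the repeated column names as they are
)

# Give every repeated column a suffix: A_1, B_1, C_1, ID_1, A_2, ...
dup <- 4:15
names(toy)[dup] <- paste(names(toy)[dup], rep(1:3, each = 4), sep = "_")

# Every selected name now splits into exactly two pieces, so no NA filling is needed
tidy_toy <- toy %>%
  pivot_longer(-(1:3), names_to = c(".value", "set"), names_sep = "_") %>%
  select(-set)
```

This assumes the repeated columns sit in positions 4 to 15 in three equal blocks, as they do in the example data.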

You can use make.unique() to create unique column names. After that, melt() from the data.table package can create multiple value columns based on patterns in the column names:

# make the column names unique
names(toy) <- make.unique(names(toy))
# let the 'Condition' column start with a small letter 'c'
# so it won't be detected by the patterns argument from melt
names(toy)[2] <- tolower(names(toy)[2])

# load the 'data.table' package
library(data.table)
# tidy the data into long format
tidy_toy <- melt(setDT(toy), 
                 measure.vars = patterns('^A','^B','^C','^ID'), 
                 value.name = c('A','B','C','ID'))

which gives:

 > tidy_toy
                  file_path condition Trial.Num variable  A B  C    ID
 1:     root/some.extension  Baseline         1        1  2 3  5   car
 2:    root/thing.extension  Baseline         2        1  3 6 45   car
 3:     root/else.extension  Baseline         3        1  4 4  6   car
 4: root/uniquely.extension Treatment         1        1  5 3  7   car
 5:  root/defined.extension Treatment         2        1  6 7  3   car
 6:     root/some.extension  Baseline         1        2  2 1  7  bike
 7:    root/thing.extension  Baseline         2        2  5 4  4  bike
 8:     root/else.extension  Baseline         3        2  7 5  4  bike
 9: root/uniquely.extension Treatment         1        2  1 7 37  bike
10:  root/defined.extension Treatment         2        2  4 6  8  bike
11:     root/some.extension  Baseline         1        3  4 9  0 plane
12:    root/thing.extension  Baseline         2        3  9 5  4 plane
13:     root/else.extension  Baseline         3        3 68 7 56 plane
14: root/uniquely.extension Treatment         1        3  9 8  7 plane
15:  root/defined.extension Treatment         2        3  9 0  8 plane
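
To see why the 'Condition' column has to be renamed first, here is a small base R sketch that works on the names alone. The vector `nms` is assumed to mirror the column names of `toy`:

```r
# Column names as they appear in `toy` (assumed from the output above)
nms <- c("file_path", "Condition", "Trial.Num", rep(c("A", "B", "C", "ID"), 3))

# make.unique() leaves the first occurrence alone and appends .1, .2, ... to repeats
unique_nms <- make.unique(nms)

# The pattern '^C' also picks up "Condition", which is not a measure column
before <- grep("^C", unique_nms, value = TRUE)

# After lower-casing that one name, '^C' matches only the C columns
unique_nms[2] <- tolower(unique_nms[2])
after <- grep("^C", unique_nms, value = TRUE)
```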

Another option is to pass a list of column indices to measure.vars:

tidy_toy <- melt(setDT(toy), 
                 measure.vars = list(c(4,8,12), c(5,9,13), c(6,10,14), c(7,11,15)), 
                 value.name = c('A','B','C','ID'))

Making the column names unique isn't necessary in that case.
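
If typing the indices by hand is error-prone, they could be derived from the names. A rough base R sketch, again assuming `nms` mirrors the original (non-unique) column names of `toy`; the helper names `repeated` and `idx` are mine:

```r
# Original, non-unique column names (assumed from the example data)
nms <- c("file_path", "Condition", "Trial.Num", rep(c("A", "B", "C", "ID"), 3))

# Names that occur more than once, in order of first repetition
repeated <- unique(nms[duplicated(nms)])

# For each repeated name, collect the positions where it occurs
idx <- lapply(repeated, function(n) which(nms == n))
names(idx) <- repeated
```

Here `unname(idx)` has the same shape as the hand-written list above, so it could presumably be passed as measure.vars with `value.name = repeated`.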

A more involved method creates names that the patterns argument can distinguish more reliably:

# select the names that are not unique
tt <- table(names(toy))
idx <- which(names(toy) %in% names(tt)[tt > 1])
nms <- names(toy)[idx]

# make them unique
names(toy)[idx] <- paste(nms, 
                         rep(seq(length(nms) / length(names(tt)[tt > 1])), 
                             each = length(names(tt)[tt > 1])), 
                         sep = '.')

# your column names are now unique:
> names(toy)
 [1] "file_path" "Condition" "Trial.Num" "A.1"       "B.1"       "C.1"       "ID.1"      "A.2"      
 [9] "B.2"       "C.2"       "ID.2"      "A.3"       "B.3"       "C.3"       "ID.3"     

# tidy the data into long format
tidy_toy <- melt(setDT(toy), 
                 measure.vars = patterns('^A\\.\\d','^B\\.\\d','^C\\.\\d','^ID\\.\\d'), 
                 value.name = c('A','B','C','ID'))

which gives the same end result.
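
The renaming step can also be written with ave(), which numbers each occurrence of a name. This is an alternative sketch, not the code above, and it works on a names vector `nms` that is assumed to match the original columns of `toy`:

```r
# Original, non-unique column names (assumed from the example data)
nms <- c("file_path", "Condition", "Trial.Num", rep(c("A", "B", "C", "ID"), 3))

# TRUE for every name that occurs more than once
is_rep <- nms %in% nms[duplicated(nms)]

# Running occurrence counter within each name: 1 for the first A, 2 for the second, ...
occurrence <- ave(seq_along(nms), nms, FUN = seq_along)

# Append the counter to the repeated names only
nms[is_rep] <- paste(nms[is_rep], occurrence[is_rep], sep = ".")
```

Unlike the rep()-based version, this does not rely on the repeated columns being laid out in equal-sized blocks.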

As mentioned in the comments, the janitor package can be helpful for this problem as well. Its clean_names() function works similarly to make.unique. See here for an explanation.
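
A short sketch of what that looks like on the names alone, using janitor's make_clean_names(), the vector counterpart of clean_names(). The vector `nms` is assumed to mirror the column names of `toy`:

```r
library(janitor)

# Original, non-unique column names (assumed from the example data)
nms <- c("file_path", "Condition", "Trial.Num", rep(c("A", "B", "C", "ID"), 3))

# Cleans the names (snake_case by default) and de-duplicates repeats with a numeric suffix
cleaned <- make_clean_names(nms)
print(cleaned)
```

Because the cleaned names are lower-case, the patterns passed to melt would need adjusting, and a bare '^c' would again match the condition column.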


