R: apply-like function that returns a data frame?
You can use the apply family but, you're right, the result is either a matrix
or a list
. Not a big deal though to get back to a data.frame
.
Your function needs to return something consistent across columns (raw iris
instead of iris[, 1:4]
would not work below, because of iris$Species
which is a factor with 3 levels where summary
returns 6 numeric from a numeric column) and that's where a reproducible would help. Below, I used iris
and summary
:
- apply:
as.data.frame(apply(iris[, 1:4], 2, summary))
- sapply:
as.data.frame(sapply(iris[, 1:4], summary))
- lapply:
do.call(cbind, lapply(iris[, 1:4], summary))
I have just implemented this function, which applies FUN
over rows as lists and concatenates the result to a tibble
:
library(magrittr)
lapply_rows <- function(df, return_tibble = TRUE, FUN, ...) {
df_rownames <- rownames(df)
res <- lapply(purrr::transpose(df), FUN = FUN, ...) %>%
purrr::map_depth(2, function(x) {
if (length(x) != 1) {
return(list(x))
} else {
return(x)
}
}) %>%
dplyr::bind_rows()
if (!return_tibble) {
res <- as.data.frame(res)
rownames(res) <- df_rownames
}
return(res)
}
df
is converted to a list of lists by purrr::transpose(df)
, where each sublist is one row of the original df
. FUN
must return a named list, which can also contain elements with a length other than one. These elements are then wrapped in list()
(type of a column of a data.frame
-like object could be also a list). If return_tibble
is FALSE
, result is coerced to data.frame
and original rownames are set.
Example:
df <- lapply_rows(mtcars, FUN = function(row_list) {
row_list$cyl_2 <- row_list$cyl ** 2
row_list$colors <- c("red", "green", "blue")
row_list$sublist <- mtcars[1:5, 1:5]
return(row_list)
})
head(df)
# A tibble: 6 x 14
mpg cyl disp hp drat wt qsec vs am gear carb cyl_2 colors sublist
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 36 <chr [3]> <df[,5] [5 × 5]>
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 36 <chr [3]> <df[,5] [5 × 5]>
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 16 <chr [3]> <df[,5] [5 × 5]>
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 36 <chr [3]> <df[,5] [5 × 5]>
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 64 <chr [3]> <df[,5] [5 × 5]>
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 36 <chr [3]> <df[,5] [5 × 5]>
Example returning a data.frame
:
df2 <- lapply_rows(mtcars, return_tibble = FALSE, FUN = function(row_list) {
row_list$cyl_2 <- row_list$cyl ** 2
row_list$colors <- c("red", "green", "blue")
row_list$sublist <- mtcars[1:5, 1:5]
return(row_list)
})
head(df2)
mpg cyl disp hp drat wt qsec vs am gear carb cyl_2 colors
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 36 red, green, blue
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 36 red, green, blue
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 16 red, green, blue
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 36 red, green, blue
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 64 red, green, blue
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 36 red, green, blue
sublist
Mazda RX4 21.00, 21.00, 22.80, 21.40, 18.70, 6.00, 6.00, 4.00, 6.00, 8.00, 160.00, 160.00, 108.00, 258.00, 360.00, 110.00, 110.00, 93.00, 110.00, 175.00, 3.90, 3.90, 3.85, 3.08, 3.15
Mazda RX4 Wag 21.00, 21.00, 22.80, 21.40, 18.70, 6.00, 6.00, 4.00, 6.00, 8.00, 160.00, 160.00, 108.00, 258.00, 360.00, 110.00, 110.00, 93.00, 110.00, 175.00, 3.90, 3.90, 3.85, 3.08, 3.15
Datsun 710 21.00, 21.00, 22.80, 21.40, 18.70, 6.00, 6.00, 4.00, 6.00, 8.00, 160.00, 160.00, 108.00, 258.00, 360.00, 110.00, 110.00, 93.00, 110.00, 175.00, 3.90, 3.90, 3.85, 3.08, 3.15
Hornet 4 Drive 21.00, 21.00, 22.80, 21.40, 18.70, 6.00, 6.00, 4.00, 6.00, 8.00, 160.00, 160.00, 108.00, 258.00, 360.00, 110.00, 110.00, 93.00, 110.00, 175.00, 3.90, 3.90, 3.85, 3.08, 3.15
Hornet Sportabout 21.00, 21.00, 22.80, 21.40, 18.70, 6.00, 6.00, 4.00, 6.00, 8.00, 160.00, 160.00, 108.00, 258.00, 360.00, 110.00, 110.00, 93.00, 110.00, 175.00, 3.90, 3.90, 3.85, 3.08, 3.15
Valiant 21.00, 21.00, 22.80, 21.40, 18.70, 6.00, 6.00, 4.00, 6.00, 8.00, 160.00, 160.00, 108.00, 258.00, 360.00, 110.00, 110.00, 93.00, 110.00, 175.00, 3.90, 3.90, 3.85, 3.08, 3.15
(you can see that tibble
is handling the <list>
columns much better)