dplyr rowwise sum and other functions like max
The problem is that the entire data frame is passed as dot despite the rowwise. To handle this, use do, which will interpret dot as meaning just the current row. One further problem is that the dot within do will represent the row as a list, so convert it appropriately.
library(dplyr)
iris %>%
  slice(1:6) %>%
  select(starts_with('Petal')) %>%
  rowwise() %>%
  # inside do(), the dot is the current row as a list, so convert it to a data frame first
  do( (.) %>% as.data.frame %>% mutate(sum = sum(.)) ) %>%
  ungroup()
giving:
# A tibble: 6 x 3
Petal.Length Petal.Width sum
* <dbl> <dbl> <dbl>
1 1.40 0.200 1.60
2 1.40 0.200 1.60
3 1.30 0.200 1.50
4 1.50 0.200 1.70
5 1.40 0.200 1.60
6 1.70 0.400 2.10
dplyr 1.0 - added later
Since this question was asked, dplyr 1.0 was released, and it has cur_data(), which can be used to simplify the above, eliminating the need for do. cur_data() within a rowwise block refers only to the current row.
iris %>%
  slice(1:6) %>%
  select(starts_with('Petal')) %>%
  rowwise() %>%
  mutate(sum = sum(cur_data())) %>%
  ungroup()
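Note that cur_data() has been deprecated since dplyr 1.1.0 in favor of pick(). A minimal sketch of the same pipeline, assuming dplyr 1.1.0 or later is installed (petal_sums is just an illustrative name, not something from the original answer):

```r
library(dplyr)

# Assumes dplyr >= 1.1.0, where pick() is available.
# petal_sums is a hypothetical name used only for this sketch.
petal_sums <- iris %>%
  slice(1:6) %>%
  select(starts_with("Petal")) %>%
  rowwise() %>%
  # within a rowwise block, pick(everything()) returns the current row as a one-row tibble
  mutate(sum = sum(pick(everything()))) %>%
  ungroup()

petal_sums
```

On older dplyr 1.0.x installations, the cur_data() version above remains the one to use.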
You can skip the use of select if you use c_across to select the variables you want to sum:
library(dplyr)
iris %>%
  rowwise() %>%
  mutate(sum = sum(c_across(starts_with("Petal"))), .keep = "used") %>%
  ungroup()
If you want to keep all the columns in your data frame, remove the .keep argument.
Output
Petal.Length Petal.Width sum
<dbl> <dbl> <dbl>
1 1.4 0.2 1.6
2 1.4 0.2 1.6
3 1.3 0.2 1.5
4 1.5 0.2 1.7
5 1.4 0.2 1.6
6 1.7 0.4 2.1
7 1.4 0.3 1.7
8 1.5 0.2 1.7
9 1.4 0.2 1.6
10 1.5 0.1 1.6
# ... with 140 more rows
Similarly, with max:
iris %>%
  rowwise() %>%
  mutate(max = max(c_across(starts_with("Petal"))), .keep = "used") %>%
  ungroup()
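Both summaries can also be computed in a single mutate call. The sketch below does that and, as a sanity check, compares the results against the vectorized base R functions rowSums and pmax. The name petal_stats and the cross-check itself are illustrative additions, not part of the original answer:

```r
library(dplyr)

# petal_stats is a hypothetical name used only for this sketch.
petal_stats <- iris %>%
  rowwise() %>%
  mutate(
    sum = sum(c_across(starts_with("Petal"))),
    max = max(c_across(starts_with("Petal")))
  ) %>%
  ungroup()

# Cross-check against vectorized base R equivalents.
# rowSums() returns a named vector, so drop the names before comparing.
all.equal(petal_stats$sum,
          unname(rowSums(iris[, c("Petal.Length", "Petal.Width")])))
all.equal(petal_stats$max, pmax(iris$Petal.Length, iris$Petal.Width))
```

If both comparisons return TRUE, the rowwise results agree with the base R ones.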
In short: you are expecting the sum function to be aware of dplyr data structures like a data frame grouped by row. sum is not aware of it, so it just takes the sum of the whole data.frame.
Here is a brief explanation. This:
select(iris, starts_with('Petal')) %>% rowwise() %>% sum()
can be rewritten without the pipe operator as follows:
data <- select(iris, starts_with('Petal'))
data <- rowwise(data)
sum(data)
As you can see, you were constructing something called a tibble. Then the rowwise call adds additional information to this object and specifies that it should be grouped row-wise. However, only functions that are aware of this grouping, like summarize and mutate, work as intended. Base R functions like sum are not aware of these objects and treat them like any standard data.frame. And the standard behavior of sum() is to sum the entire data frame.
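This can be checked directly. The following sketch inspects the class of the rowwise object and confirms that base sum() gives the same total with or without rowwise(). The name row_data is introduced here only for illustration:

```r
library(dplyr)

data <- select(iris, starts_with("Petal"))
row_data <- rowwise(data)  # hypothetical name, used only for this sketch

# rowwise() adds a class (plus grouping metadata) on top of the data frame ...
class(row_data)
inherits(row_data, "rowwise_df")

# ... but base sum() ignores it and adds up every cell,
# so the total is the same as for the plain data frame.
all.equal(sum(row_data), sum(data))
```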
Using mutate works:
select(iris, starts_with('Petal')) %>%
  rowwise() %>%
  mutate(sum = sum(Petal.Width, Petal.Length))
Result:
Source: local data frame [150 x 3]
Groups: <by row>
# A tibble: 150 x 3
Petal.Length Petal.Width sum
<dbl> <dbl> <dbl>
1 1.40 0.200 1.60
2 1.40 0.200 1.60
3 1.30 0.200 1.50
...
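summarize is the other grouping-aware verb mentioned above. A minimal sketch of the same row sum with it, assuming only the per-row total is needed (petal_totals is an illustrative name):

```r
library(dplyr)

# petal_totals is a hypothetical name used only for this sketch.
petal_totals <- select(iris, starts_with("Petal")) %>%
  rowwise() %>%
  # on a rowwise data frame, summarise() yields one result row per input row
  summarise(sum = sum(Petal.Width, Petal.Length))

head(petal_totals)
```

Unlike mutate, this keeps only the sum column, so it suits cases where the original columns are no longer needed.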