Function for median similar to "which.max" and "which.min" / Extracting median rows from a data.frame
While Sacha's solution is quite general, the median (like other quantiles) is an order statistic, so you can calculate the corresponding indices from order (x) (instead of using sort (x) for the quantile values).
Looking into quantile, types 1 or 3 could be used; all others lead to (weighted) averages of two values in certain cases.
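A quick way to convince yourself of this is to compare a few types on a small vector. This is only a sketch: the vector x is made up for illustration, and nothing beyond base R's quantile () is used.

```r
# Made-up vector with an even number of elements
x <- c (1, 2, 3, 4)

# Types 1 and 3 always return a value that occurs in x
q1 <- quantile (x, 0.5, type = 1, names = FALSE)
q3 <- quantile (x, 0.5, type = 3, names = FALSE)

# Type 7 (the default) interpolates between the two middle values here
q7 <- quantile (x, 0.5, type = 7, names = FALSE)

q1 %in% x   # TRUE
q3 %in% x   # TRUE
q7 %in% x   # FALSE, since q7 is 2.5
```

Only a quantile that is an actual element of x can be mapped back to an index, which is why the averaging types are not usable for this purpose.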
I chose type 3, and a bit of copy & paste from quantile leads to:
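which.quantile <- function (x, probs, na.rm = FALSE){
  # without na.rm, any NA in x makes every requested index NA
  if (! na.rm && any (is.na (x)))
    return (rep (NA_integer_, length (probs)))

  o <- order (x)          # indices that sort x; NAs are placed last
  n <- sum (! is.na (x))
  o <- o [seq_len (n)]    # drop the indices of the NAs

  # type 3 rule, adapted from quantile
  nppm <- n * probs - 0.5
  j <- floor (nppm)
  h <- ifelse ((nppm == j) & ((j %% 2L) == 0L), 0, 1)
  j <- j + h

  j [j == 0] <- 1         # probs = 0 maps to the smallest element
  o [j]
}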
A little test:
> x <- c (2.34, 5.83, NA, 9.34, 8.53, 6.42, NA, 8.07, NA, 0.77)
> probs <- c (0, .23, .5, .6, 1)
> which.quantile (x, probs, na.rm = TRUE)
[1] 10  1  6  6  4
> x [which.quantile (x, probs, na.rm = TRUE)] == quantile (x, probs, na.rm = TRUE, type = 3)
  0%  23%  50%  60% 100%
TRUE TRUE TRUE TRUE TRUE
Here's your example:
> dat [which.quantile (dat$V4, c (0, .5, 1)),]
  V1         V2          V3 V4
7  7  0.4874291 -0.01619026  1
2  2  0.1836433  0.38984324 13
1  1 -0.6264538  1.51178117 17
I think just:
which(dat$V4 == median(dat$V4))
But be careful there, since median() takes the mean of the two middle values when there isn't a single middle number. E.g. median(1:4) gives 2.5, which doesn't match any of the elements.
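To make the pitfall concrete, here is a small sketch with two made-up vectors (x.odd and x.even are just illustrative names):

```r
x.odd  <- c(5, 1, 3)      # odd length: the median is an element of x
x.even <- c(5, 1, 3, 7)   # even length: the median is the mean of 3 and 5

which(x.odd == median(x.odd))     # index of the value 3
which(x.even == median(x.even))   # integer(0), because 4 is not in x.even
```

So for an even number of elements the exact-match approach can silently return nothing.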
Edit
Here is a function which will give you either the index of the median element or, when the median is the mean of the two middle values, the index of the first element closest to it, similar to how which.min() gives you only the first element that is equal to the minimum:
whichmedian <- function(x) which.min(abs(x - median(x)))
For example:
> whichmedian(1:4)
[1] 2
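Since the question is about pulling rows out of a data.frame, here is a rough sketch of how whichmedian could be used for that. The data frame df and its columns id and score are made up for illustration and are not the dat from the question.

```r
whichmedian <- function(x) which.min(abs(x - median(x)))

# Hypothetical data with an even number of rows
df <- data.frame(id = 1:4, score = c(10, 40, 20, 30))

# median(df$score) is 25; 20 and 30 are equally close,
# and which.min() returns the first of them (row 3)
df[whichmedian(df$score), ]
```

Note that this picks only one row. If you want both middle rows in the even case, you need a different approach.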
I've written a more comprehensive function that serves my needs:
row.extractor = function(data, extract.by, what) {
  # data       = your data.frame
  # extract.by = the variable that you are extracting by, either
  #              as its index number or by name
  # what       = either "min", "max", "median", or "all", with quotes

  # translate a column name into its index; a numeric index is used as is
  if (is.character(extract.by)) {
    extract.by = which(colnames(data) %in% extract.by)
  }

  # row index (or indices) of the median of column extract.by
  which.median = function(data, extract.by) {
    a = data[, extract.by]
    if (length(a) %% 2 != 0) {
      which(a == median(a))
    } else {
      # even length: take the two middle values instead of their mean
      b = sort(a)[c(length(a)/2, length(a)/2+1)]
      c(max(which(a == b[1])), min(which(a == b[2])))
    }
  }

  X1 = data[which(data[extract.by] == min(data[extract.by])), ]
  X2 = data[which(data[extract.by] == max(data[extract.by])), ]
  X3 = data[which.median(data, extract.by), ]

  if (what == "min") {
    X1
  } else if (what == "max") {
    X2
  } else if (what == "median") {
    X3
  } else if (what == "all") {
    rbind(X1, X3, X2)
  }
}
Some example usage:
> row.extractor(dat, "V4", "max")
  V1         V2       V3 V4
1  1 -0.6264538 1.511781 17
> row.extractor(dat, 4, "min")
  V1        V2          V3 V4
7  7 0.4874291 -0.01619026  1
> row.extractor(dat, "V4", "all")
   V1         V2          V3 V4
7   7  0.4874291 -0.01619026  1
2   2  0.1836433  0.38984324 13
10 10 -0.3053884  0.59390132 14
4   1 -0.6264538  1.51178117 17
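For what it's worth, the "two middle rows" behaviour can also be sketched directly with order(), in the spirit of the order-statistic idea from the first answer. This is not the function above, just a minimal alternative. The name median.rows and the data frame df are made up, and it assumes the column contains no NAs.

```r
# Hypothetical helper: returns the middle row (odd n) or the two
# middle rows (even n) of data, ordered by column col.
# Assumes data[[col]] has no NA values.
median.rows <- function(data, col) {
  o <- order(data[[col]])
  n <- length(o)
  idx <- if (n %% 2 == 1) o[(n + 1) / 2] else o[c(n / 2, n / 2 + 1)]
  data[idx, , drop = FALSE]
}

df <- data.frame(id = 1:4, score = c(10, 40, 20, 30))

median.rows(df, "score")          # rows 3 and 4 (scores 20 and 30)
median.rows(df[1:3, ], "score")   # row 3 (score 20)
```

With ties among the middle values, order() decides which rows are returned, so the result may differ from row.extractor in that case.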