Move NAs to the end of each column in a data frame
Another solution using lapply (without sorting or reordering the data, per your comments):
df[] <- lapply(df, function(x) c(x[!is.na(x)], x[is.na(x)]))
df
# a b d
# 1 1 57 5
# 2 5 2 7
# 3 34 7 2
# 4 7 9 8
# 5 3 5 2
# 6 5 12 5
# 7 8 100 NA
# 8 4 NA NA
# 9 NA NA NA
# 10 NA NA NA
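Here is a self-contained check of the same idea on a small, made-up data frame. The column names and values are hypothetical and are not the question's data. Assigning into df[] keeps the result a data.frame, and each column's non-NA values stay in their original order:

```r
# Hypothetical example data: NAs scattered through each column
df <- data.frame(
  a = c(NA, 1, 5, NA),
  b = c(3, NA, 2, 8),
  d = c(NA, NA, 4, 6)
)

# Move each column's NAs to the bottom without sorting the other values;
# df[] <- ... writes the columns back and keeps the data.frame structure
df[] <- lapply(df, function(x) c(x[!is.na(x)], x[is.na(x)]))

print(df)
```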
Or, using data.table, to update df by reference rather than creating a copy of it (this solution won't sort your data either):
library(data.table)
setDT(df)[, names(df) := lapply(.SD, function(x) c(x[!is.na(x)], x[is.na(x)]))]
df
# a b d
# 1: 1 57 5
# 2: 5 2 7
# 3: 34 7 2
# 4: 7 9 8
# 5: 3 5 2
# 6: 5 12 5
# 7: 8 100 NA
# 8: 4 NA NA
# 9: NA NA NA
# 10: NA NA NA
Some benchmarks on this data suggest the base lapply solution is the fastest by far:
library("microbenchmark")
david <- function() lapply(df, function(x) c(x[!is.na(x)], x[is.na(x)]))
dt <- setDT(df)  # note: setDT() converts df itself by reference, so dt and df are the same data.table
david.dt <- function() dt[, names(dt) := lapply(.SD, function(x) c(x[!is.na(x)], x[is.na(x)]))]
microbenchmark(as.data.frame(lapply(df, beetroot)), david(), david.dt())
# Unit: microseconds
# expr min lq median uq max neval
# as.data.frame(lapply(df, beetroot)) 1145.224 1215.253 1274.417 1334.7870 4028.507 100
# david() 116.515 127.382 140.965 149.7185 308.493 100
# david.dt() 3087.335 3247.920 3330.627 3415.1460 6464.447 100
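One caveat: david() stops at a list of columns, while the beetroot expression also builds a data frame, so the comparison above is not quite like-for-like. Here is a rough sketch of a fairer comparison. It assumes both approaches are wrapped to return a data.frame (david_df and beetroot_df are hypothetical wrapper names) and times them on a made-up larger data frame using base system.time instead of microbenchmark. No timings are shown because they depend on the machine:

```r
# Wrapped so that both approaches return a data.frame
david_df <- function(df) {
  as.data.frame(lapply(df, function(x) c(x[!is.na(x)], x[is.na(x)])))
}
beetroot <- function(x) {
  num.na <- sum(is.na(x))   # count NA
  x <- x[!is.na(x)]         # remove NA
  c(x, rep(NA, num.na))     # glue the NAs back at the end
}
beetroot_df <- function(df) as.data.frame(lapply(df, beetroot))

# Hypothetical test data: three numeric columns, 20% of each set to NA
set.seed(1)
n <- 1e5
make_col <- function(n) {
  x <- rnorm(n)
  x[sample(n, n %/% 5)] <- NA
  x
}
big <- data.frame(a = make_col(n), b = make_col(n), d = make_col(n))

# Check both give the same output first, otherwise the timings mean little
stopifnot(identical(david_df(big), beetroot_df(big)))

# Rough timings over 20 runs each (machine-dependent)
system.time(for (i in 1:20) david_df(big))
system.time(for (i in 1:20) beetroot_df(big))
```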
After completely misunderstanding the question, here is my final answer:
# named after beetroot for being the first to ever need this functionality
beetroot <- function(x) {
# count NA
num.na <- sum(is.na(x))
# remove NA
x <- x[!is.na(x)]
# glue the number of NAs at the end
x <- c(x, rep(NA, num.na))
return(x)
}
# apply beetroot over each column in the dataframe
as.data.frame(lapply(df, beetroot))
For each column in the data frame, it counts the NAs, removes them, and glues the same number of NAs back on at the bottom.
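A more compact variant, which is swapped in here and is not one of the answers above, reorders each column by is.na(x). FALSE (non-NA) sorts before TRUE (NA), and order() leaves ties in their original order, so the non-NA values keep their sequence while the NAs move to the bottom. A sketch, using a hypothetical helper name na_to_end:

```r
# Hypothetical helper: order() is stable, so only the NAs move;
# the non-NA values are not re-sorted
na_to_end <- function(x) x[order(is.na(x))]

x <- c(NA, 7, NA, 3, 9)
na_to_end(x)
# [1]  7  3  9 NA NA

# Applied column-wise, assigning into df[] to keep the data.frame
df <- data.frame(a = c(NA, 1, 5), b = c(3, NA, 2))
df[] <- lapply(df, na_to_end)
df
```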
For fun, you can also make use of length<- and na.omit.
Here's what that combination would do:
x <- c(NA, 1, 2, 3)
x
# [1] NA 1 2 3
`length<-`(na.omit(x), length(x))
# [1] 1 2 3 NA
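The same pattern can be wrapped in a small helper and checked on a few edge cases. The helper name na_last is hypothetical. The cases are a vector with no NAs, an all-NA vector, and a character vector, and in each one the result should keep the original length with the NAs at the end:

```r
# Hypothetical helper: drop the NAs, then pad back to the original length;
# `length<-` fills the added positions with NA
na_last <- function(x) `length<-`(na.omit(x), length(x))

na_last(c(NA, 1, 2, 3))
na_last(c(4, 5, 6))        # no NAs: values unchanged
na_last(c(NA, NA))         # all NAs: still length 2
na_last(c("b", NA, "a"))   # character columns work too
```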
Applied to your problem, the solution would be:
df[] <- lapply(df, function(x) `length<-`(na.omit(x), nrow(df)))
df
# a b d
# 1 1 57 5
# 2 5 2 7
# 3 34 7 2
# 4 7 9 8
# 5 3 5 2
# 6 5 12 5
# 7 8 100 NA
# 8 4 NA NA
# 9 NA NA NA
# 10 NA NA NA