Move NAs to the end of each column in a data frame

Another solution using lapply (without sorting or reordering the data, per your comments):

df[] <- lapply(df, function(x) c(x[!is.na(x)], x[is.na(x)]))
df
#     a   b  d
# 1   1  57  5
# 2   5   2  7
# 3  34   7  2
# 4   7   9  8
# 5   3   5  2
# 6   5  12  5
# 7   8 100 NA
# 8   4  NA NA
# 9  NA  NA NA
# 10 NA  NA NA
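
For anyone who wants to try this without the original data, here is a minimal, self-contained sketch. The data frame and the helper name move_na_last are made up for illustration and are not from the question.

```r
# Hypothetical data: the df from the question is not reproduced here
df <- data.frame(
  a = c(NA, 3, 1, NA, 2),
  b = c(10, NA, NA, 30, 20)
)

# Keep the non-NA values in their original order, then append the NAs
move_na_last <- function(x) c(x[!is.na(x)], x[is.na(x)])

# df[] <- ... replaces the columns in place and keeps df a data frame
df[] <- lapply(df, move_na_last)
df
#    a  b
# 1  3 10
# 2  1 30
# 3  2 20
# 4 NA NA
# 5 NA NA
```

Assigning into df[] rather than df keeps the result a data frame with the original column names instead of a plain list.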

Or use data.table to update df by reference rather than creating a copy of it (this solution won't sort your data either):

library(data.table)
setDT(df)[, names(df) := lapply(.SD, function(x) c(x[!is.na(x)], x[is.na(x)]))]
df
#      a   b  d
#  1:  1  57  5
#  2:  5   2  7
#  3: 34   7  2
#  4:  7   9  8
#  5:  3   5  2
#  6:  5  12  5
#  7:  8 100 NA
#  8:  4  NA NA
#  9: NA  NA NA
# 10: NA  NA NA

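A quick benchmark on the example data suggests the base lapply solution is the fastest by far: its median time is about 9 times lower than the beetroot version (defined in the answer below) and more than 20 times lower than the data.table version: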

library("microbenchmark")
david <- function() lapply(df, function(x) c(x[!is.na(x)], x[is.na(x)]))
dt <- setDT(df)
david.dt <- function() dt[, names(dt) := lapply(.SD, function(x) c(x[!is.na(x)], x[is.na(x)]))]

microbenchmark(as.data.frame(lapply(df, beetroot)), david(), david.dt())
# Unit: microseconds
#                                 expr      min       lq   median        uq      max neval
#  as.data.frame(lapply(df, beetroot)) 1145.224 1215.253 1274.417 1334.7870 4028.507   100
#                              david()  116.515  127.382  140.965  149.7185  308.493   100
#                           david.dt() 3087.335 3247.920 3330.627 3415.1460 6464.447   100
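
Note that the comparison above is not quite like-for-like: david() returns a plain list, while the beetroot call also pays for as.data.frame(). Below is a rough sketch of a like-for-like timing using only base R on larger, made-up data. The names big, f_index and f_count are illustrative, and no timings are claimed here since they will vary by machine.

```r
set.seed(42)
# Hypothetical data: 10 integer columns of 100,000 rows with some NAs
big <- as.data.frame(replicate(10, sample(c(1:100, NA), 1e5, replace = TRUE)))

# Index-based version: non-NA values first, then the NA values
f_index <- function(d) {
  d[] <- lapply(d, function(x) c(x[!is.na(x)], x[is.na(x)]))
  d
}

# Count-based version (same idea as beetroot): drop NAs, pad with NA
f_count <- function(d) {
  d[] <- lapply(d, function(x) {
    keep <- !is.na(x)
    c(x[keep], rep(NA, sum(!keep)))
  })
  d
}

# Both functions now return a data frame, so the work is comparable
system.time(for (i in 1:20) r1 <- f_index(big))
system.time(for (i in 1:20) r2 <- f_count(big))

# Sanity check that both approaches agree
identical(r1, r2)
# [1] TRUE
```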

After completely misunderstanding the question, here is my final answer:

# named after beetroot for being the first to ever need this functionality
beetroot <- function(x) {
    # count NA
    num.na <- sum(is.na(x))
    # remove NA
    x <- x[!is.na(x)]
    # glue the number of NAs at the end
    x <- c(x, rep(NA, num.na))
    return(x)
}

# apply beetroot over each column in the dataframe
as.data.frame(lapply(df, beetroot))

It counts the NAs, removes them, and appends the same number of NAs at the bottom of each column in the data frame.
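
Because rep(NA, num.na) produces logical NAs that c() coerces to the type of the remaining values, the same function should also work on non-numeric columns. A small check on a made-up character vector (not from the question):

```r
# Same function as above, repeated so the snippet runs on its own
beetroot <- function(x) {
    num.na <- sum(is.na(x))
    x <- x[!is.na(x)]
    c(x, rep(NA, num.na))
}

# Hypothetical character input
v <- c(NA, "x", NA, "y")
res <- beetroot(v)

# The result is still a character vector, with the NAs moved to the end
is.character(res)
# [1] TRUE
identical(res, c("x", "y", NA, NA))
# [1] TRUE
```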

For fun, you can also make use of length<- and na.omit.

Here's what that combination would do:

x <- c(NA, 1, 2, 3)
x
# [1] NA  1  2  3
`length<-`(na.omit(x), length(x))
# [1]  1  2  3 NA

Applied to your problem, the solution would be:

df[] <- lapply(df, function(x) `length<-`(na.omit(x), nrow(df)))
df
#     a   b  d
# 1   1  57  5
# 2   5   2  7
# 3  34   7  2
# 4   7   9  8
# 5   3   5  2
# 6   5  12  5
# 7   8 100 NA
# 8   4  NA NA
# 9  NA  NA NA
# 10 NA  NA NA
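
As a sanity check, the three base R approaches on this page can be compared on a single vector. This is only a sketch on made-up random data, and the variable names are illustrative. as.vector() is applied to the length<- result so that any leftover attributes from na.omit() cannot affect the comparison.

```r
set.seed(1)
# Hypothetical integer vector with some NAs
x <- sample(c(1:5, NA), 20, replace = TRUE)

# 1. Index-based: non-NA values, then the NA values
via_index <- c(x[!is.na(x)], x[is.na(x)])

# 2. Count-based: non-NA values, then as many NAs as were removed
via_count <- c(x[!is.na(x)], rep(NA, sum(is.na(x))))

# 3. na.omit() followed by padding back to the original length
via_length <- as.vector(`length<-`(na.omit(x), length(x)))

identical(via_index, via_count)
# [1] TRUE
identical(via_index, via_length)
# [1] TRUE
```

Using length(x) inside the helper, rather than nrow(df), keeps it independent of any particular data frame.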
