rm on a directory with millions of files

Solution 1:

Whilst a major cause of this problem is ext3 performance with millions of files, the actual root cause lies elsewhere.

When a directory needs to be listed, readdir() is called on it, which yields a list of files. readdir() is a POSIX call, but the real Linux system call doing the work here is getdents(), which lists directory entries by filling a buffer with records.

The problem comes down to the fact that readdir() uses a fixed 32 KB buffer to fetch directory entries. As a directory grows (its on-disk size increases as files are added), ext3 gets slower and slower at fetching entries, and readdir()'s 32 KB buffer only ever holds a small fraction of the entries in the directory. This causes readdir() to loop again and again, invoking the expensive system call over and over.

For example, on a test directory I created with over 2.6 million files inside, running "ls -1 | wc -l" under strace shows a long stream of getdents() system calls.

$ strace ls -1 | wc -l
brk(0x4949000)                          = 0x4949000
getdents(3, /* 1025 entries */, 32768)  = 32752
getdents(3, /* 1024 entries */, 32768)  = 32752
getdents(3, /* 1025 entries */, 32768)  = 32760
getdents(3, /* 1025 entries */, 32768)  = 32768
brk(0)                                  = 0x4949000
brk(0x496a000)                          = 0x496a000
getdents(3, /* 1024 entries */, 32768)  = 32752
getdents(3, /* 1026 entries */, 32768)  = 32760
...

Additionally, the time spent listing this directory was significant.

$ time ls -1 | wc -l
2616044

real    0m20.609s
user    0m16.241s
sys 0m3.639s

The way to make this more efficient is to call getdents() manually with a much larger buffer. This improves performance significantly.

Now, you're not meant to call getdents() yourself, so no glibc wrapper is provided for normal use (check the getdents man page!). However, you can invoke it directly via syscall() and make fetching the directory entries far more efficient.

This drastically reduces the time it takes to fetch these files. I wrote a program that does this.

/* I can be compiled with the command "gcc -o dentls dentls.c" */

#define _GNU_SOURCE

#include <dirent.h>     /* Defines DT_* constants */
#include <err.h>
#include <fcntl.h>
#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * This mirrors the raw record layout returned by getdents(2). Note that the
 * kernel stores the d_type byte at the very end of each record (at offset
 * d_reclen - 1), which is why the code below reads it from there rather than
 * from the d_type member.
 */
struct linux_dirent {
        long           d_ino;
        off_t          d_off;
        unsigned short d_reclen;
        char           d_name[256];
        char           d_type;
};

static int delete = 0;
char *path = NULL;

static void parse_config(
        int argc,
        char **argv)
{
    int option_idx = 0;
    static struct option loptions[] = {
      { "delete", no_argument, &delete, 1 },
      { "help", no_argument, NULL, 'h' },
      { 0, 0, 0, 0 }
    };

    while (1) {
        int c = getopt_long(argc, argv, "h", loptions, &option_idx);
        if (c < 0)
            break;

        switch(c) {
          case 0: {
              break;
          }

          case 'h': {
              printf("Usage: %s [--delete] DIRECTORY\n"
                     "List/Delete files in DIRECTORY.\n"
                     "Example %s --delete /var/spool/postfix/deferred\n",
                     argv[0], argv[0]);
              exit(0);                      
              break;
          }

          default:
          break;
        }
    }

    if (optind >= argc)
      errx(EXIT_FAILURE, "Must supply a valid directory\n");

    path = argv[optind];
}

int main(
    int argc,
    char** argv)
{

    parse_config(argc, argv);

    int totalfiles = 0;
    int dirfd = -1;
    int offset = 0;
    int bufcount = 0;
    void *buffer = NULL;
    char *d_type;
    struct linux_dirent *dent = NULL;
    struct stat dstat;

    /* Standard sanity checking stuff */
    if (access(path, R_OK) < 0) 
        err(EXIT_FAILURE, "Could not access directory");

    if (lstat(path, &dstat) < 0) 
        err(EXIT_FAILURE, "Unable to lstat path");

    if (!S_ISDIR(dstat.st_mode))
        errx(EXIT_FAILURE, "The path %s is not a directory.\n", path);

    /* Allocate a buffer roughly three times the on-disk directory size to store dents */
    if ((buffer = calloc(dstat.st_size*3, 1)) == NULL)
        err(EXIT_FAILURE, "Buffer allocation failure");

    /* Open the directory */
    if ((dirfd = open(path, O_RDONLY)) < 0) 
        err(EXIT_FAILURE, "Open error");

    /* Switch into the directory so unlink() can use the bare filenames */
    if (fchdir(dirfd) < 0)
        err(EXIT_FAILURE, "fchdir error");

    if (delete) {
        printf("Deleting files in ");
        for (int i=5; i > 0; i--) {
            printf("%u. . . ", i);
            fflush(stdout);
            sleep(1);
        }
        printf("\n");
    }

    while ((bufcount = syscall(SYS_getdents, dirfd, buffer, dstat.st_size*3)) > 0) {
        offset = 0;
        dent = buffer;
        while (offset < bufcount) {
            /* Don't print thisdir and parent dir */
            if (!((strcmp(".",dent->d_name) == 0) || (strcmp("..",dent->d_name) == 0))) {
                d_type = (char *)dent + dent->d_reclen-1;
                /* Only handle regular files */
                if (*d_type == DT_REG) {
                    printf ("%s\n", dent->d_name);
                    if (delete) {
                        if (unlink(dent->d_name) < 0)
                            warn("Cannot delete file \"%s\"", dent->d_name);
                    }
                    totalfiles++;
                }
            }
            offset += dent->d_reclen;
            dent = buffer + offset;
        }
    }
    if (bufcount < 0)
        err(EXIT_FAILURE, "getdents error");
    fprintf(stderr, "Total files: %d\n", totalfiles);
    close(dirfd);
    free(buffer);

    exit(0);
}

Whilst this does not combat the underlying fundamental problem (lots of files in a filesystem that performs poorly with them), it's likely to be much, much faster than many of the alternatives being posted.

As a final step, you should remove the affected directory and recreate it afterwards, as sketched below. Directories only ever grow in size and can remain poorly performing even with only a few files inside, because of the size of the directory entry itself.
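For instance, with the test directory "data" used in the timings below (the names here are only an illustration; recreate the directory with whatever ownership and permissions your application expects):

$ mv data data.old
$ mkdir data
$ ./dentls --delete data.old
$ rmdir data.old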

Edit: I've cleaned this up quite a bit. I added a command-line option to delete at runtime, and removed a bunch of the treewalk stuff which, looking back honestly, was questionable at best and was also shown to produce memory corruption.

You can now do dentls --delete /my/path

New results, based on a directory with 1.82 million files:

## Ideal ls Uncached
$ time ls -u1 data >/dev/null

real    0m44.948s
user    0m1.737s
sys 0m22.000s

## Ideal ls Cached
$ time ls -u1 data >/dev/null

real    0m46.012s
user    0m1.746s
sys 0m21.805s


## dentls Uncached
$ time ./dentls data >/dev/null
Total files: 1819292

real    0m1.608s
user    0m0.059s
sys 0m0.791s

## dentls Cached
$ time ./dentls data >/dev/null
Total files: 1819292

real    0m0.771s
user    0m0.057s
sys 0m0.711s

Was kind of surprised this still works so well!

Solution 2:

The data=writeback mount option is worth trying, as it relaxes the journaling guarantees for file data. It should only be used for the duration of the deletion; there is a risk of data loss if the server is shut down or rebooted during the delete operation.

According to this page,

Some applications show very significant speed improvement when it is used. For example, speed improvements can be seen (...) when applications create and delete large volumes of small files.

The option is set either in /etc/fstab or during the mount operation, replacing data=ordered with data=writeback. The file system containing the files to be deleted has to be remounted, for example as shown below.
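A rough sketch, assuming the files live on a filesystem mounted at /srv on /dev/sdb1 (both the device and the mount point are placeholders for your own setup):

$ umount /srv
$ mount -o data=writeback /dev/sdb1 /srv
(perform the deletion)
$ umount /srv
$ mount /dev/sdb1 /srv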


Solution 3:

Would it be possible to back up all of the other files from this file system to a temporary storage location, reformat the partition, and then restore the files?
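Something along these lines could work, assuming everything else fits in temporary storage (the device, mount point, archive path and directory name are placeholders to adapt):

$ tar -C /mnt/data -cf /backup/data.tar --exclude=./huge_dir .
$ umount /mnt/data
$ mkfs.ext3 /dev/sdb1
$ mount /dev/sdb1 /mnt/data
$ tar -C /mnt/data -xf /backup/data.tar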


Solution 4:

There is no per-directory file limit in ext3, just the filesystem inode limit (there is, however, a limit on the number of subdirectories).

You may still have problems after removing the files.

When a directory has millions of files, the directory entry itself becomes very large. The directory entry has to be scanned for every remove operation, and that takes a varying amount of time for each file, depending on where its entry is located. Unfortunately, even after all the files have been removed, the directory entry keeps its size. So further operations that require scanning the directory entry will still take a long time, even though the directory is now empty. The only way to solve that problem is to rename the directory, create a new one with the old name, transfer any remaining files to the new one, and then delete the renamed one, as sketched below.
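A rough sketch of that sequence (the directory and file names are placeholders):

$ mv huge_dir huge_dir.old
$ mkdir huge_dir
$ mv huge_dir.old/still_needed_file huge_dir/
$ rm -rf huge_dir.old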


Solution 5:

I haven't benchmarked it, but this guy did:

rsync -a --delete ./emptyDirectory/ ./hugeDirectory/
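Spelled out in full, the trick is to sync an empty directory over the huge one so that rsync deletes everything inside it (the directory names are placeholders):

$ mkdir emptyDirectory
$ rsync -a --delete ./emptyDirectory/ ./hugeDirectory/
$ rmdir ./emptyDirectory ./hugeDirectory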