How to find the size of an HDFS file

I also find myself using hadoop fs -dus <path> a great deal. For example, if a directory on HDFS named "/user/frylock/input" contains 100 files and you need the total size of all of those files, you could run:

hadoop fs -dus /user/frylock/input

and you would get back the total size (in bytes) of all of the files in the "/user/frylock/input" directory. (Note that on newer Hadoop releases -dus is deprecated; hadoop fs -du -s <path> is the equivalent replacement.)

Also, keep in mind that HDFS stores data redundantly, so the actual physical storage used by a file can be 3x or more (with the default replication factor of 3) than what is reported by hadoop fs -ls and hadoop fs -dus.
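
If you want to see both numbers programmatically, ContentSummary exposes them side by side. Here is a minimal sketch (the path is the example directory from above, and the class name ReplicationDemo is mine, not part of the Hadoop API):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo
{
    public static void main(String[] args) throws Exception
    {
        // Example path from above; assumes the Hadoop configuration on the
        // classpath (fs.defaultFS) points at your cluster.
        Path path = new Path("/user/frylock/input");
        FileSystem fs = path.getFileSystem(new Configuration());
        ContentSummary summary = fs.getContentSummary(path);
        // getLength(): logical bytes, the number -dus reports.
        System.out.println("logical size:   " + summary.getLength());
        // getSpaceConsumed(): raw bytes on disk, including all replicas.
        System.out.println("space consumed: " + summary.getSpaceConsumed());
    }
}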


I used the function below to get the file size.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GetflStatus
{
    // Returns the total size in bytes of the file or directory at the given path.
    public long getflSize(String args) throws IOException
    {
        Configuration config = new Configuration();
        Path path = new Path(args);
        FileSystem hdfs = path.getFileSystem(config);
        // getContentSummary() walks the path recursively; getLength() is the
        // logical size in bytes, not counting replication.
        ContentSummary cSummary = hdfs.getContentSummary(path);
        return cSummary.getLength();
    }
}
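
You can call it like this (the path is the example directory from the first answer; the Hadoop configuration on the classpath is assumed to point at your cluster):

GetflStatus status = new GetflStatus();
System.out.println(status.getflSize("/user/frylock/input") + " bytes");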

You can use the hadoop fs -ls command to list the files in a directory along with their details. The fifth column of the output contains the file size in bytes.

For example, the command hadoop fs -ls input gives the following output:

Found 1 items
-rw-r--r--   1 hduser supergroup      45956 2012-07-19 20:57 /user/hduser/input/sou

The size of the file sou is 45956 bytes.
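
The programmatic equivalent of reading that column is FileStatus.getLen(), which returns the same byte count. A minimal sketch (the directory path is the example from above; the class name ListSizes is just for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListSizes
{
    public static void main(String[] args) throws Exception
    {
        Path dir = new Path("/user/hduser/input");
        FileSystem fs = dir.getFileSystem(new Configuration());
        // listStatus() returns one FileStatus per entry; getLen() is the
        // size in bytes shown in the fifth column of hadoop fs -ls.
        for (FileStatus status : fs.listStatus(dir))
        {
            System.out.println(status.getPath().getName() + "\t" + status.getLen());
        }
    }
}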

Tags:

Hadoop

Hdfs