How do I count the lines in a file on HDFS from the command line?
Total number of files:
`hadoop fs -ls /path/to/hdfs/* | wc -l`
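One caveat: `hadoop fs -ls` prints a `Found N items` summary line for each directory it lists, so the `wc -l` count above can be inflated. A hedged sketch of a workaround is to keep only the rows that begin with `-`, the permission string of a plain file (the path is a placeholder):

```
# Keep only file entries; directory rows start with "d" and the
# "Found N items" summary rows match neither.
hadoop fs -ls /path/to/hdfs/* | grep '^-' | wc -l
```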
Total number of lines:
`hadoop fs -cat /path/to/hdfs/* | wc -l`
Total number of lines for a given file:
`hadoop fs -cat /path/to/hdfs/filename | wc -l`
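Note that `hadoop fs -cat` streams the raw bytes, so piping a compressed file (gzip, Snappy, and so on) into `wc -l` will not give a meaningful line count. `hadoop fs -text` decompresses files whose codec Hadoop recognizes before printing, so it is the safer choice for compressed data. A sketch, assuming the codec is configured on the cluster and using a placeholder path:

```
# -text detects the codec from the file and decompresses it first,
# so wc -l counts the actual text lines.
hadoop fs -text /path/to/hdfs/filename.snappy | wc -l
```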
1. Number of lines of a mapper output file:
`~]$ hadoop fs -cat /user/cloudera/output/part-m-00000 | wc -l`
2. Number of lines of a text or any other file on HDFS:
`~]$ hadoop fs -cat /user/cloudera/output/abc.txt | wc -l`
3. First (top) 5 lines of a text or any other file on HDFS:
`~]$ hadoop fs -cat /user/cloudera/output/abc.txt | head -5`
4. Last (bottom) 10 lines of a text or any other file on HDFS:
`~]$ hadoop fs -cat /user/cloudera/output/abc.txt | tail -10`
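To report the count for every part file in one pass, here is a minimal shell sketch reusing the `/user/cloudera/output` path from the examples above; it assumes the file path is the last field of each `hadoop fs -ls` output line (i.e. no spaces in paths):

```
# List the part files, keep only file entries (rows starting with
# "-"), extract the path from the last column, and count each one.
hadoop fs -ls /user/cloudera/output/part-* | grep '^-' | awk '{print $NF}' |
while read f; do
  echo "$f: $(hadoop fs -cat "$f" | wc -l)"
done
```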
You cannot do this with a single hadoop fs command. Either write a MapReduce job with the logic explained in this post, or use a Pig script like the one below.
-- Load the file (schema elided), group every record into one bag,
-- and count the records, i.e. the lines.
A = LOAD 'file' USING PigStorage() AS (...);
B = GROUP A ALL;
cnt = FOREACH B GENERATE COUNT(A);
DUMP cnt;
Make sure your Snappy file has the correct extension so that Pig can detect and read it.
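To actually run it, one common route is to save the script to a file and invoke the `pig` client; a sketch, where `count_lines.pig` is a hypothetical file name for the script above:

```
# Run on the cluster (MapReduce mode is the default execution mode)
pig count_lines.pig

# Or run against the local filesystem for a quick test
pig -x local count_lines.pig
```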