Setting the number of map tasks and reduce tasks
It's important to keep in mind that the MapReduce framework in Hadoop only allows us to suggest the number of map tasks for a job, which, as Praveen pointed out above, will correspond to the number of input splits for the job. This is unlike its behavior for the number of reducers (which is directly related to the number of files output by the MapReduce job), where we can demand that it provide n reducers.
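To make the distinction concrete, here is a minimal driver sketch against the old org.apache.hadoop.mapred API; the class name TaskCountDemo and the particular task counts are just illustrative assumptions:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TaskCountDemo {
    public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(TaskCountDemo.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setNumMapTasks(20);    // a suggestion only: the split count decides the real number
        job.setNumReduceTasks(10); // a demand: exactly 10 reduce tasks will run

        JobClient.runJob(job);
    }
}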
As Praveen mentions above, when using the basic FileInputFormat classes, the number of map tasks is just the number of input splits that constitute the data. The number of reducers is controlled by mapred.reduce.tasks, specified the way you have it: -D mapred.reduce.tasks=10 would specify 10 reducers. Note that the space after -D is required; if you omit the space, the property is passed along to the relevant JVM as a system property, not to Hadoop's configuration.
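For example, with a hypothetical job jar and driver class, a correct invocation looks like this (note the space after -D, and no spaces around the = sign):

hadoop jar myjob.jar MyDriver -D mapred.reduce.tasks=10 /user/me/input /user/me/output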
Are you specifying 0 because there is no reduce work to do? In that case, if you're having trouble with the run-time parameter, you can also set the value directly in code. Given a JobConf instance job, call
job.setNumReduceTasks(0);
inside, say, your implementation of Tool.run. That should produce output directly from the mappers. If your job actually produces no output whatsoever (because you're using the framework just for side effects like network calls or image processing, or if the results are entirely accounted for in Counter values), you can disable output by also calling
job.setOutputFormat(NullOutputFormat.class);
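Putting those two calls together, a complete map-only, output-free driver might look like the following sketch (old mapred API; the class name MapOnlyJob and the single input-path argument are assumptions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NullOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MapOnlyJob extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        JobConf job = new JobConf(getConf(), MapOnlyJob.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));

        job.setNumReduceTasks(0);                    // map-only: mapper output is final
        job.setOutputFormat(NullOutputFormat.class); // and no output files are written

        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner applies GenericOptionsParser, which is what understands -D name=value
        System.exit(ToolRunner.run(new Configuration(), new MapOnlyJob(), args));
    }
}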
The number of map tasks for a given job is driven by the number of input splits, not by the mapred.map.tasks parameter. A map task is spawned for each input split, so over the lifetime of a MapReduce job the number of map tasks equals the number of input splits. mapred.map.tasks is just a hint to the InputFormat for the number of maps.
In your example, Hadoop has determined there are 24 input splits and will spawn 24 map tasks in total (for instance, with the default 64 MB HDFS block size, a 1.5 GB input file would be divided into exactly 24 splits). But you can control how many map tasks can be executed in parallel by each task tracker.
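That per-node parallelism is a TaskTracker setting rather than a job setting; in classic (pre-YARN) MapReduce it lives in mapred-site.xml on each node. A sketch, with 4 as an arbitrary example value:

<property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <!-- run at most 4 map tasks concurrently on this TaskTracker -->
    <value>4</value>
</property>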
Also, removing the space after -D might solve the problem with the reduce setting.
For more information on the number of map and reduce tasks, please look at the URL below:
https://cwiki.apache.org/confluence/display/HADOOP2/HowManyMapsAndReduces