Setting the number of map tasks and reduce tasks

It's important to keep in mind that the MapReduce framework in Hadoop allows us only to

suggest the number of Map tasks for a job

which like Praveen pointed out above will correspond to the number of input splits for the task. Unlike it's behavior for the number of reducers (which is directly related to the number of files output by the MapReduce job) where we can

demand that it provide n reducers.

As Praveen mentions above, when using the basic FileInputFormat classes is just the number of input splits that constitute the data. The number of reducers is controlled by mapred.reduce.tasks specified in the way you have it: -D mapred.reduce.tasks=10 would specify 10 reducers. Note that the space after -D is required; if you omit the space, the configuration property is passed along to the relevant JVM, not to Hadoop.

Are you specifying 0 because there is no reduce work to do? In that case, if you're having trouble with the run-time parameter, you can also set the value directly in code. Given a JobConf instance job, call

job.setNumReduceTasks(0);

inside, say, your implementation of Tool.run. That should produce output directly from the mappers. If your job actually produces no output whatsoever (because you're using the framework just for side-effects like network calls or image processing, or if the results are entirely accounted for in Counter values), you can disable output by also calling

job.setOutputFormat(NullOutputFormat.class);

The number of map tasks for a given job is driven by the number of input splits and not by the mapred.map.tasks parameter. For each input split a map task is spawned. So, over the lifetime of a mapreduce job the number of map tasks is equal to the number of input splits. mapred.map.tasks is just a hint to the InputFormat for the number of maps.

In your example Hadoop has determined there are 24 input splits and will spawn 24 map tasks in total. But, you can control how many map tasks can be executed in parallel by each of the task tracker.

Also, removing a space after -D might solve the problem for reduce.

For more information on the number of map and reduce tasks, please look at the below url

https://cwiki.apache.org/confluence/display/HADOOP2/HowManyMapsAndReduces

Setting the number of map tasks and reduce tasks

Tags:

Hadoop

Mapreduce

Related

Recent Posts