The number of tasks that can be run simultaneously on a tasktracker is related to the
number of processors available on the machine. Because MapReduce jobs are normally
I/O-bound, it makes sense to have more tasks than processors to get better
utilization. The amount of oversubscription depends on the CPU utilization of jobs
you run, but a good rule of thumb is to have a factor of between one and two more
tasks (counting both map and reduce tasks) than processors.
For example, if you had 8 processors and you wanted to run 2 processes on each processor, then you could set each of mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum to 7 (not 8, since the datanode and the
tasktracker each take one slot). If you also increased the memory available to each child
task to 400 MB, then the total memory usage would be 7,600 MB.
-- "Hadoop: The Definitive Guide"
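The arithmetic in the quoted passage can be checked with a short sketch. The function names below are hypothetical and not part of Hadoop. The 1,000 MB heap per daemon (datanode and tasktracker) is an assumption based on Hadoop's default daemon heap size; the quote itself only gives the 7,600 MB total.

```python
def slots_per_type(processors, tasks_per_processor, reserved_slots=2):
    """Slots for each of map and reduce, after reserving one slot
    each for the datanode and the tasktracker (hypothetical helper)."""
    total_slots = processors * tasks_per_processor - reserved_slots
    return total_slots // 2  # split evenly between map and reduce


def total_memory_mb(map_slots, reduce_slots, child_heap_mb,
                    daemon_heap_mb=1000, daemons=2):
    """Estimated memory use: all child tasks plus the two daemons.
    daemon_heap_mb=1000 is an assumed default, not taken from the quote."""
    return (map_slots + reduce_slots) * child_heap_mb + daemons * daemon_heap_mb


slots = slots_per_type(processors=8, tasks_per_processor=2)
memory = total_memory_mb(slots, slots, child_heap_mb=400)
print(slots, memory)
```

Under these assumptions, 8 processors with 2 tasks each give 16 slots, 14 of which remain for tasks (7 map and 7 reduce), and 14 x 400 MB plus 2 x 1,000 MB comes to the 7,600 MB mentioned in the quote.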
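mapred.tasktracker.map.tasks.maximum generally depends on the number of physical cores.
In addition, each tasktracker runs some other service threads (the ones that ship with Hadoop), so it is best to reserve 1 to 2 cores for those processes.
The 14 task slots in the example above can be divided however you need, for example 8 map slots and 6 reduce slots.
In practice, looking at cores alone is not enough; you also need to take memory, disk, and other resources into account.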
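As a rough illustration of taking memory into account as well as cores, the sketch below caps the slot count by whichever resource runs out first. The helper name and all of its parameters are hypothetical, the 1,000 MB daemon heap is an assumed default, and disk is not modeled at all.

```python
def max_task_slots(cores, tasks_per_core, reserved_cores,
                   ram_mb, child_heap_mb,
                   daemon_heap_mb=1000, daemons=2):
    """Upper bound on task slots, limited by both CPU and memory
    (hypothetical helper; not a Hadoop API)."""
    # CPU limit: oversubscribe the cores left after reserving some for daemons.
    by_cpu = (cores - reserved_cores) * tasks_per_core
    # Memory limit: RAM left after the daemons, divided by the child heap size.
    by_memory = (ram_mb - daemons * daemon_heap_mb) // child_heap_mb
    return max(0, min(by_cpu, by_memory))


# 8 cores, 1 reserved, 2 tasks per core, 400 MB per child task.
print(max_task_slots(8, 2, 1, ram_mb=8192, child_heap_mb=400))
print(max_task_slots(8, 2, 1, ram_mb=4096, child_heap_mb=400))
```

With 8 GB of RAM the CPU limit of 14 slots applies, whereas with 4 GB the memory limit drops the total to 5 slots, which would then be divided between map and reduce slots as needed.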