Issues that emerge when running Hadoop MR programs in multiple threads

 

Multiple tasks run in parallel at night, and a few of them always fail at random. Viewing the log:

  cat -n ads_channel.log | grep "Caused by"
  7732    Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot overwrite non empty destination directory /tmp/hadoop-hdfs/mapred/local/1576781334421
  7737    Caused by: java.io.IOException: Rename cannot overwrite non empty destination directory /tmp/hadoop-hdfs/mapred/local/1576781334421
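With the matching line numbers from cat -n in hand, a quick follow-up (a small sketch using the line numbers reported above) is to print the surrounding region directly:

  sed -n '7725,7745p' ads_channel.log   # print the lines around matches 7732 and 7737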

 

 Extended grep usage:

cat -n ads_channel.log | grep "Caused by"      # single keyword; or: grep -e "Caused by" ads_channel.log
 grep -E "Caused by|FAILED" ads_channel.log    # two keywords at once
 grep "2019-12-21" ads_channel.log | grep "Caused by"
 cat ads_channel.log | grep "Caused by" -B 10   # view the 10 lines of log before each keyword match
 cat ads_channel.log | grep "Caused by" -A 10   # view the 10 lines of log after each keyword match
 cat ads_channel.log | grep "Caused by" -C 10   # view 10 lines of log context around each keyword match


Description:
- A stands for After: show lines after the keyword
- B stands for Before: show lines before the keyword
- C stands for Context: show lines around the keyword

vim ads_channel.log
:set nu    (show line numbers)
:7749      (jump to the specified line)

Query the log in real time for multiple keywords
Command: tail -f ads_channel.log | grep -E "Caused by|FAILED"

Problem cause:

    The problem occurs when Hadoop MR programs run in multiple threads:
        https://issues.apache.org/jira/browse/MAPREDUCE-6992
        https://issues.apache.org/jira/browse/MAPREDUCE-6441

Hadoop names the job's local working directory (e.g. /tmp/hadoop-hdfs/mapred/local/1576781334421 above) with the current time in milliseconds. When two MR tasks are submitted within the same millisecond, both resolve to the same directory, which causes the concurrent file-access problem.
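A minimal sketch of the collision, assuming only that the directory name is the submission time in milliseconds, as in the failing path above (GNU date; the paths are illustrative):

  ts=$(date +%s%3N)                           # current time in milliseconds
  dir="/tmp/hadoop-hdfs/mapred/local/${ts}"
  mkdir -p "$dir"                             # job 1 creates the directory
  mkdir "$dir" 2>/dev/null \
    || echo "collision: $dir already exists"  # job 2, same millisecond, fails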

YARN operating modes:

1- Local mode (implemented by LocalJobRunner)
With mapreduce.framework.name set to local, the YARN cluster is not used to allocate resources and the task executes on the local node, so a task running in local mode cannot exploit the advantages of the cluster. Note: tasks running in local mode are not visible in the web UI.
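For illustration, a hedged way to force a single run into local mode from the command line, assuming the driver uses ToolRunner so that -D generic options are parsed (the jar, driver class, and paths here are placeholders):

  hadoop jar my-job.jar MyDriver \
      -D mapreduce.framework.name=local \
      /input /output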

        Anyone somewhat familiar with Hive knows that Hive ultimately converts SQL statements into MapReduce tasks that are scheduled and executed in a distributed fashion. For large data sets, the time spent starting MapReduce is insignificant: the data volume is large, the data sits on different machines and is processed on different machines, and this is one of Hive's advantages. When the data set is small, however, and the data ends up gathered on a single machine anyway, local mode is very worthwhile: it avoids starting MapReduce, shipping the data back to the client, and merging the partial results, which saves that overhead. Operations on a small amount of data can therefore run locally, which is much faster than submitting the task to the cluster.
To start local mode, configure the following parameters:

     hive.exec.mode.local.auto — lets Hive decide automatically whether to run locally, based on the size of the input.
     hive.exec.mode.local.auto.inputbytes.max — the maximum amount of input data; local mode starts when the input is smaller than this value (default 128M).
     hive.exec.mode.local.auto.tasks.max — the maximum number of input files; local mode starts when the number of input files is smaller than this value (default 4).

 

A job can truly use local mode only when all of the following conditions are met (see the sketch after this list):

   1. The job's total input size must be smaller than hive.exec.mode.local.auto.inputbytes.max (default 128MB).
   2. The job's number of map tasks must be smaller than hive.exec.mode.local.auto.tasks.max (default 4).
   3. The job's number of reduce tasks must be 0 or 1.
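A minimal sketch of enabling automatic local mode in a Hive session, using the parameter names quoted above (the values shown are simply the stated defaults):

  set hive.exec.mode.local.auto=true;
  set hive.exec.mode.local.auto.inputbytes.max=134217728;   -- 128M, the default
  set hive.exec.mode.local.auto.tasks.max=4;                -- the default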

 

2- YARN mode (implemented by YARNRunner)
        With mapreduce.framework.name set to yarn, the client and server communicate through YARNRunner, which in turn interacts with the RM via the ClientRMProtocol interface to submit applications, query status, and perform other functions. Depending on the characteristics of the job, tasks are executed in one of two ways:

3- Uber mode:

        A mode designed for small jobs to reduce latency: all tasks, whether Map Tasks or Reduce Tasks, execute sequentially in the same Container, which is in fact the Container where the MRAppMaster runs.

4- Non-Uber mode:

         For large, long-running jobs: resources are requested for the Map Tasks first, and only after a certain proportion of the Map Tasks have completed are resources requested for the Reduce Tasks.
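For reference, a hedged sketch of the stock Hadoop 2.x properties that gate Uber mode, written as session settings in the style of the snippets above (defaults are from upstream documentation, not from this cluster):

  set mapreduce.job.ubertask.enable=true;     -- allow small jobs to run inside the AM container
  set mapreduce.job.ubertask.maxmaps=9;       -- at most 9 map tasks (default)
  set mapreduce.job.ubertask.maxreduces=1;    -- at most 1 reduce task (default)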

Solution:

    1- Without changing the source code, disable the automatic start of local mode; in a clustered environment, set it temporarily when running the program:

set hive.exec.mode.local.auto=false;

2- Configure failure retries in the scheduling system.
The Azkaban failure-retry configuration is as follows:
type=command
command=xxxxxx
retries=3
retry.backoff=60000   # milliseconds

Reference: https://blog.csdn.net/weixin_39445556/article/details/80348976

This bug is tracked on the official site; it was fixed in version 2.7.1, so upgrading the cluster also resolves it:

This is a bug in Hadoop 2.6.0. It's been marked as fixed but it still happens occasionally (see: https://issues.apache.org/jira/browse/YARN-2624).

https://stackoverflow.com/questions/30857413/hadoop-complains-about-attempting-to-overwrite-nonempty-destination-directory
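As a quick sanity check before and after the upgrade, one way to confirm which release is actually running (standard Hadoop CLI):

  hadoop version   # the first output line names the release, e.g. "Hadoop 2.7.1"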

 

[hdfs@el-hadoop-1 logs]$ hadoop dfsadmin -report   ## view hadoop status:
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1242537227061 (1.13 TB)
Present Capacity: 1154802876345 (1.05 TB)
DFS Remaining: 1125514018745 (1.02 TB)
DFS Used: 29288857600 (27.28 GB)
DFS Used%: 2.54%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (3):

Name: 172.26.0.106:50010 (el-hadoop-1)
Hostname: el-hadoop-1
Rack: /default
Decommission Status : Normal
Configured Capacity: 414179075687 (385.73 GB)
DFS Used: 9740627968 (9.07 GB)
Non DFS Used: 22051710567 (20.54 GB)
DFS Remaining: 360492523769 (335.73 GB)
DFS Used%: 2.35%
DFS Remaining%: 87.04%
Configured Cache Capacity: 4294967296 (4 GB)
Cache Used: 0 (0 B)
Cache Remaining: 4294967296 (4 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 8
Last contact: Sat Dec 21 11:29:07 CST 2019


Name: 172.26.0.108:50010 (el-hadoop-2)
Hostname: el-hadoop-2
Rack: /default
Decommission Status : Normal
Configured Capacity: 414179075687 (385.73 GB)
DFS Used: 9774043136 (9.10 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 382510819168 (356.24 GB)
DFS Used%: 2.36%
DFS Remaining%: 92.35%
Configured Cache Capacity: 4294967296 (4 GB)
Cache Used: 0 (0 B)
Cache Remaining: 4294967296 (4 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 8
Last contact: Sat Dec 21 11:29:06 CST 2019


Name: 172.26.0.109:50010 (el-hadoop-3)
Hostname: el-hadoop-3
Rack: /default
Decommission Status : Normal
Configured Capacity: 414179075687 (385.73 GB)
DFS Used: 9774186496 (9.10 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 382510675808 (356.24 GB)
DFS Used%: 2.36%
DFS Remaining%: 92.35%
Configured Cache Capacity: 4294967296 (4 GB)
Cache Used: 0 (0 B)
Cache Remaining: 4294967296 (4 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 8
Last contact: Sat Dec 21 11:29:08 CST 2019

Origin: www.cnblogs.com/shengyang17/p/12076353.html