Hadoop configuration and runtime error summary

Newcomers to Hadoop are troubled by all kinds of problems. Below is a rough summary of the problems I have run into and their solutions; I hope it is helpful to you.

1. If the cluster is restarted after the namenode has been reformatted (bin/hadoop namenode -format), the following error appears:

Incompatible namespaceIDS in … :namenode namespaceID = … ,datanode namespaceID=…

This happens because formatting the namenode generates a new namespaceID, which no longer matches the namespaceID stored on the datanodes.

Solution:

Delete the data files in the datanode's dfs.data.dir directory (the default is tmp/dfs/data); or
modify the dfs.data.dir/current/VERSION file and change the namespaceID to match the one on the namenode (the expected value is given in the error message in the log); or
point dfs.data.dir at a new, empty directory. A command-line sketch of the first two options follows below.
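For example, on a datanode the fix might look like this (a sketch only; the path and the namespaceID value are placeholders, use your actual dfs.data.dir and the ID printed in the namenode log):

# option 1: wipe the datanode's data directory (the blocks stored on this datanode are lost)
rm -rf /path/to/dfs.data.dir/*
# option 2: keep the data and just fix the ID recorded in the VERSION file
vi /path/to/dfs.data.dir/current/VERSION    # change the line namespaceID=... to the namenode's value
# then restart the datanode
hadoop-daemon.sh start datanode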
2. When the cluster is started with start-all.sh, the datanode on a slave fails to start and an error like the following is reported:

… could only be replicated to 0 nodes, instead of 1 …

A likely cause is that the node's identity is duplicated (my personal guess), but there may be other reasons as well. Try the following solutions one by one; one of them worked for me.
Solution:

Delete all data files under dfs.data.dir and dfs.tmp.dir on every node (the defaults are tmp/dfs/data and tmp/dfs/tmp), re-format the namenode with hadoop namenode -format, and then start the cluster again (see the sketch after this list).
If it is a port access problem, make sure all the ports being used are open, for example hdfs://machine1:9000/, 50030, 50070, and so on. Run: iptables -I INPUT -p tcp --dport 9000 -j ACCEPT. If you still get the error hdfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused, the port on the datanode probably cannot be reached; adjust iptables on the datanode: iptables -I INPUT -s machine1 -p tcp -j ACCEPT
The firewall may also be blocking communication between the nodes. Try turning it off: /etc/init.d/iptables stop
Finally, there may be insufficient disk space; check with df -al.
While troubleshooting this, someone suggested that starting the namenode and then the datanode separately solves the problem (it did not work for me, but you can try it): hadoop-daemon.sh start namenode; hadoop-daemon.sh start datanode
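A command-line sketch of the first option (assumptions: the data directory paths below are placeholders for your own dfs.data.dir and dfs.tmp.dir; run the deletion on every node and the format only on the namenode):

stop-all.sh
rm -rf /path/to/dfs.data.dir/* /path/to/dfs.tmp.dir/*   # repeat on every node; this destroys existing HDFS data
hadoop namenode -format                                  # on the namenode only
start-all.sh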
3. Error: java.lang.NullPointerException
A null pointer exception during execution usually points to a bug in your Java program. Make sure variables are declared and instantiated before use and that there are no array out-of-bounds accesses. Check your program.

4. When running your own program and (various) errors are reported, make sure of the following:

your program compiles correctly;
in cluster mode, the data to be processed has been written to HDFS and the HDFS path is correct;
the entry class name of the jar to be executed is specified (I don't know why it sometimes runs without it).

The correct invocation looks something like:

$ hadoop jar myCount.jar myCount input output

5. Problems with ssh not communicating properly are covered in detail in the cluster setup section.

6. Compilation problems where various packages cannot be found: make sure you have added the jar files in the hadoop directory and the hadoop/lib directory to your classpath. For details, see the setup chapter.
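For example, compiling against the Hadoop jars from the command line might look like this (a sketch; the core jar name and version depend on your Hadoop release, and $HADOOP_HOME is assumed to point at the installation directory):

mkdir -p classes
javac -classpath "$HADOOP_HOME/hadoop-core-1.0.4.jar:$HADOOP_HOME/lib/*" -d classes MyCount.java
jar -cvf myCount.jar -C classes .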

7. When Hadoop starts the datanode, it reports Unrecognized option: -jvm and Could not create the Java virtual machine.
The script bin/hadoop in the Hadoop installation directory contains the following shell code:

CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
if [[ $EUID -eq 0 ]]; then
  HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
else
  HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
fi
$EUID here is the effective user ID; it is 0 for root, so the branch that adds the -jvm option is taken. Try not to operate Hadoop as the root user. This is why I said in the configuration chapter not to use root.
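If you have been starting Hadoop as root, a simple way to switch to a dedicated account (the user name hadoop and the installation path are only examples):

id -u                                  # prints 0 while you are still root
useradd -m hadoop                      # create a dedicated account
chown -R hadoop /usr/local/hadoop      # hand over the installation and data directories (adjust the path)
su - hadoop
start-all.sh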

8. If the error message in the terminal is:

ERROR hdfs.DFSClient: Exception closing file /user/hadoop/musicdata.txt : java.io.IOException: All datanodes 10.210.70.82:50010 are bad. Aborting...

and the jobtracker log contains the error:

Error register getProtocolVersion

java.lang.IllegalArgumentException: Duplicate metricsName:getProtocolVersion

and possibly some warning messages:

WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Broken pipe

WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_3136320110992216802_1063 java.io.IOException: Connection reset by peer

WARN hdfs.DFSClient: Error Recovery for block blk_3136320110992216802_1063 bad datanode[0] 10.210.70.82:50010 put: All datanodes 10.210.70.82:50010 are bad. Aborting...

Solution:

Check whether the disk holding the path pointed to by dfs.data.dir is full. If it is full, free up space and then retry the hadoop fs -put upload.
If the disk is not full, check it for bad sectors.
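A quick way to check both (the path and the device name are examples; point them at the disk that holds dfs.data.dir):

df -h /path/to/dfs.data.dir      # is the partition full?
badblocks -sv /dev/sdb1          # read-only scan for bad sectors; this can take a long time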
9. If you get an error message like the following when running a hadoop jar program:

java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.NullWritable, recieved org.apache.hadoop.io.LongWritable

or something similar:

Status : FAILED java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

Then you need to learn the basics of Hadoop data types and the map/reduce model. The middle part of my reading notes introduces the data types defined by Hadoop and how to define custom data types (mainly studying and understanding the Writable classes), as well as MapReduce types and formats, i.e. Chapter 4 (Hadoop I/O) and Chapter 7 (MapReduce Types and Formats) of "Hadoop: The Definitive Guide". If you are in a hurry to fix the problem, here is a quick workaround, although relying on it will hold back your development later: make

sure the types are consistent:

… extends Mapper<k1, v1, k2, v2> …

public void map(k1 key, v1 value, OutputCollector<k2, v2> output, Reporter reporter) …



… extends Reducer<k2, v2, k3, v3> …

public void reduce(k2 key, Iterator<v2> values, OutputCollector<k3, v3> output, Reporter reporter) …



job.setMapOutputKeyClass(k2.class);

job.setMapOutputValueClass(v2.class);   // the mapper's output value type, not k2

job.setOutputKeyClass(k3.class);

job.setOutputValueClass(v3.class);

...

Note the correspondence between the k* and v* types. I still recommend reading the two chapters mentioned above to understand the principle in detail.

10. If you encounter the datanode error as follows:

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Cannot lock storage /data1/hadoop_data. The directory is already locked.

The error message says the directory is locked and cannot be used. Check whether any related Hadoop processes are still running on this machine or on the slave machines. Use these two Linux commands to check:

netstat -nap

ps aux | grep <related process or PID>

If Hadoop-related processes are still running, kill them with the kill command, then run start-all.sh again.
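For example (50010 is the default datanode transfer port; the PID is whatever the listing shows):

netstat -nap | grep 50010        # anything still bound to the datanode port?
ps aux | grep -i [h]adoop        # leftover hadoop java processes
kill <pid>                       # kill the leftover process (kill -9 if it refuses to die)
start-all.sh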

11. If you encounter the jobtracker error as follows:

Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

The solution is to modify the /etc/hosts file on the datanode.

A brief introduction to the hosts file format:

Each line has three parts: the IP address, the host name or domain name, and the host alias.

The detailed steps are as follows:

1. First check the host name:

cat /proc/sys/kernel/hostname

This shows the current HOSTNAME value; change it to the IP address and exit.

2. Use the command:

hostname ***.***.***.***

Replace the asterisks with the corresponding IP address.

3. Modify the hosts configuration to something like the following:

127.0.0.1 localhost.localdomain localhost

::1 localhost6.localdomain6 localhost6

10.200.187.77 10.200.187.77 hadoop-datanode

If the IP address is displayed after this change, the modification succeeded; if the host name (in my case, chenyi) is still displayed, there is still a problem and you should keep adjusting the hosts file.



In a test environment you would have to deploy a domain name server yourself (which I personally find cumbersome), so using IP addresses directly is more convenient. If you already have a domain name server, you can simply configure the mapping there.
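A quick check that the name now resolves to the real network IP (hadoop-datanode is the host name from the example above):

hostname
ping -c 1 hadoop-datanode        # the address shown should be the LAN IP, not 127.0.0.1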

If the shuffle error still occurs, try what other users have suggested: add the following property to the hdfs-site.xml configuration file:

<property>
  <name>dfs.http.address</name>
  <value>*.*.*.*:50070</value>
</property>

Replace the asterisks with the IP address; the port does not need to change, because Hadoop transfers this information over HTTP and that port stays the same.

12. If you encounter a jobtracker error like the following:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code *

This is the system error code returned by the subprocess, surfaced through Java. Look up the detailed meaning of the specific error code.

I hit this with some Hadoop Streaming PHP programs, where the error code was 2: No such file or directory, i.e. the file or directory could not be found. Embarrassingly, the cause was that the command forgot the 'php ****' prefix. It can also be caused by include and require statements inside the script. Adjust according to your own situation and error code.
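For reference, a streaming invocation that names the interpreter explicitly (a sketch; the streaming jar location, input/output paths, and script names depend on your installation):

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
  -input /user/hadoop/input \
  -output /user/hadoop/output \
  -mapper "php mapper.php" \
  -reducer "php reducer.php" \
  -file mapper.php \
  -file reducer.php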
