Hadoop: common errors and solutions (compiled notes)

1:Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out 

Answer:
Programs that need to open many files for analysis run into the system's default limit on open file descriptors, which is generally 1024 (check it with ulimit -a). That is enough for normal use, but far too low for such programs.
How to fix: modify two files.
/etc/security/limits.conf
vi /etc/security/limits.conf
add:
* soft nofile 102400
* hard nofile 409600

$ cd /etc/pam.d/
$ sudo vi login
add: session required /lib/security/pam_limits.so

A correction to the answer for the first question:
This error is raised during the shuffle, when the number of times a reduce task fails to fetch the output of a completed map exceeds the upper limit (the default limit is 5). It can have many causes: abnormal network connections, connection timeouts, poor bandwidth, blocked ports, and so on. On a cluster whose internal network is healthy this error usually does not occur.

2: Too many fetch-failures
Answer:
This problem is mainly caused by incomplete connectivity between the nodes.
1) Check /etc/hosts
   - the machine's own hostname must map to its real IP
   - it must contain the IP + hostname of every server in the cluster
2) Check .ssh/authorized_keys
   - it must contain the public keys of all servers, including the node itself

3: Processing is particularly slow; the maps are very fast but reduce is very slow, and it repeatedly falls back to reduce=0%
Answer:
Combine this with the fix for problem 2 above, and then
modify conf/hadoop-env.sh: export HADOOP_HEAPSIZE=4000

4: The datanode can be started but cannot be reached, and cannot be shut down cleanly
When reformatting a new distributed filesystem, you need to delete the path configured as dfs.name.dir on the NameNode (the local-filesystem path where the NameNode persistently stores the namespace and the transaction log), and likewise delete the dfs.data.dir path on every DataNode (the local-filesystem path where the DataNode stores its block data). For the configuration used here, that means removing /home/hadoop/NameData on the NameNode, and removing /home/hadoop/DataNode1 and /home/hadoop/DataNode2 on the DataNodes. The reason is that when Hadoop formats a new distributed filesystem, each namespace is stamped with a version corresponding to the format time (see the VERSION file under /home/hadoop/NameData/current, which records this version information). When you reformat, it is best to remove the NameData directory, and you must remove dfs.data.dir on each DataNode, so that the version information recorded by the namenode and the datanodes stays consistent.
Note: deleting data is a very dangerous operation. Do not delete anything you cannot positively confirm, and back up the files before deleting them!

5: java.io.IOException: Could not obtain block: blk_194219614024901469_1100 file=/user/hive/warehouse/src_20090724_log/src_20090724_log
This mostly occurs when a node is down or cannot be reached.

6: java.lang.OutOfMemoryError: Java heap space
This exception clearly means the JVM does not have enough memory. The fix is to increase the JVM memory of all the datanodes:
java -Xms1024m -Xmx4096m
As a rule of thumb, the JVM maximum heap should be about half of total physical memory. On our 8 GB machines we set it to 4096m, and even this may not be the optimal value.

How to add a node to Hadoop
The actual process of adding a node:
1. Set up the environment on the new slave: ssh, jdk, the relevant config files, copy the lib and bin directories, and so on;
2. Add the new datanode's hostname to /etc/hosts on the namenode and the other datanodes;
3. Add the new datanode's IP to conf/slaves on the master;
4. Restart the cluster and check that the new node shows up in the datanode list;
5. Run bin/start-balancer.sh. This can take a long time.
Notes:
1. If you do not rebalance, the cluster will place new data on the new node, which lowers MapReduce efficiency;
2. You can also pass a parameter when calling bin/start-balancer.sh, e.g. -threshold 5.
The threshold is the balancing threshold, default 10%; the lower the value, the more evenly balanced the nodes, but the longer it takes.
3. The balancer can also run while MR jobs are running on the cluster, but the default dfs.balance.bandwidthPerSec is low (1 MB/s). When no MR jobs are running, raising this setting shortens the balancing time.

Additional notes:
1. Make sure the slave's firewall is turned off;
2. Make sure the new slave's IP has been added to /etc/hosts on the master and the other slaves, and conversely add the IPs of the master and the other slaves to /etc/hosts on the new slave.
Number of mappers and reducers
URL: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
Partitioning your job into maps and reduces
Picking the appropriate size for the tasks for your job can radically change the performance of Hadoop. Increasing the number of tasks increases the framework overhead, but increases load balancing and lowers the cost of failures. At one extreme is the 1 map/1 reduce case where nothing is distributed. The other extreme is to have 1,000,000 maps/ 1,000,000 reduces where the framework runs out of resources for the overhead.
Number of Maps
The number of maps is usually driven by the number of DFS blocks in the input files. Although that causes people to adjust their DFS block size to adjust the number of maps. The right level of parallelism for maps seems to be around 10-100 maps/node, although we have taken it up to 300 or so for very cpu-light map tasks. Task setup takes awhile, so it is best if the maps take at least a minute to execute.
Actually controlling the number of maps is subtle. The mapred.map.tasks parameter is just a hint to the InputFormat for the number of maps. The default InputFormat behavior is to split the total number of bytes into the right number of fragments. However, in the default case the DFS block size of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapred.min.split.size. Thus, if you expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k maps, unless your mapred.map.tasks is even larger. Ultimately the InputFormat determines the number of maps.
The number of map tasks can also be increased manually using the JobConf's conf.setNumMapTasks(int num). This can be used to increase the number of map tasks, but will not set the number below that which Hadoop determines via splitting the input data.
Number of Reduces
The right number of reduces seems to be 0.95 or 1.75 * (nodes * mapred.tasktracker.tasks.maximum). At 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. At 1.75 the faster nodes will finish their first round of reduces and launch a second round of reduces doing a much better job of load balancing.
Currently the number of reduces is limited to roughly 1000 by the buffer size for the output files (io.buffer.size * 2 * numReduces << heapSize). This will be fixed at some point, but until it is it provides a pretty firm upper bound.
The number of reduces also controls the number of output files in the output directory, but usually that is not important because the next map/reduce step will split them into even smaller splits for the maps.
The number of reduce tasks can also be increased in the same way as the map tasks, via JobConf's conf.setNumReduceTasks(int num). 
My own understanding:
Setting the number of mappers: it depends on the input files and on the file splits. The upper bound of a split is dfs.block.size, the lower bound can be set via mapred.min.split.size, and in the end the InputFormat decides.
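To make this concrete, here is a minimal per-job sketch of the two knobs mentioned above, using the old mapred API that the rest of these notes use; the class name and the values are placeholders, not recommendations.
JobConf conf = new JobConf(MyDriver.class);                 // MyDriver: hypothetical driver class
conf.setNumMapTasks(100);                                   // only a hint to the InputFormat
conf.setLong("mapred.min.split.size", 64L * 1024 * 1024);   // raise the lower bound on split size (bytes)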

A better recommendation:
The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum). Increasing the number of reduces increases the framework overhead, but improves load balancing and lowers the cost of failures.
<property>
   <name>mapred.tasktracker.reduce.tasks.maximum</name>
   <value>2</value>
   <description>The maximum number of reduce tasks that will be run
   simultaneously by a task tracker.
   </description>
</property>
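As an illustration of the 0.95/1.75 rule above, the following sketch computes a reduce count from a hypothetical node count and per-tracker reduce-slot count and applies it with setNumReduceTasks; the numbers are placeholders for your own cluster.
JobConf conf = new JobConf(MyDriver.class);   // MyDriver: hypothetical driver class
int nodes = 10;                               // placeholder: number of tasktracker nodes
int reduceSlots = 2;                          // placeholder: mapred.tasktracker.reduce.tasks.maximum
conf.setNumReduceTasks((int) (0.95 * nodes * reduceSlots));   // use 1.75 instead for a second, load-balancing wave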

Adding a new disk to a single node
1. Modify dfs.data.dir on the node that gets the new disk, separating the new and old data directories with a comma
2. Restart dfs

Syncing the Hadoop code
hadoop-env.sh
# host:path where hadoop code should be rsync'd from.   Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop

Merging small HDFS files with a command
hadoop fs -getmerge <src> <dest>

How to restart reduce jobs
Introduced recovery of jobs when JobTracker restarts. This facility is off by default.
Introduced config parameters "mapred.jobtracker.restart.recover", "mapred.jobtracker.job.history.block.size", and "mapred.jobtracker.job.history.buffer.size".
Not verified yet.

Problems with IO write operations
0-1246359584298, infoPort=50075, ipcPort=50020):Got exception while serving blk_-5911099437886836280_1292 to /172.16.100.165:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/
172.16.100.165:50010 remote=/172.16.100.165:50930]
       at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
       at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
       at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
       at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:293)
       at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:387)
       at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:179)
       at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:94)
       at java.lang.Thread.run(Thread.java:619)

It seems there are many reasons why it can time out; the example given in
HADOOP-3831 is a slow reading client.

Solution: in hadoop-site.xml, try setting dfs.datanode.socket.write.timeout = 0.
My understanding is that this issue should be fixed in Hadoop 0.19.1, so we
should leave the standard timeout in place then. Until then, however, this change
can help resolve the problem you're seeing.
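The suggestion above is an hadoop-site.xml setting; as a rough equivalent for a client Configuration you build yourself, the same property can also be set programmatically (a sketch, not the article's own fix, and it only affects code that reads this Configuration):
Configuration conf = new Configuration();
conf.setInt("dfs.datanode.socket.write.timeout", 0);   // 0 disables the write timeout; use with care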

How to decommission an HDFS node
The dfsadmin help text in the current version is not clearly worded and the documentation has a bug; the correct procedure is as follows:
1. Point dfs.hosts at the current slaves file, using the full file path. Note that the hostnames in the list must be the names that uname -n returns.
2. Put the full names of the nodes to be decommissioned into another file, e.g. slaves.ex, and point the dfs.hosts.exclude parameter at the full path of that file.
3. Run the command bin/hadoop dfsadmin -refreshNodes
4. In the web interface, or with bin/hadoop dfsadmin -report, you can see that the node's state is "Decommission in progress" until all the data that needs to be re-replicated has been copied.
5. Afterwards, remove the decommissioned node from the slaves file (the file that dfs.hosts points to).

Incidentally, three other uses of the -refreshNodes command:
1. Add a node to the allowed list (add its hostname to dfs.hosts)
2. Remove a node directly, without re-replicating its data first (remove its hostname from dfs.hosts)
3. The reverse of decommissioning: remove the node from the dfs.hosts.exclude file while keeping it in dfs.hosts, and a node whose decommission is still in progress goes back to Normal ("In Service" in the web interface).

Hadoop learning notes
1. Solving the hadoop OutOfMemoryError problem:
<property>
   <name>mapred.child.java.opts</name>
   <value>-Xmx800M -server</value>
</property>
Set the right JVM size in your hadoop-site.xml; you will have to copy this
to all the mapred nodes in the cluster and restart.
Or: hadoop jar jarfile [main class] -D mapred.child.java.opts=-Xmx800M
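The same child-JVM option can also be set per job from the driver code (a small sketch; the value is simply the one used above):
JobConf conf = new JobConf(MyDriver.class);                 // MyDriver: hypothetical driver class
conf.set("mapred.child.java.opts", "-Xmx800M -server");     // heap options for each map/reduce child JVM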

2. Hadoop java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) while indexing.
When I use nutch 1.0, I get this error:
Hadoop java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) while indexing.
This one was also solved:
delete conf/log4j.properties and you will be able to see the detailed error report.
In my case it was an out-of-memory error;
the solution was to add parameters when running the main class org.apache.nutch.crawl.Crawl: -Xms64m -Xmx512m.
Your problem is probably different, but once you can see the detailed error report, problems like this are easy to solve.

Using the distributed cache
It acts like a global variable, but because the data is too large to put in the config file, the distributed cache is used instead.
Specific usage (see "The Definitive Guide", p. 240):
1. From the command line: use -files to ship a lookup file (a local file, or an HDFS file using hdfs://xxx), or -archives for JAR, ZIP, tar files, etc.
% hadoop jar job.jar MaxTemperatureByStationNameUsingDistributedCacheFile \
   -files input/ncdc/metadata/stations-fixed-width.txt input/ncdc/all output
2. In the program:
public void configure(JobConf conf) {
   metadata = new NcdcStationMetadata();
   try {
       metadata.initialize(new File("stations-fixed-width.txt"));
   } catch (IOException e) {
       throw new RuntimeException(e);
   }
}
Another, indirect way to use it (if the above is not available in hadoop-0.19.0):
call addCacheFile() or addCacheArchive() to add files,
and use getLocalCacheFiles() or getLocalCacheArchives() to access them.
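A minimal sketch of that programmatic route, using org.apache.hadoop.filecache.DistributedCache from the old mapred API (the HDFS path and driver class are hypothetical):
// in the driver:
JobConf conf = new JobConf(MyDriver.class);
DistributedCache.addCacheFile(new URI("/meta/stations-fixed-width.txt"), conf);
// in the mapper or reducer:
public void configure(JobConf job) {
   try {
       Path[] cached = DistributedCache.getLocalCacheFiles(job);   // local copies on the task node
       // parse cached[0] here, e.g. load the station metadata
   } catch (IOException e) {
       throw new RuntimeException(e);
   }
}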

Hadoop job web display
There are web-based interfaces to both the JobTracker (the MapReduce master) and the NameNode (the HDFS master) which display status pages about the state of the entire system. By default, these are located at http://job.tracker.addr:50030/ and http://name.node.addr:50070/.

Hadoop monitoring 
Use nagios for alerting; ganglia can be used for monitoring graphs.

status of 255 error 
Error:
java.io.IOException: Task process exit with nonzero status of 255.
       at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:424)

Cause:
Set mapred.jobtracker.retirejob.interval and mapred.userlog.retain.hours to a higher value. By default their values are 24 hours. These might be the reason for the failure, though I'm not sure.

Split size 
FileInputFormat input splits (see "The Definitive Guide", p. 190):
mapred.min.split.size: default = 1, the smallest valid size in bytes for a file split.
mapred.max.split.size: default = Long.MAX_VALUE, the largest valid size.
dfs.block.size: default = 64M, set to 128M on our system.
If you set the minimum split size > block size, the number of blocks increases. (My guess: when data has to be fetched from other nodes, blocks are merged, which increases the block count.)
If you set the maximum split size < block size, the blocks are split further.

split size = max(minimumSize, min(maximumSize, blockSize));
where minimumSize < blockSize < maximumSize.
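In code form the rule is just a direct transcription of the formula above (a sketch, not Hadoop's own source):
static long splitSize(long minimumSize, long maximumSize, long blockSize) {
    return Math.max(minimumSize, Math.min(maximumSize, blockSize));
}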

sort by value 
Hadoop does not provide a direct sort-by-value mechanism, because it would reduce MapReduce performance.
However, it can be achieved with a combination of techniques; for the concrete implementation see "The Definitive Guide", p. 250.
The basic idea (a sketch of the partitioner follows this list):
1. Combine the key and the value into a new key;
2. Override the partitioner so that partitioning is done on the old key only:
conf.setPartitionerClass(FirstPartitioner.class);
3. Provide a custom key comparator that sorts by the old key first and then by the old value:
conf.setOutputKeyComparatorClass(KeyComparator.class);
4. Override the grouping comparator so that grouping is done on the old key:
conf.setOutputValueGroupingComparator(GroupComparator.class);
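A minimal sketch of step 2, assuming (as a simplification of the book's dedicated pair type) that the composite key is a Text of the form "oldkey<TAB>value":
public class FirstPartitioner implements Partitioner<Text, Text> {
   // partition only on the original key (the part before the tab),
   // so all records for one old key reach the same reducer
   public int getPartition(Text key, Text value, int numPartitions) {
       String first = key.toString().split("\t", 2)[0];
       return (first.hashCode() & Integer.MAX_VALUE) % numPartitions;
   }
   public void configure(JobConf job) { }
}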

Handling small input files
Hadoop is inefficient when the input is a series of small files.
There are three ways to merge/handle small files:
1. Pack the series of small files into a SequenceFile to speed up MapReduce (a local sketch follows this section).
See WholeFileInputFormat and SmallFilesToSequenceFileConverter, "The Definitive Guide", p. 194.
2. Use CombineFileInputFormat, which extends FileInputFormat, but I have not tried it;
3. Hadoop archives (similar to packing) reduce the namenode memory consumed by small-file metadata. (This method does not always help, so it is not particularly recommended.)
Method:
archive the /my/files directory and its subdirectories into files.har and place it under /my:
bin/hadoop archive -archiveName files.har /my/files /my

Look at the files in the archive:
bin/hadoop fs -lsr har://my/files.har
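For option 1 above, a minimal local sketch of packing many small files into one SequenceFile, with the file name as key and the raw bytes as value (paths are placeholders; this is not the book's converter, which does the packing inside a MapReduce job):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackSmallFiles {
   public static void main(String[] args) throws Exception {
       Configuration conf = new Configuration();
       FileSystem fs = FileSystem.get(conf);
       SequenceFile.Writer writer = SequenceFile.createWriter(
               fs, conf, new Path("/my/packed.seq"), Text.class, BytesWritable.class);
       try {
           for (FileStatus st : fs.listStatus(new Path("/my/files"))) {
               byte[] buf = new byte[(int) st.getLen()];          // fine for small files only
               FSDataInputStream in = fs.open(st.getPath());
               try { in.readFully(buf); } finally { in.close(); }
               writer.append(new Text(st.getPath().getName()), new BytesWritable(buf));
           }
       } finally {
           writer.close();
       }
   }
}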

Skip Bad Records 
JobConf conf = new JobConf(ProductMR.class);
conf.setJobName("ProductMR");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Product.class);
conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);
conf.setMapOutputCompressorClass(DefaultCodec.class);
conf.setInputFormat(SequenceFileInputFormat.class);
conf.setOutputFormat(SequenceFileOutputFormat.class);
String objpath = "abc1";
SequenceFileInputFormat.addInputPath(conf, new Path(objpath));
SkipBadRecords.setMapperMaxSkipRecords(conf, Long.MAX_VALUE);
SkipBadRecords.setAttemptsToStartSkipping(conf, 0);
SkipBadRecords.setSkipOutputPath(conf, new Path("data/product/skip/"));
String output = "abc";
SequenceFileOutputFormat.setOutputPath(conf, new Path(output));
JobClient.runJob(conf);

For skipping failed tasks try : mapred.max.map.failures.percent
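The same failure tolerance can also be set from the driver, continuing the JobConf from the snippet above (a sketch; 10 means up to 10% of map tasks may fail without failing the whole job):
conf.setMaxMapTaskFailuresPercent(10);   // equivalent to mapred.max.map.failures.percent = 10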

Restarting a single datanode
If a datanode runs into a problem, after fixing it you can rejoin it to the cluster without restarting the whole cluster, as follows:
bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start jobtracker

Reduce exceeds 100%
"Reduce task progress shows > 100% when the total size of map outputs (for a
single reducer) is high."
Cause:
During the reduce merge, the progress check is wrong, which makes the status go above 100%, and the following error appears in the statistics: java.lang.ArrayIndexOutOfBoundsException: 3
       at org.apache.hadoop.mapred.StatusHttpServer$TaskGraphServlet.getReduceAvarageProgresses(StatusHttpServer.java:228)
       at org.apache.hadoop.mapred.StatusHttpServer$TaskGraphServlet.doGet(StatusHttpServer.java:159)
       at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
       at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
       at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
       at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
       at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
       at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
       at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
       at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
       at org.mortbay.http.HttpServer.service(HttpServer.java:954)

JIRA link:

counters 
Three kinds of counters:
1. built-in counters: Map input bytes, Map output records...
2. enum counters
Invocation:
   enum Temperature {
MISSING,
MALFORMED
   }

reporter.incrCounter(Temperature.MISSING, 1)
Output:
09/04/20 06:33:36 INFO mapred.JobClient: Air Temperature Recor
09/04/20 06:33:36 INFO mapred.JobClient:     Malformed=3
09/04/20 06:33:36 INFO mapred.JobClient:     Missing=66136856
3. dynamic counters:
Invocation:
reporter.incrCounter("TemperatureQuality", parser.getQuality(),1);

Output:
09/04/20 06:33:36 INFO mapred.JobClient: TemperatureQuality
09/04/20 06:33:36 INFO mapred.JobClient:     2=1246032
09/04/20 06:33:36 INFO mapred.JobClient:     1=973422173
09/04/20 06:33:36 INFO mapred.JobClient:     0=1
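For context, a minimal mapper skeleton (old mapred API) showing where the enum-counter call from point 2 lives; the class is hypothetical and the parsing logic is omitted:
public class TemperatureMapper extends MapReduceBase
       implements Mapper<LongWritable, Text, Text, Text> {
   enum Temperature { MISSING, MALFORMED }

   public void map(LongWritable key, Text value,
                   OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
       // ... parse the record; when the temperature field is absent:
       reporter.incrCounter(Temperature.MISSING, 1);
       // dynamic-counter variant: reporter.incrCounter("TemperatureQuality", qualityCode, 1);
   }
}
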
7: Namenode in safe mode 
Solution:
bin/hadoop dfsadmin -safemode leave

8:java.net.NoRouteToHostException: No route to host 
Solution:
sudo /etc/init.d/iptables stop

9: After changing the namenode, a SELECT run in Hive still points at the old namenode address
This is because when you create a table, Hive actually stores the location of the table (e.g.
hdfs://ip:port/user/root/...) in the SDS and DBS tables in the metastore. So when I bring up a new cluster, the master has a new IP, but Hive's metastore is still pointing to the locations within the old
cluster. I could modify the metastore to update it with the new IP every time I bring up a cluster, but the easier and simpler solution was to just use an elastic IP for the master.
So replace every old namenode address that appears in the metastore with the new namenode address.


10:Your DataNode is started and you can create directories with bin/hadoop dfs -mkdir, but you get an error message when you try to put files into the HDFS (e.g., when you run a command like bin/hadoop dfs -put). 
Solution:
Go to the HDFS info web page (open your web browser and go to http://namenode:dfs_info_port where namenode is the hostname of your NameNode and dfs_info_port is the port you chose for dfs.info.port; if you followed the QuickStart on your personal computer then this URL will be http://localhost:50070). Once at that page, click on the number where it tells you how many DataNodes you have, to look at the list of DataNodes in your cluster.
If it says you have used 100% of your space, then you need to free up room on local disk(s) of the DataNode(s).
If you are on Windows then this number will not be accurate (there is some kind of bug either in Cygwin's df.exe or in Windows). Just free up some more space and you should be okay. On one Windows machine we tried the disk had 1GB free but Hadoop reported that it was 100% full. Then we freed up another 1GB and then it said that the disk was 99.15% full and started writing data into the HDFS again. We encountered this bug on Windows XP SP2.
11:Your DataNodes won't start, and you see something like this in logs/*datanode*: 
Incompatible namespaceIDs in /tmp/hadoop-ross/dfs/data
Cause:
Your Hadoop namespaceID became corrupted. Unfortunately the easiest thing to do is to reformat the HDFS.
Solution:
You need to do something like this:
bin/stop-all.sh
rm -Rf /tmp/hadoop-your-username/*
bin/hadoop namenode -format
12:You can run Hadoop jobs written in Java (like the grep example), but your HadoopStreaming jobs (such as the Python example that fetches web page titles) won't work. 
Cause:
You might have given only a relative path to the mapper and reducer programs. The tutorial originally just specified relative paths, but absolute paths are required if you are running in a real cluster.
Solution:
Use absolute paths like this from the tutorial:
bin/hadoop jar contrib/hadoop-0.15.2-streaming.jar \
   -mapper   $HOME/proj/hadoop/multifetch.py       \
   -reducer $HOME/proj/hadoop/reducer.py          \
   -input urls/*                               \
   -output   titles
13: 2009-01-08 10:02:40,709 ERROR metadata.Hive (Hive.java:getPartitions(499)) - javax.jdo.JDODataStoreException: Required table missing : ""PARTITIONS"" in Catalog "" Schema "". JPOX requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "org.jpox.autoCreateTables" 
Cause: org.jpox.fixedDatastore was set to true in hive-default.xml.
starting namenode, logging to /home/hadoop/HadoopInstall/hadoop/bin/../logs/hadoop-hadoop-namenode-hadoop.out
localhost: starting datanode, logging to /home/hadoop/HadoopInstall/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop.out
localhost: starting secondarynamenode, logging to /home/hadoop/HadoopInstall/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-hadoop.out
localhost: Exception in thread "main" java.lang.NullPointerException
localhost:    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:130)
localhost:    at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
localhost:    at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:120)
localhost:    at org.apache.hadoop.dfs.SecondaryNameNode.initialize(SecondaryNameNode.java:124)
localhost:    at org.apache.hadoop.dfs.SecondaryNameNode.<init>(SecondaryNameNode.java:108)
localhost:    at org.apache.hadoop.dfs.SecondaryNameNode.main(SecondaryNameNode.java:460)
14:09/08/31 18:25:45 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:Bad connect ack with firstBadLink 192.168.1.11:50010 
> 09/08/31 18:25:45 INFO hdfs.DFSClient: Abandoning block blk_-8575812198227241296_1001
> 09/08/31 18:25:51 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink 192.168.1.16:50010
> 09/08/31 18:25:51 INFO hdfs.DFSClient: Abandoning block blk_-2932256218448902464_1001
> 09/08/31 18:25:57 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink 192.168.1.11:50010
> 09/08/31 18:25:57 INFO hdfs.DFSClient: Abandoning block blk_-1014449966480421244_1001
> 09/08/31 18:26:03 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink 192.168.1.16:50010
> 09/08/31 18:26:03 INFO hdfs.DFSClient: Abandoning block blk_7193173823538206978_1001
> 09/08/31 18:26:09 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable
to create new block.
>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2731)
>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
>       at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)

> 09/08/31 18:26:09 WARN hdfs.DFSClient: Error Recovery for block blk_7193173823538206978_1001
bad datanode[2] nodes == null
> 09/08/31 18:26:09 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/umer/8GB_input"
- Aborting...
> put: Bad connect ack with firstBadLink 192.168.1.16:50010


Solution:
I have resolved the issue. What I did:

1) '/etc/init.d/iptables stop' -->stopped firewall
2) SELINUX=disabled in '/etc/selinux/config' file.-->disabled selinux
It worked for me after these two changes.
Fix for jline.ConsoleReader.readLine not working on Windows
In the main() function of CliDriver.java there is a call to reader.readLine, used to read standard input, but on the Windows platform this call always returns null (the reader is an instance of jline.ConsoleReader), which makes debugging in Eclipse on Windows inconvenient.
We can replace it with java.util.Scanner. The original
while ((line = reader.readLine(curPrompt + ">")) != null)
is replaced with:
Scanner sc = new Scanner(System.in);
while ((line = sc.nextLine()) != null)
After recompiling and redeploying, SQL statements can again be read normally from standard input.

Possible causes of the "does not have a scheme" error when debugging Hive in Eclipse on Windows
1. The Hive configuration item hive.metastore.local is set to false; it needs to be changed to true, because this is a standalone setup.
2. The HIVE_HOME environment variable is not set, or is set incorrectly.
3. "does not have a scheme" is probably because hive-default.xml cannot be found. For a solution to hive-default.xml not being found when debugging Hive in Eclipse, see: http://bbs.hadoopor.com/thread-292-1-1.html
1. Chinese character problems
Chinese text parsed from a URL comes out garbled when Hadoop prints it. We used to think Hadoop did not support Chinese; after reading the source we found that Hadoop simply does not support outputting Chinese in GBK.
This is the code of TextOutputFormat.class. Hadoop's default output inherits from FileOutputFormat; FileOutputFormat has two subclasses, one for binary output streams and one for text output, namely TextOutputFormat.
public class TextOutputFormat<K, V> extends FileOutputFormat<K, V> {
   protected static class LineRecordWriter<K, V>
implements RecordWriter<K, V> {
private static final String utf8 = "UTF-8"; // this is hard-coded to UTF-8
private static final byte[] newline;
static {
   try {
       newline = "\n".getBytes(utf8);
   } catch (UnsupportedEncodingException uee) {
       throw new IllegalArgumentException("can't find " + utf8 + " encoding");
   }
}

public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
   this.out = out;
   try {
       this.keyValueSeparator = keyValueSeparator.getBytes(utf8);
   } catch (UnsupportedEncodingException uee) {
       throw new IllegalArgumentException("can't find " + utf8 + " encoding");
   }
}

private void writeObject(Object o) throws IOException {
   if (o instanceof Text) {
       Text to = (Text) o;
       out.write(to.getBytes(), 0, to.getLength()); // this also needs to be modified
   } else {
       out.write(o.toString().getBytes(utf8));
   }
}
...
}
As you can see, Hadoop's default output is hard-coded to UTF-8, so as long as the decoded Chinese is correct, a Linux client whose character set is UTF-8 will display the Chinese correctly, because Hadoop has output the Chinese in UTF-8.
But most databases define their fields in GBK. If you want Hadoop's Chinese output in GBK so that it is compatible with the database, what do you do?
We can define a new class:
public class GbkOutputFormat<K, V> extends FileOutputFormat<K, V> {
   protected static class LineRecordWriter<K, V>
implements RecordWriter<K, V> {
// write in GBK instead
private static final String gbk = "GBK";
private static final byte[] newline;
static {
   try {
       newline = "\n".getBytes(gbk);
   } catch (UnsupportedEncodingException uee) {
       throw new IllegalArgumentException("can't find " + gbk + " encoding");
   }
}

public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
   this.out = out;
   try {
       this.keyValueSeparator = keyValueSeparator.getBytes(gbk);
   } catch (UnsupportedEncodingException uee) {
       throw new IllegalArgumentException("can't find " + gbk + " encoding");
   }
}

private void writeObject(Object o) throws IOException {
   if (o instanceof Text) {
//        Text to = (Text) o;
//        out.write(to.getBytes(), 0, to.getLength());
//    } else {
       out.write(o.toString().getBytes(gbk));
   }
}

}
Then add conf1.setOutputFormat(GbkOutputFormat.class) in the MapReduce code,
and the output will be Chinese in GBK format.

2. When running a particular MapReduce example, an error is thrown:

java.io.IOException: All datanodes xxx.xxx.xxx.xxx:xxx are bad. Aborting...
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2158)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
java.io.IOException: Could not get block locations. Aborting...
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
The cause of the problem was identified: the Linux machines had opened too many files. The ulimit -n command shows that Linux's default limit is 1024 open files; modify /etc/security/limits.conf and add a larger limit for the hadoop user, e.g. hadoop soft nofile 65535.

Re-run the program (it is best to modify all the datanodes), and the problem is solved.

3. After Hadoop has been running for some time, stop-all.sh no longer works; the errors are:
no tasktracker to stop, no datanode to stop
The problem is that Hadoop stops the daemons based on the pids of the mapred and dfs processes on the datanodes, and by default the pid files are stored in /tmp. Linux deletes files in that directory from time to time (usually every month or roughly every seven days). Once hadoop-hadoop-jobtracker.pid and hadoop-hadoop-namenode.pid have been deleted, the namenode naturally can no longer find those processes on the datanodes.
Setting export HADOOP_PID_DIR in the configuration file solves this problem.


Problem:
Incompatible namespaceIDs in /usr/local/hadoop/dfs/data: NameNode namespaceID = 405233244966; DataNode namespaceID = 33333244
Cause:
Every time hadoop namenode -format is executed, the NameNode generates a new namespaceID, but the DataNode directories under hadoop.tmp.dir still keep the previous namespaceID. Because the namespaceIDs are inconsistent, the DataNode cannot start. So simply delete the hadoop.tmp.dir directory before each hadoop namenode -format and the DataNode will start successfully. Note that you must delete the local directory that hadoop.tmp.dir points to, not an HDFS directory.
Problem: Storage directory does not exist
2010-02-09 21:37:53,203 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory D:\hadoop\run\dfs_name_dir does not exist.
2010-02-09 21:37:53,203 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory D:\hadoop\run\dfs_name_dir is in an inconsistent state: storage directory does not exist or is not accessible.
Solution: the storage directory D:\hadoop\run\dfs_name_dir does not exist, so simply create this directory manually.
Problem: The NameNode is not formatted
Solution: HDFS has not been formatted; just run hadoop namenode -format and then start it.

After running bin/hadoop, jps reports the following exception:
Exception in thread "main" java.lang.NullPointerException
       at sun.jvmstat.perfdata.monitor.protocol.local.LocalVmManager.activeVms(LocalVmManager.java:127)
       at sun.jvmstat.perfdata.monitor.protocol.local.MonitoredHostProvider.activeVms(MonitoredHostProvider.java:133)
       at sun.tools.jps.Jps.main(Jps.java:45)
Cause:
The /tmp folder in the system root directory was deleted. Re-create the /tmp folder.
The "unable to create log directory /tmp/..." error in bin/hive may have the same cause.


Source: www.cnblogs.com/bigdatasafe/p/10945164.html