Checking DataNode status with dfsadmin
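For example, on Hadoop 2.x the following command, run on a node with the Hadoop client configured, prints the state of every live and dead DataNode (capacity, last contact, and so on):

    hdfs dfsadmin -report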
Analyzing startup failures from the startup logs
Sometimes the Hadoop cluster fails to start properly, e.g. the NameNode process on Master does not come up. In that case the startup logs are the place to look for the cause, but newcomers should note a few points:

- Startup prints "Master: starting namenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-hadoop-namenode-Master.out", but the startup log is actually written to /usr/local/hadoop-2.6.0/logs/hadoop-hadoop-namenode-Master.log.
- Each startup appends to the log file, so scroll to the end; the timestamps show which run a record belongs to. The error is usually near the bottom, on the lines that say Error or show a Java exception (see the example commands after this list).
- The status of the DataNodes and the NameNode can also be checked through the web UI at http://master:50070/.
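A quick way to jump to the most recent errors in the log (the path is the one from the first tip above; adjust it to your own installation):

    tail -n 100 /usr/local/hadoop-2.6.0/logs/hadoop-hadoop-namenode-Master.log
    grep -n "Error\|Exception" /usr/local/hadoop-2.6.0/logs/hadoop-hadoop-namenode-Master.log | tail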
Error summary and solutions
Startup failure, error 1:

2014-03-11 12:39:06,725 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.io.IOException: Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
    at org.apache.hadoop.hdfs.DFSUtil.getNNServiceRpcAddresses(DFSUtil.java:668)
    at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.java:151)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:673)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1727)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1642)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1665)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1837)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1858)
2014-03-11 12:39:06,732 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2014-03-11 12:39:06,738 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at node1/192.168.26.160
Startup failure, solution 1: Checking the configuration files revealed that core-site.xml had not been copied to node1 and node2; after copying it over, the cluster started normally again.
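A minimal sketch of that fix, assuming the standard Hadoop 2.6.0 layout and that the copy on Master is correct (the node1/node2 hostnames come from the log above):

    scp /usr/local/hadoop-2.6.0/etc/hadoop/core-site.xml hadoop@node1:/usr/local/hadoop-2.6.0/etc/hadoop/
    scp /usr/local/hadoop-2.6.0/etc/hadoop/core-site.xml hadoop@node2:/usr/local/hadoop-2.6.0/etc/hadoop/

The setting the DataNode was missing is the NameNode address; on Hadoop 2.x it is derived from fs.defaultFS in core-site.xml, for example (hdfs://Master:9000 is an assumed value, use your own):

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://Master:9000</value>
    </property>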
Startup failure, error 2:

WARN fs.FileUtil: Failed to delete file or dir [/data/hadoop/dfs/name/current]: it still exists.
15/01/21 05:21:17 WARN namenode.NameNode: Encountered exception during format:
java.io.IOException: Cannot remove current directory: /data/hadoop/dfs/name/current
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:332)
    at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:546)
    at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:567)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:148)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:899)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1310)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1424)
15/01/21 05:21:17 FATAL namenode.NameNode: Exception in namenode join
java.io.IOException: Cannot remove current directory: /data/hadoop/dfs/name/current
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:332)
    at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:546)
    at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:567)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:148)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:899)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1310)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1424)
Startup failure, solution 2: jps showed nothing but "5432 Jps".
The root cause is that Hadoop was set up as the root user, while the format test was run as the hadoop user, hence the failure above.
The fix is to change the owner of the Hadoop installation directory to hadoop and then re-run the format (a full sequence is sketched below).
Concretely: chown -R hadoop /usr/local/hadoop-2.6.0
Note: without -R, chown changes only the named file or directory itself; with -R it recursively changes the ownership of everything under the directory.
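Put together, a sketch of the recovery steps (assuming /data/hadoop/dfs/name from the error message above is the configured dfs.namenode.name.dir):

    chown -R hadoop /usr/local/hadoop-2.6.0   # installation directory
    chown -R hadoop /data/hadoop              # NameNode metadata directory from the error
    su - hadoop
    hdfs namenode -format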
Startup failure, error 3:

WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop1/192.168.10.22:9000
2014-05-04 10:43:55,009 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop1/192.168.10.22:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-05-04 10:43:56,012 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop1/192.168.10.22:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Startup failure, solution 3:
If you have already checked connectivity between the DataNode and the NameNode and the IP addresses are all correct but the problem persists, read on.
The root cause is still that the DataNode cannot reach the NameNode's RPC port (in the log above, hadoop1/192.168.10.22:9000). Check which address the NameNode is actually listening on:
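A minimal check, assuming a standard Linux environment (9000 is the RPC port from the error above):

    netstat -tlnp | grep 9000

If this shows the NameNode listening on 127.0.0.1:9000 instead of the machine's real address, remote DataNodes cannot connect; a common culprit is an /etc/hosts entry that maps the NameNode's hostname to 127.0.0.1.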