date: 2020-11-17
title: NameNode failure recovery # title
tags: hadoop # tags
categories: Hadoop # category
Copy the SecondaryNameNode data into the NameNode's data directory
Simulate a NameNode failure
$ jps | grep -w NameNode | awk '{print $1}' | xargs kill -9 # stop the NameNode
$ rm -rf data/tmp/dfs/name/* # delete the NameNode's data
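The kill pipeline above can be rehearsed safely on any machine. Below is a local sketch of the same "find the PID, then kill it" pattern, run against a throwaway `sleep` process instead of a real NameNode (`kill_demo.out` is just a scratch file for the result):

```shell
# Start a disposable victim process in the background
sleep 300 &
pid=$!

# ps prints "PID COMMAND"; grep -w picks the matching process, awk extracts
# the PID, and xargs hands it to kill -9 -- the same shape as the jps pipeline.
ps -o pid= -o comm= -p "$pid" | grep -w sleep | awk '{print $1}' | xargs kill -9

wait "$pid" 2>/dev/null || true   # reap the child (exit code 137 = SIGKILL)
kill -0 "$pid" 2>/dev/null && result=alive || result=killed
echo "$result" > kill_demo.out
echo "$result"   # → killed
```

Reaping with `wait` matters: a killed but unreaped child is still a zombie in the process table, and `kill -0` would still report it as alive.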
Copy the SecondaryNameNode data
# Copy the SecondaryNameNode data into the NameNode's name directory (run the following on the NameNode host)
$ rsync -az 192.168.20.4:/apps/usr/hadoop-2.9.2/data/tmp/dfs/namesecondary/ /apps/usr/hadoop-2.9.2/data/tmp/dfs/name/
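Note the trailing slash on `namesecondary/`: with rsync, `src/` copies the *contents* of the directory, while `src` copies the directory itself (the second method below relies on that difference). The rule can be sketched locally; here `cp -a` stands in for rsync, and `slash_demo` is a throwaway path:

```shell
# Build a tiny source tree with one file
rm -rf slash_demo && mkdir -p slash_demo/src slash_demo/a slash_demo/b
touch slash_demo/src/fsimage

cp -a slash_demo/src/. slash_demo/a/   # like "rsync src/ a/": contents only
cp -a slash_demo/src   slash_demo/b/   # like "rsync src  b/": the dir itself

ls slash_demo/a   # → fsimage
ls slash_demo/b   # → src
```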
$ hadoop-daemon.sh start namenode # start the NameNode on its own
Visit port 50070 on the NameNode and confirm the data is back to normal.
Start the NameNode daemon with the -importCheckpoint option, which copies the SecondaryNameNode data into the NameNode directory
Edit hdfs-site.xml
# Add the following inside <configuration>
<configuration>
<!-- How often (in seconds) to check whether a checkpoint is needed; shortened to 60s -->
<property>
<name>dfs.namenode.checkpoint.check.period</name>
<value>60</value>
</property>
<!-- The NameNode's data directory -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/apps/usr/hadoop-2.9.2/data/tmp/dfs/name</value>
</property>
</configuration>
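For reference, `-importCheckpoint` loads the checkpoint from the directory named by `dfs.namenode.checkpoint.dir` (which defaults to `${hadoop.tmp.dir}/dfs/namesecondary`, matching this cluster's layout). If your checkpoint copy lived elsewhere, a fragment like the following would point the NameNode at it; the path shown is this cluster's, adjust to taste:

```xml
<!-- Where -importCheckpoint looks for the checkpoint to load -->
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/apps/usr/hadoop-2.9.2/data/tmp/dfs/namesecondary</value>
</property>
```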
Simulate a NameNode failure
$ jps | grep -w NameNode | awk '{print $1}' | xargs kill -9 # stop the NameNode
$ rm -rf data/tmp/dfs/name/* # delete the NameNode's data
If the SecondaryNameNode is not on the same host as the NameNode, copy the SecondaryNameNode's data directory to the directory that sits alongside the NameNode's data directory, and delete the in_use.lock file.
# Run the following on the NameNode host
$ rsync -az 192.168.20.4:/apps/usr/hadoop-2.9.2/data/tmp/dfs/namesecondary /apps/usr/hadoop-2.9.2/data/tmp/dfs/
$ pwd # confirm the current directory
/apps/usr/hadoop-2.9.2/data/tmp/dfs
$ ls # confirm the following directories exist
data name namesecondary
$ rm -f namesecondary/in_use.lock # delete the lock file in the SecondaryNameNode directory
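The directory shuffle above can be dry-run on a throwaway mock tree before touching a real cluster. In this sketch, `mock_dfs` is a stand-in path and `cp -a` stands in for the rsync fetch:

```shell
# Mock what's left on the NameNode plus the data fetched from the SNN
rm -rf mock_dfs
mkdir -p mock_dfs/dfs/name mock_dfs/dfs/data          # surviving NameNode layout
mkdir -p mock_dfs/fetched/namesecondary/current       # the rsync'd SNN data
touch mock_dfs/fetched/namesecondary/current/VERSION
touch mock_dfs/fetched/namesecondary/in_use.lock      # stale lock from the SNN

# Place namesecondary next to name/ (no trailing slash: copy the dir itself)
cp -a mock_dfs/fetched/namesecondary mock_dfs/dfs/

# Drop the stale lock so -importCheckpoint can open the directory
rm -f mock_dfs/dfs/namesecondary/in_use.lock

ls mock_dfs/dfs   # → data  name  namesecondary
```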
$ hdfs namenode -importCheckpoint # run this command; it eventually prints output like the following (older versions may print nothing, in which case just wait a while longer)
20/11/17 07:39:51 INFO hdfs.StateChange: STATE* Safe mode ON, in safe mode extension.
The reported blocks 18 has reached the threshold 0.9990 of total blocks 18. The number of live datanodes 3 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 9 seconds.
20/11/17 07:40:01 INFO hdfs.StateChange: STATE* Safe mode is OFF
20/11/17 07:40:01 INFO hdfs.StateChange: STATE* Leaving safe mode after 30 secs
20/11/17 07:40:01 INFO hdfs.StateChange: STATE* Network topology has 1 racks and 3 datanodes
20/11/17 07:40:01 INFO hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
# After running the command above, open another terminal and you will see that the NameNode is already listening on port 50070.
# After about two minutes you can press Ctrl+C to stop the command, then start the NameNode normally; the data has been recovered.
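Instead of guessing "about two minutes", you can poll the web UI port from that second terminal. This is a bash-only sketch (it uses bash's `/dev/tcp`, so it won't run under plain `sh`); the host, port, and the `wait_demo.out` scratch file are illustrative:

```shell
# Return 0 as soon as host:port accepts a TCP connection, 1 on timeout.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-120}
  local deadline=$((SECONDS + timeout))
  while [ "$SECONDS" -lt "$deadline" ]; do
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0   # something is listening; the subshell closes fd 3 on exit
    fi
    sleep 1
  done
  return 1       # gave up
}

# Usage against the NameNode web UI:
#   wait_for_port localhost 50070 180 && echo "web UI is up"

# Self-check against a port that should be closed: times out and returns 1
wait_for_port 127.0.0.1 1 2 && echo up > wait_demo.out || echo timeout > wait_demo.out
```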
$ hadoop-daemon.sh start namenode # start the NameNode
# Test that Hadoop works
$ hadoop fs -mkdir /aaa # create a directory
$ hadoop fs -ls / # list the root directory
Found 4 items
drwxr-xr-x - root supergroup 0 2020-11-17 06:25 /a
drwxr-xr-x - root supergroup 0 2020-11-17 07:45 /aaa
drwxrwx--- - root supergroup 0 2020-11-12 22:21 /tmp
drwxr-xr-x - root supergroup 0 2020-11-12 22:18 /user