If the cluster status is HEALTH_ERR and some PGs are reported as inconsistent, perform the following steps:
- Use the following command to find out which PGs are inconsistent:
ceph pg dump|grep inconsistent
- Scrub each PG reported in the output, using its PG ID (for example, 1.23), to check its consistency:
[root@node3 ~]# ceph pg scrub 1.23
instructing pg 1.23 on osd.5 to scrub
Alternatively, run a deep scrub for a more thorough consistency check:
[root@node3 ~]# ceph pg deep-scrub 1.23
instructing pg 1.23 on osd.5 to deep-scrub
- Finally, repair the PG:
[root@node3 ~]# ceph pg repair 1.23
instructing pg 1.23 on osd.5 to repair
- After all inconsistent PGs have been repaired, confirm the cluster status:
[root@node3 ~]# ceph -s
  cluster:
    id:     b8b4aa68-d825-43e9-a60a-781c92fec20e
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum node1
    mgr: node1(active)
    osd: 6 osds: 6 up, 6 in

  data:
    pools:   1 pools, 64 pgs
    objects: 0 objects, 0 bytes
    usage:   6368 MB used, 55071 MB / 61440 MB avail
    pgs:     64 active+clean
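
When several PGs are inconsistent, the steps above can be scripted. The following is only a minimal sketch, not a tested production tool. The function names (`list_inconsistent_pgs`, `cluster_health`, `repair_inconsistent_pgs`) are hypothetical helpers introduced here for illustration. It assumes that `ceph pg dump pgs_brief` prints the PG ID as the first field of each line and that `ceph -s` prints a `health:` line as in the output above.

```shell
#!/usr/bin/env bash
# Sketch: batch version of the manual steps above (hypothetical helper names).

# Print the IDs of PGs whose line mentions "inconsistent".
# Reads `ceph pg dump`-style text on stdin; assumes the PG ID
# (pool.hex, e.g. 1.23) is the first whitespace-separated field.
list_inconsistent_pgs() {
    awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ && /inconsistent/ { print $1 }'
}

# Print the health value (e.g. HEALTH_OK) from `ceph -s`-style text on stdin.
cluster_health() {
    awk '$1 == "health:" { print $2 }'
}

# Ask Ceph to repair every inconsistent PG, then print the current health.
# Needs the ceph CLI and admin access; nothing runs until this is called.
repair_inconsistent_pgs() {
    local pg
    for pg in $(ceph pg dump pgs_brief 2>/dev/null | list_inconsistent_pgs); do
        ceph pg repair "$pg"
    done
    ceph -s | cluster_health
}
```

To use it, source the file on a node with admin access and call `repair_inconsistent_pgs`. Since `ceph pg repair` only instructs the OSD to start a repair, the reported health may not be HEALTH_OK right away; rerun `ceph -s` after the repairs have finished.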