If the cluster status is HEALTH_ERR and some PGs are reported as inconsistent, proceed as follows:

    1. Use the following command to find the PGs that are in an inconsistent state:

    ceph pg dump | grep inconsistent

    2. Using a PG ID from the output (for example, 1.23), run a scrub to check its consistency:

    [root@node3 ~]# ceph pg scrub 1.23
    instructing pg 1.23 on osd.5 to scrub

    Alternatively, run a deep scrub, which also reads the object data and compares checksums across replicas:

    [root@node3 ~]# ceph pg deep-scrub 1.23
    instructing pg 1.23 on osd.5 to deep-scrub

    3. Finally, repair the PG:

    [root@node3 ~]# ceph pg repair 1.23
    instructing pg 1.23 on osd.5 to repair

    4. Once every inconsistent PG has been repaired, confirm the cluster status:

    [root@node3 ~]# ceph -s
      cluster:
        id:     b8b4aa68-d825-43e9-a60a-781c92fec20e
        health: HEALTH_OK

      services:
        mon: 1 daemons, quorum node1
        mgr: node1(active)
        osd: 6 osds: 6 up, 6 in

      data:
        pools:   1 pools, 64 pgs
        objects: 0 objects, 0 bytes
        usage:   6368 MB used, 55071 MB / 61440 MB avail
        pgs:     64 active+clean
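
When several PGs are inconsistent, the steps above can be chained together. The following is only a minimal sketch, not an official Ceph tool. The helper names `inconsistent_pg_ids`, `repair_pgs`, and `health_is_ok`, and the `DRY_RUN` variable, are hypothetical and introduced here for illustration. It assumes that the PG ID is the first column of each `ceph pg dump` row and that `ceph health` prints the health status at the start of its output.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are hypothetical, not part of Ceph.

# Read `ceph pg dump`-style text on stdin and print the IDs of PGs whose
# row mentions "inconsistent". Assumes the PG ID (pool.hex, e.g. 1.23)
# is the first whitespace-separated column.
inconsistent_pg_ids() {
    awk '/inconsistent/ && $1 ~ /^[0-9]+\.[0-9a-f]+$/ { print $1 }'
}

# Read PG IDs on stdin, one per line, and issue `ceph pg repair` for each.
# With DRY_RUN=1 the commands are only printed, not executed.
repair_pgs() {
    while read -r pgid; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ceph pg repair $pgid"
        else
            ceph pg repair "$pgid"
        fi
    done
}

# Succeed if the text on stdin (e.g. from `ceph health`) starts with HEALTH_OK.
health_is_ok() {
    grep -q '^HEALTH_OK'
}
```

A possible usage, previewing the repair commands before running them for real:

    ceph pg dump 2>/dev/null | inconsistent_pg_ids | DRY_RUN=1 repair_pgs
    ceph health | health_is_ok && echo "cluster is healthy"

Repairs run asynchronously, so the health check is only meaningful after the repairs have finished.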