1、Background

As a Ceph cluster stores more and more data, ceph osd df tree will start to show an uneven data distribution across OSDs. This imbalance wastes storage capacity, and it becomes especially common once pool utilization passes 80%. In such cases the PG distribution can be adjusted manually to rebalance the data.
Since Luminous, the OSD map carries a new pg-upmap exception table that lets the cluster explicitly map specific PGs to specific OSDs.
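
Once such mappings exist (they are created in section 3.4 below), each one appears as a pg_upmap_items entry in the OSD map and can be listed directly. The sample line here is illustrative; the exact format may vary slightly between releases:

]$ ceph osd dump | grep pg_upmap_items
pg_upmap_items 48.1 [4,7]    # PG 48.1 is remapped from osd.4 to osd.7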

2、Current State

The output of ceph osd df shows that osd.7 is utilized nearly 10 percentage points less than the other OSDs.

]$ ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 ssd 0.90959 1.00000 931 GiB 572 GiB 536 GiB 3.7 GiB 33 GiB 359 GiB 61.42 0.97 152 up
3 ssd 0.90959 1.00000 931 GiB 578 GiB 542 GiB 3.0 GiB 33 GiB 354 GiB 62.01 0.98 152 up
6 ssd 0.90959 1.00000 931 GiB 612 GiB 574 GiB 3.0 GiB 35 GiB 319 GiB 65.70 1.04 165 up
1 ssd 0.90959 1.00000 931 GiB 613 GiB 574 GiB 3.4 GiB 35 GiB 319 GiB 65.76 1.04 180 up
4 ssd 0.90959 1.00000 931 GiB 639 GiB 600 GiB 3.1 GiB 37 GiB 292 GiB 68.64 1.09 158 up
7 ssd 0.90959 1.00000 931 GiB 510 GiB 478 GiB 3.1 GiB 29 GiB 422 GiB 54.74 0.87 132 up
2 ssd 0.90959 1.00000 931 GiB 584 GiB 548 GiB 2.1 GiB 33 GiB 348 GiB 62.66 0.99 142 up
5 ssd 0.90959 1.00000 931 GiB 591 GiB 555 GiB 2.7 GiB 34 GiB 340 GiB 63.48 1.01 160 up
8 ssd 0.90959 1.00000 931 GiB 587 GiB 548 GiB 4.8 GiB 33 GiB 345 GiB 62.99 1.00 167 up
  TOTAL 8.2 TiB 5.2 TiB 4.8 TiB 29 GiB 302 GiB 3.0 TiB 63.04

This indicates a PG imbalance. Keep the default thresholds in mind: once an OSD passes 85% utilization the cluster raises a nearfull warning, and at 95% it enters an error state and blocks writes, so it is worth rebalancing well before those levels are reached.
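
To put a number on the imbalance, the per-OSD utilization spread can be computed from the JSON output of ceph osd df. This is a minimal sketch, assuming the JSON layout of recent releases (a nodes array with a utilization field per OSD) and that jq is available:

]$ ceph osd df -f json | jq -r '.nodes[] | "osd.\(.id) \(.utilization)"'   # per-OSD utilization in %
]$ ceph osd df -f json | jq '[.nodes[].utilization] | max - min'           # spread between fullest and emptiest OSD

With the numbers above the spread is roughly 14 percentage points (68.64% on osd.4 vs. 54.74% on osd.7).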

3、Manual Rebalancing

3.1、Check the Ceph Version

Balancing PGs this way requires Ceph Luminous or later, and because pg-upmap is only honoured by Luminous-aware clients, the connected clients must support it as well. Both can be checked with ceph features:

]$ ceph features
{
    "mon": [
        {
            "features": "0x3ffddff8ffecffff",
            "release": "luminous",
            "num": 3
        }
    ],
}
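
Checking the version alone is not quite sufficient: the pg-upmap exception table is only used once the cluster requires Luminous-or-newer clients. If that has not been set yet, it can be enabled with the command below (it refuses to run while pre-Luminous clients are still connected):

]$ ceph osd set-require-min-compat-client luminous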

3.2、Export the Cluster's Current OSD Map

]$ ceph osd getmap -o osd.map
got osdmap epoch 552
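
Optionally, the exported map can be inspected offline before generating any changes, for example to confirm that the epoch and pools are the expected ones. A quick, truncated check (the exact header lines may differ by release):

]$ osdmaptool osd.map --print | head
osdmaptool: osdmap file 'osd.map'
epoch 552
...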

3.3、Generate the upmap Optimization Entries

]$ osdmaptool osd.map --upmap out.txt --upmap-pool default.rgw.buckets.data --upmap-max=10
osdmaptool: osdmap file 'osd.map'
writing upmap command output to: out.txt
checking for upmap cleanups
upmap, max-count 10, max deviation 5
 limiting to pools default.rgw.buckets.data ([48])
pools default.rgw.buckets.data 
prepared 10/10 changes

]$ cat out.txt 
ceph osd pg-upmap-items 48.1 4 7    # remaps PG 48.1 from osd.4 to osd.7
ceph osd pg-upmap-items 48.1e 1 7
ceph osd pg-upmap-items 48.27 6 0
ceph osd pg-upmap-items 48.2b 6 0
ceph osd pg-upmap-items 48.2d 1 7
ceph osd pg-upmap-items 48.33 4 7
ceph osd pg-upmap-items 48.49 6 3
ceph osd pg-upmap-items 48.57 4 7
ceph osd pg-upmap-items 48.69 1 7
ceph osd pg-upmap-items 48.75 1 7
  • --upmap-pool: name of the storage pool to optimize.
  • --upmap-max: maximum number of upmap entries to generate in one run (default 100). Tune this to the environment and workload: the more entries applied at once, the more data migration is triggered, which may impact client traffic.
  • --upmap-deviation: the maximum deviation from the average before an OSD is considered for optimization. The Luminous documentation describes it as a ratio with a default of 0.01 (i.e. 1%); in newer releases it is expressed as a number of PGs, which is why the run above reports "max deviation 5". OSDs within this deviation of the average are treated as perfectly balanced (see the example after this list).
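
For instance, to aim for a tighter balance than the default deviation of 5 PGs, the deviation can be passed explicitly. This is a hypothetical re-run of the same command, assuming an osdmaptool recent enough to accept --upmap-deviation:

]$ osdmaptool osd.map --upmap out.txt --upmap-pool default.rgw.buckets.data \
     --upmap-max 10 --upmap-deviation 1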

3.4、Apply the Rebalancing

Run the commands written to out.txt. This makes the cluster remap the listed PGs, and data migration starts immediately to even out the distribution.

]$ source out.txt
set 48.1 pg_upmap_items mapping to [4->7]
set 48.1e pg_upmap_items mapping to [1->7]
set 48.27 pg_upmap_items mapping to [6->0]
set 48.2b pg_upmap_items mapping to [6->0]
set 48.2d pg_upmap_items mapping to [1->7]
set 48.33 pg_upmap_items mapping to [4->7]
set 48.49 pg_upmap_items mapping to [6->3]
set 48.57 pg_upmap_items mapping to [4->7]
set 48.69 pg_upmap_items mapping to [1->7]
set 48.75 pg_upmap_items mapping to [1->7]
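
If one of these mappings later turns out to be undesirable (for example, the target OSD fills up faster than expected), the individual exception can be removed again and the PG falls back to its normal CRUSH placement. A sketch, using PG 48.1 from the list above:

]$ ceph osd rm-pg-upmap-items 48.1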
    

3.5、Monitor the Rebalancing Progress

]$ ceph -s
cluster:
  id:     5adf323c-bef2-42b4-8eff-7a164be1c7fa
  health: HEALTH_OK

services:
  mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 2d)
  mgr: ceph-mon1(active, since 2d), standbys: ceph-mon2
  mds: cephfs:1 {0=ceph-mon3=up:active} 1 up:standby
  osd: 9 osds: 9 up (since 2d), 9 in (since 2d); 10 remapped pgs
  rgw: 3 daemons active (ceph-mon1, ceph-mon2, ceph-mon3)

task status:

data:
  pools:   8 pools, 352 pgs
  objects: 34.27M objects, 2.9 TiB
  usage:   5.2 TiB used, 3.0 TiB / 8.2 TiB avail
  pgs:     0.568% pgs not active
           2143548/205608360 objects misplaced (1.043%)
           342 active+clean
           5   active+remapped+backfill_wait
           3   active+remapped+backfilling   # the remapped PGs are backfilling or waiting to backfill
           2   remapped+peering

io:
  client:   0 B/s wr, 0 op/s rd, 3 op/s wr
  recovery: 1.0 MiB/s, 22 objects/s
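
The migration can be followed until no remapped PGs remain and the misplaced-object count drops to zero. A simple way to keep an eye on it, sketched with standard tooling:

]$ watch -n 30 'ceph -s | grep -E "misplaced|backfill|peering"'
]$ ceph pg ls remapped    # lists the PGs that are still being moved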
    
  • Check the OSD utilization again once the rebalance has finished, to confirm that the distribution has evened out (as shown below).
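
Re-running the check from section 2 should now show the %USE and PGS columns converging across the OSDs:

]$ ceph osd df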