1、Background

As a Ceph cluster stores more and more data, ceph osd df tree will start to show an uneven data distribution across OSDs. This imbalance wastes storage capacity, and it becomes especially common once pool utilization passes 80%. In such cases the PG distribution can be adjusted manually to rebalance the data.
Since Luminous, the OSD map carries a new pg-upmap exception table that lets the cluster explicitly map specific PGs to specific OSDs.
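
Once such mappings exist (they are created in section 3.4 below), each one appears as a pg_upmap_items entry in the OSD map and can be listed directly. The sample line here is illustrative; the exact format may vary slightly between releases:

]$ ceph osd dump | grep pg_upmap_items
pg_upmap_items 48.1 [4,7]    # PG 48.1 is remapped from osd.4 to osd.7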

2、Current State

The output of ceph osd df shows that osd.7 is utilized nearly 10 percentage points less than the other OSDs.

]$ ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 ssd 0.90959 1.00000 931 GiB 572 GiB 536 GiB 3.7 GiB 33 GiB 359 GiB 61.42 0.97 152 up
3 ssd 0.90959 1.00000 931 GiB 578 GiB 542 GiB 3.0 GiB 33 GiB 354 GiB 62.01 0.98 152 up
6 ssd 0.90959 1.00000 931 GiB 612 GiB 574 GiB 3.0 GiB 35 GiB 319 GiB 65.70 1.04 165 up
1 ssd 0.90959 1.00000 931 GiB 613 GiB 574 GiB 3.4 GiB 35 GiB 319 GiB 65.76 1.04 180 up
4 ssd 0.90959 1.00000 931 GiB 639 GiB 600 GiB 3.1 GiB 37 GiB 292 GiB 68.64 1.09 158 up
7 ssd 0.90959 1.00000 931 GiB 510 GiB 478 GiB 3.1 GiB 29 GiB 422 GiB 54.74 0.87 132 up
2 ssd 0.90959 1.00000 931 GiB 584 GiB 548 GiB 2.1 GiB 33 GiB 348 GiB 62.66 0.99 142 up
5 ssd 0.90959 1.00000 931 GiB 591 GiB 555 GiB 2.7 GiB 34 GiB 340 GiB 63.48 1.01 160 up
8 ssd 0.90959 1.00000 931 GiB 587 GiB 548 GiB 4.8 GiB 33 GiB 345 GiB 62.99 1.00 167 up
  TOTAL 8.2 TiB 5.2 TiB 4.8 TiB 29 GiB 302 GiB 3.0 TiB 63.04

This indicates a PG imbalance. Keep the default thresholds in mind: once an OSD passes 85% utilization the cluster raises a nearfull warning, and at 95% it enters an error state and blocks writes, so it is worth rebalancing well before those levels are reached.
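
To put a number on the imbalance, the per-OSD utilization spread can be computed from the JSON output of ceph osd df. This is a minimal sketch, assuming the JSON layout of recent releases (a nodes array with a utilization field per OSD) and that jq is available:

]$ ceph osd df -f json | jq -r '.nodes[] | "osd.\(.id) \(.utilization)"'   # per-OSD utilization in %
]$ ceph osd df -f json | jq '[.nodes[].utilization] | max - min'           # spread between fullest and emptiest OSD

With the numbers above the spread is roughly 14 percentage points (68.64% on osd.4 vs. 54.74% on osd.7).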

3、Manual Rebalancing

3.1、Check the Ceph Version

Balancing PGs this way requires Ceph Luminous or later, and because pg-upmap is only honoured by Luminous-aware clients, the connected clients must support it as well. Both can be checked with ceph features:

]$ ceph features
{
    "mon": [
        {
            "features": "0x3ffddff8ffecffff",
            "release": "luminous",
            "num": 3
        }
    ],
}
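
Checking the version alone is not quite sufficient: the pg-upmap exception table is only used once the cluster requires Luminous-or-newer clients. If that has not been set yet, it can be enabled with the command below (it refuses to run while pre-Luminous clients are still connected):

]$ ceph osd set-require-min-compat-client luminous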

3.2、Export the Cluster's Current OSD Map

]$ ceph osd getmap -o osd.map
got osdmap epoch 552
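
Optionally, the exported map can be inspected offline before generating any changes, for example to confirm that the epoch and pools are the expected ones. A quick, truncated check (the exact header lines may differ by release):

]$ osdmaptool osd.map --print | head
osdmaptool: osdmap file 'osd.map'
epoch 552
...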

3.3、Generate the upmap Optimization Entries

]$ osdmaptool osd.map --upmap out.txt --upmap-pool default.rgw.buckets.data --upmap-max=10
osdmaptool: osdmap file 'osd.map'
writing upmap command output to: out.txt
checking for upmap cleanups
upmap, max-count 10, max deviation 5
 limiting to pools default.rgw.buckets.data ([48])
pools default.rgw.buckets.data 
prepared 10/10 changes

]$ cat out.txt 
ceph osd pg-upmap-items 48.1 4 7    # remaps PG 48.1 from osd.4 to osd.7
ceph osd pg-upmap-items 48.1e 1 7
ceph osd pg-upmap-items 48.27 6 0
ceph osd pg-upmap-items 48.2b 6 0
ceph osd pg-upmap-items 48.2d 1 7
ceph osd pg-upmap-items 48.33 4 7
ceph osd pg-upmap-items 48.49 6 3
ceph osd pg-upmap-items 48.57 4 7
ceph osd pg-upmap-items 48.69 1 7
ceph osd pg-upmap-items 48.75 1 7
  • --upmap-pool: name of the storage pool to optimize.
  • --upmap-max: maximum number of upmap entries to generate in one run (default 100). Tune this to the environment and workload: the more entries applied at once, the more data migration is triggered, which may impact client traffic.
  • --upmap-deviation: the maximum deviation from the average before an OSD is considered for optimization. The Luminous documentation describes it as a ratio with a default of 0.01 (i.e. 1%); in newer releases it is expressed as a number of PGs, which is why the run above reports "max deviation 5". OSDs within this deviation of the average are treated as perfectly balanced (see the example after this list).
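
For instance, to aim for a tighter balance than the default deviation of 5 PGs, the deviation can be passed explicitly. This is a hypothetical re-run of the same command, assuming an osdmaptool recent enough to accept --upmap-deviation:

]$ osdmaptool osd.map --upmap out.txt --upmap-pool default.rgw.buckets.data \
     --upmap-max 10 --upmap-deviation 1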

3.4、Apply the Rebalancing

Run the commands written to out.txt. This makes the cluster remap the listed PGs, and data migration starts immediately to even out the distribution.

]$ source out.txt
set 48.1 pg_upmap_items mapping to [4->7]
set 48.1e pg_upmap_items mapping to [1->7]
set 48.27 pg_upmap_items mapping to [6->0]
set 48.2b pg_upmap_items mapping to [6->0]
set 48.2d pg_upmap_items mapping to [1->7]
set 48.33 pg_upmap_items mapping to [4->7]
set 48.49 pg_upmap_items mapping to [6->3]
set 48.57 pg_upmap_items mapping to [4->7]
set 48.69 pg_upmap_items mapping to [1->7]
set 48.75 pg_upmap_items mapping to [1->7]
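
If one of these mappings later turns out to be undesirable (for example, the target OSD fills up faster than expected), the individual exception can be removed again and the PG falls back to its normal CRUSH placement. A sketch, using PG 48.1 from the list above:

]$ ceph osd rm-pg-upmap-items 48.1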
    

3.5、Monitor the Rebalancing Progress

]$ ceph -s
cluster:
  id:     5adf323c-bef2-42b4-8eff-7a164be1c7fa
  health: HEALTH_OK

services:
  mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 2d)
  mgr: ceph-mon1(active, since 2d), standbys: ceph-mon2
  mds: cephfs:1 {0=ceph-mon3=up:active} 1 up:standby
  osd: 9 osds: 9 up (since 2d), 9 in (since 2d); 10 remapped pgs
  rgw: 3 daemons active (ceph-mon1, ceph-mon2, ceph-mon3)

task status:

data:
  pools:   8 pools, 352 pgs
  objects: 34.27M objects, 2.9 TiB
  usage:   5.2 TiB used, 3.0 TiB / 8.2 TiB avail
  pgs:     0.568% pgs not active
           2143548/205608360 objects misplaced (1.043%)
           342 active+clean
           5   active+remapped+backfill_wait
           3   active+remapped+backfilling   # the remapped PGs are backfilling or waiting to backfill
           2   remapped+peering

io:
  client:   0 B/s wr, 0 op/s rd, 3 op/s wr
  recovery: 1.0 MiB/s, 22 objects/s
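
The migration can be followed until no remapped PGs remain and the misplaced-object count drops to zero. A simple way to keep an eye on it, sketched with standard tooling:

]$ watch -n 30 'ceph -s | grep -E "misplaced|backfill|peering"'
]$ ceph pg ls remapped    # lists the PGs that are still being moved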
    
  • Check the OSD utilization again once the rebalance has finished, to confirm that the distribution has evened out (as shown below).
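
Re-running the check from section 2 should now show the %USE and PGS columns converging across the OSDs:

]$ ceph osd df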