1. Background
The Ceph cluster originally had 9 OSDs and had recently started running short on space, so 5 more OSDs were added. After they joined, the cluster entered recovery and backfill, a process that takes a long time. `ceph -s` showed a recovery speed of only 18 objects/s, so the recovery speed needed to be tuned.
]$ ceph -s
  cluster:
    id:     5adf323c-bef2-42b4-8eff-7a164be1c7fa
    health: HEALTH_WARN
            Degraded data redundancy: 2246888/205599948 objects degraded (1.093%), 10 pgs degraded, 10 pgs undersized

  services:
    mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 11m)
    mgr: ceph-mon1(active, since 6d), standbys: ceph-mon2
    mds: cephfs:1 {0=ceph-mon2=up:active} 1 up:standby
    osd: 14 osds: 14 up (since 33h), 14 in (since 41h); 114 remapped pgs
    rgw: 3 daemons active (ceph-mon1, ceph-mon2, ceph-mon3)

  task status:

  data:
    pools:   8 pools, 352 pgs
    objects: 34.27M objects, 2.9 TiB
    usage:   5.4 TiB used, 7.3 TiB / 13 TiB avail
    pgs:     2246888/205599948 objects degraded (1.093%)
             77284550/205599948 objects misplaced (37.590%)
             238 active+clean
             96  active+remapped+backfill_wait
             10  active+undersized+degraded+remapped+backfilling
             8   active+remapped+backfilling

  io:
    client:   1.1 MiB/s rd, 14 op/s rd, 0 op/s wr
    recovery: 1.3 MiB/s, 18 objects/s

  progress:
    Rebalancing after osd.13 marked in
      [=============================.]
    Rebalancing after osd.10 marked in
      [=======================.......]
    Rebalancing after osd.11 marked in
      [===========================...]
    Rebalancing after osd.12 marked in
      [===========================...]
    Rebalancing after osd.9 marked in
      [==================............]
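While the cluster is recovering it is handy to keep polling the recovery rate rather than re-running `ceph -s` by hand. A minimal sketch (the grep filter is just an illustration, not part of the original procedure):

```bash
# Refresh the "io:" section of `ceph -s` every 10 seconds; it contains the
# client and recovery throughput lines shown above.
watch -n 10 "ceph -s | grep -A 2 'io:'"
```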
2. Viewing and tuning the parameters
2.1 Viewing the recovery and backfill related parameters
]$ ceph daemon osd.0 config show |grep -E 'backfill|recovery'
"bluefs_replay_recovery": "false",
"bluefs_replay_recovery_disable_compact": "false",
"mon_osd_backfillfull_ratio": "0.900000",
"osd_allow_recovery_below_min_size": "true",
"osd_async_recovery_min_cost": "100",
"osd_backfill_retry_interval": "30.000000",
"osd_backfill_scan_max": "512",
"osd_backfill_scan_min": "64",
"osd_debug_pretend_recovery_active": "false",
"osd_debug_reject_backfill_probability": "0.000000",
"osd_debug_skip_full_check_in_backfill_reservation": "false",
"osd_debug_skip_full_check_in_recovery": "false",
"osd_force_recovery_pg_log_entries_factor": "1.300000",
"osd_kill_backfill_at": "0",
"osd_max_backfills": "1",
"osd_min_recovery_priority": "0",
"osd_recovery_cost": "20971520",
"osd_recovery_delay_start": "0.000000",
"osd_recovery_max_active": "3",
"osd_recovery_max_chunk": "8388608",
"osd_recovery_max_omap_entries_per_chunk": "8096",
"osd_recovery_max_single_start": "1",
"osd_recovery_op_priority": "3",
"osd_recovery_op_warn_multiple": "16",
"osd_recovery_priority": "5",
"osd_recovery_retry_interval": "30.000000",
"osd_recovery_sleep": "0.000000",
"osd_recovery_sleep_hdd": "0.100000",
"osd_recovery_sleep_hybrid": "0.025000",
"osd_recovery_sleep_ssd": "0.100000",
"osd_repair_during_recovery": "false",
"osd_scrub_during_recovery": "false",
2.2 Parameter descriptions
- osd_max_backfills
  - The maximum number of concurrent backfill operations allowed per OSD.
  - Default: 1
- osd_recovery_max_active
  - The maximum number of active recovery requests per OSD.
  - Default: 3
- osd_recovery_sleep_hdd
  - Introduced in Luminous (earlier releases only have osd_recovery_sleep); the sleep time inserted between recovery or backfill operations on HDD-backed OSDs.
  - Default: 0.1
- osd_backfill_scan_min
  - The minimum number of objects scanned per backfill scan.
  - Default: 64
- osd_backfill_scan_max
  - The maximum number of objects scanned per backfill scan.
  - Default: 512
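The descriptions and defaults above can also be read straight from the ceph CLI, which is a quick way to double-check them against your own release. A small sketch (option names are the ones listed above; `ceph config help` is available on Mimic/Nautilus and later):

```bash
# Print Ceph's built-in description, type and default for each option above.
for opt in osd_max_backfills osd_recovery_max_active \
           osd_recovery_sleep_hdd osd_backfill_scan_min osd_backfill_scan_max; do
    ceph config help "$opt"
done
```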
2.3 Parameter tuning
Tuning has to be matched to the specific cluster: test and verify what works best for your environment, because blindly changing these parameters can make performance worse rather than better, and can even affect normal client read/write requests. The values used here were:

osd_max_backfills = 10        # 8 or 10; setting this too high can cause slow requests
osd_recovery_max_active = 15
osd_recovery_sleep_hdd = 0
osd_recovery_sleep_ssd = 0

2.4 Commands to apply the changes
]$ ceph tell osd.\* injectargs '--osd_max_backfills=10'
]$ ceph tell osd.\* injectargs '--osd_recovery_max_active=15'
]$ ceph tell osd.\* injectargs '--osd_recovery_sleep_hdd=0'
]$ ceph tell osd.\* injectargs '--osd_recovery_sleep_ssd=0'
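Note that `injectargs` only changes the running daemons, so the values are lost whenever an OSD restarts. If the tuning should survive restarts, the same options can be stored in the monitors' centralized config database instead (a sketch, assuming a Mimic or later cluster; on older releases put them in ceph.conf):

```bash
# Persist the same values for all OSDs so they also apply after a restart.
ceph config set osd osd_max_backfills 10
ceph config set osd osd_recovery_max_active 15
ceph config set osd osd_recovery_sleep_hdd 0
ceph config set osd osd_recovery_sleep_ssd 0
```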
3. Parameters and recovery speed after tuning

```bash
]$ ceph daemon osd.0 config show |grep -E 'backfill|recovery'
"bluefs_replay_recovery": "false",
"bluefs_replay_recovery_disable_compact": "false",
"mon_osd_backfillfull_ratio": "0.900000",
"osd_allow_recovery_below_min_size": "true",
"osd_async_recovery_min_cost": "100",
"osd_backfill_retry_interval": "30.000000",
"osd_backfill_scan_max": "512",
"osd_backfill_scan_min": "64",
"osd_debug_pretend_recovery_active": "false",
"osd_debug_reject_backfill_probability": "0.000000",
"osd_debug_skip_full_check_in_backfill_reservation": "false",
"osd_debug_skip_full_check_in_recovery": "false",
"osd_force_recovery_pg_log_entries_factor": "1.300000",
"osd_kill_backfill_at": "0",
"osd_max_backfills": "10",
"osd_min_recovery_priority": "0",
"osd_recovery_cost": "20971520",
"osd_recovery_delay_start": "0.000000",
"osd_recovery_max_active": "15",
"osd_recovery_max_chunk": "8388608",
"osd_recovery_max_omap_entries_per_chunk": "8096",
"osd_recovery_max_single_start": "1",
"osd_recovery_op_priority": "3",
"osd_recovery_op_warn_multiple": "16",
"osd_recovery_priority": "5",
"osd_recovery_retry_interval": "30.000000",
"osd_recovery_sleep": "0.000000",
"osd_recovery_sleep_hdd": "0.000000",
"osd_recovery_sleep_hybrid": "0.025000",
"osd_recovery_sleep_ssd": "0.000000",
"osd_repair_during_recovery": "false",
"osd_scrub_during_recovery": "false",

]$ ceph -s
  cluster:
    id:     5adf323c-bef2-42b4-8eff-7a164be1c7fa
    health: HEALTH_WARN
            Degraded data redundancy: 1862088/205599948 objects degraded (0.906%), 10 pgs degraded, 10 pgs undersized

  services:
    mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 51m)
    mgr: ceph-mon1(active, since 6d), standbys: ceph-mon2
    mds: cephfs:1 {0=ceph-mon2=up:active} 1 up:standby
    osd: 14 osds: 14 up (since 34h), 14 in (since 42h); 114 remapped pgs
    rgw: 3 daemons active (ceph-mon1, ceph-mon2, ceph-mon3)

  task status:

  data:
    pools:   8 pools, 352 pgs
    objects: 34.27M objects, 2.9 TiB
    usage:   5.4 TiB used, 7.3 TiB / 13 TiB avail
    pgs:     1862088/205599948 objects degraded (0.906%)
             75829821/205599948 objects misplaced (36.882%)
             238 active+clean
             96  active+remapped+backfill_wait
             10  active+undersized+degraded+remapped+backfilling
             8   active+remapped+backfilling

  io:
    client:   435 KiB/s rd, 11 op/s rd, 0 op/s wr
    recovery: 41 MiB/s, 460 objects/s

  progress:
    Rebalancing after osd.13 marked in
      [=============================.]
    Rebalancing after osd.10 marked in
      [=======================.......]
    Rebalancing after osd.11 marked in
      [===========================...]
    Rebalancing after osd.12 marked in
      [===========================...]
    Rebalancing after osd.9 marked in
      [==================............]
```

As you can see, the recovery speed has climbed from 18 objects/s to 460 objects/s.
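Once backfill finishes and the cluster is healthy again, it is usually worth restoring the original values so that the next recovery cannot crowd out client I/O. A sketch that simply puts back the values recorded in the first config dump in section 2.1:

```bash
# Restore the pre-tuning values from section 2.1 once recovery is complete.
ceph tell osd.\* injectargs '--osd_max_backfills=1'
ceph tell osd.\* injectargs '--osd_recovery_max_active=3'
ceph tell osd.\* injectargs '--osd_recovery_sleep_hdd=0.1'
ceph tell osd.\* injectargs '--osd_recovery_sleep_ssd=0.1'
```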
