1. High Availability Verification
All nodes in a Ceph cluster can be scaled out horizontally, so there is no single point of failure; a highly available deployment is therefore recommended for every node role. The following environment is used to verify the cluster's high availability.
| Role | Hostname | Cluster Network | Public Network | OSD |
|---|---|---|---|---|
| admin, client | ceph-admin.yull.cc | 10.37.129.14 | 10.10.5.25 | |
| mon, osd, mgr, mds | ceph-mon1.yull.cc | 10.37.129.12 | 10.10.5.27 | sdb, sdc |
| mon, osd, mgr, rgw | ceph-mon2.yull.cc | 10.37.129.11 | 10.10.5.28 | sdb, sdc |
| mon, osd, rgw | ceph-mon3.yull.cc | 10.37.129.13 | 10.10.5.29 | sdb, sdc |
| new node: mon, osd | ceph-stor3.yull.cc | 10.37.129.14 | 10.10.5.177 | sdb, sdc |
Current cluster status:
```bash
]$ ceph -s
  cluster:
    id:     817bb5ff-1b92-4734-baa8-f21ede4cb9c2
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 46m)
    mgr: ceph-mon1(active, since 36m), standbys: ceph-mon2
    osd: 6 osds: 6 up (since 16m), 6 in (since 16m)

  data:
    pools:   1 pools, 16 pgs
    objects: 1 objects, 23 B
    usage:   6.0 GiB used, 378 GiB / 384 GiB avail
    pgs:     16 active+clean
```
Test item: scale the cluster out by adding a new node.
Initialize the new node environment
```bash
]$ */1 * * * * ntpdate cn.pool.ntp.org     # add time synchronization (crontab entry)
]$ cat /etc/hosts                          # configure hosts
10.10.5.177 ceph-mon4.stbchina.cn ceph-mon4 ceph-stor4.stbchina.cn ceph-stor4
]$ systemctl stop firewalld.service        # stop the firewall
]$ systemctl disable firewalld.service     # disable it at boot
]$ sed -i 's@^\(SELINUX=\).*@\1disabled@' /etc/sysconfig/selinux   # disable SELinux permanently
]$ setenforce 0                            # disable SELinux for the current session
]$ useradd cephadm && echo "1234qwer" | passwd --stdin cephadm     # create the cephadm user
]$ echo "cephadm ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/cephadm   # grant sudo privileges
]$ chmod 0440 /etc/sudoers.d/cephadm
```
Set up passwordless SSH from the ceph-deploy host to the new node; the details are not repeated here, but a minimal sketch is shown below.
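As a rough sketch only (assuming the cephadm user created above and the hostname ceph-stor4 used in the following steps), the passwordless SSH setup on the ceph-deploy/admin host would look like this:

```bash
# Run on the ceph-deploy (admin) host as the deploy user.
]$ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa      # skip if a key pair already exists
]$ ssh-copy-id cephadm@ceph-stor4                # copy the public key to the new node
]$ ssh cephadm@ceph-stor4 hostname               # verify that passwordless login works
```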
Install the basic Ceph packages
```bash
]$ rpm -ivh https://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch/ceph-release-1-1.el7.noarch.rpm
]$ yum install ceph
```
Copy the Ceph configuration files to the new node
```bash
]$ ceph-deploy admin ceph-stor4
```
Add the new node's OSDs
```bash
]$ ceph-deploy disk list ceph-stor4
[ceph-stor1][INFO ] Disk /dev/sdb: 68.7 GB, 68719476736 bytes, 134217728 sectors
[ceph-stor1][INFO ] Disk /dev/sdc: 68.7 GB, 68719476736 bytes, 134217728 sectors
]$ ceph-deploy disk zap ceph-stor4 /dev/sdb
]$ ceph-deploy disk zap ceph-stor4 /dev/sdc
]$ ceph-deploy osd create ceph-stor4 --data /dev/sdb
]$ ceph-deploy osd create ceph-stor4 --data /dev/sdc
```
Check the cluster status
When the new node joins the cluster, Ceph starts rebalancing part of the existing data onto the newly added OSDs. The rebalancing process can be observed with the commands shown below.
- After new OSDs join the cluster, CRUSH assigns PGs to them, forcing the OSDs to accept the remapped PGs and shifting part of the load onto the new OSDs.
- During this process the affected PGs enter the active+remapped+backfilling state.
- While backfilling is in progress, several related states may be observed (see the sketch after this list for how to inspect them):
  - backfill_wait: a backfill operation is pending but has not started yet (the PG is waiting for backfilling to begin)
  - backfilling: the backfill operation is currently running
  - backfill_too_full: a backfill operation was requested but cannot be completed because of insufficient storage capacity
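A minimal sketch of standard ceph CLI commands that can be used to inspect these PG states while backfilling is in progress (run from any node with admin credentials):

```bash
]$ ceph health detail     # lists degraded/misplaced PGs and the reason
]$ ceph pg stat           # one-line summary of PG state counts
]$ ceph pg dump_stuck     # PGs stuck in inactive/unclean/stale states
]$ ceph -w                # follow cluster events, including recovery progress
```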
Backfilling starts as soon as the new node joins, which can affect cluster performance. It can be throttled with the following options (a sketch of applying them at runtime follows the status output below).
- osd_max_backfills: the maximum number of concurrent backfills allowed to or from a single OSD, default 10
- osd backfill full ratio: when an OSD reaches this usage ratio it is allowed to refuse backfill requests, default 85%
- If an OSD refuses a backfill request, osd backfill retry interval takes effect; the request is retried after 10 seconds by default
- osd backfill scan min and osd backfill scan max control the scan interval

```bash
]$ ceph -s
  cluster:
    id:     597abcf5-e6ce-4c68-b47b-ec4e33d66694
    health: HEALTH_OK

  services:
    mon: 4 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3,ceph-stor4 (age 12h)
    mgr: ceph-mon1(active, since 4d), standbys: ceph-mon2
    osd: 12 osds: 12 up (since 9m), 12 in (since 11h); 28 remapped pgs
    rgw: 3 daemons active (ceph-mon1, ceph-mon2, ceph-mon3)

  task status:

  data:
    pools:   7 pools, 256 pgs
    objects: 2.15M objects, 794 GiB
    usage:   1.7 TiB used, 8.8 TiB / 10 TiB avail
    pgs:     2141189/12872880 objects misplaced (16.633%)
             228 active+clean
             27  active+remapped+backfill_wait
             1   active+remapped+backfilling

  io:
    recovery: 2.2 MiB/s, 5 objects/s

]$ ceph osd tree
ID CLASS WEIGHT  TYPE NAME           STATUS REWEIGHT PRI-AFF
-1       9.46489 root default
-3       3.00000     host ceph-mon1
 0   hdd 1.00000         osd.0           up  1.00000 1.00000
 1   hdd 1.00000         osd.1           up  1.00000 1.00000
 2   hdd 1.00000         osd.2           up  1.00000 1.00000
-5       3.00000     host ceph-mon2
 3   hdd 1.00000         osd.3           up  1.00000 1.00000
 4   hdd 1.00000         osd.4           up  1.00000 1.00000
 5   hdd 1.00000         osd.5           up  1.00000 1.00000
-7       2.00000     host ceph-mon3
 6   hdd 1.00000         osd.6           up  1.00000 1.00000
 7   hdd 1.00000         osd.7           up  1.00000 1.00000
 8   hdd 1.00000         osd.8           up  1.00000 1.00000
-9       1.46489     host ceph-stor4
 9   hdd 0.48830         osd.9           up  1.00000 1.00000
10   hdd 0.48830         osd.10          up  1.00000 1.00000
11   hdd 0.48830         osd.11          up  1.00000 1.00000
```
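As a sketch only, the throttling options listed above could be applied at runtime roughly as follows; the values are illustrative, not tuned recommendations:

```bash
# Reduce concurrent backfill and recovery work per OSD while clients are busy
# (illustrative values; adjust to the cluster's workload).
]$ ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
# Restore less-throttled values once rebalancing has finished
# (10 is the osd_max_backfills default quoted above).
]$ ceph tell osd.* injectargs '--osd-max-backfills 10 --osd-recovery-max-active 3'
```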
When PGs get stuck in the active+remapped+backfill_wait state
Take the OSD out of the cluster; in this test osd.6 is taken out:
```bash
]$ ceph osd out osd.6
marked out osd.6.
```
Stop the OSD process:
```bash
]$ systemctl stop ceph-osd@6
```
Check the OSD status:
```bash
]$ ceph osd tree
......
 6   hdd 1.00000         osd.6         down        0       0
......
```
Delete the OSD: remove it from CRUSH and delete its auth credentials:
```bash
]$ ceph osd rm osd.6
removed osd.6
]$ ceph osd crush rm osd.6          # remove from the CRUSH map
removed item id 6 name 'osd.6' from crush map
]$ ceph auth del osd.6
updated
]$ ceph osd remove osd.6            # purge from the cluster map
```
At this point some PGs enter the degraded state while the cluster migrates the data that was on the removed OSD; once the migration finishes, the cluster returns to the active+clean state.
1.1.2 Single storage node down while data is continuously written
Power off the storage node directly while client writes continue, and observe the cluster status (a sketch of a continuous write workload follows below).
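The exact write workload used during this test is not recorded here. As a minimal sketch, rados bench can keep writing to a pool while the node is down; the pool name test-pool and the runtime are assumptions:

```bash
# Hypothetical continuous write load: "test-pool" and the 300 s runtime are
# assumptions, not values from this document. --no-cleanup keeps the written
# objects so the degraded/recovery behaviour can be observed afterwards.
]$ rados bench -p test-pool 300 write -t 16 --no-cleanup
```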
```bash
]$ ceph -s
  cluster:
    id:     597abcf5-e6ce-4c68-b47b-ec4e33d66694
    health: HEALTH_WARN
            Reduced data availability: 1 pg inactive, 1 pg incomplete
            Degraded data redundancy: 3866506/12872892 objects degraded (30.036%), 96 pgs degraded, 96 pgs undersized
            1/4 mons down, quorum ceph-mon1,ceph-mon2,ceph-stor4

  services:
    mon: 4 daemons, quorum ceph-mon1,ceph-mon2,ceph-stor4 (age 7m), out of quorum: ceph-mon3
    mgr: ceph-mon1(active, since 7m), standbys: ceph-mon2
    osd: 9 osds: 9 up (since 6m), 9 in (since 59m); 98 remapped pgs
    rgw: 2 daemons active (ceph-mon1, ceph-mon2)

  task status:

  data:
    pools:   7 pools, 256 pgs
    objects: 2.15M objects, 794 GiB
    usage:   1.1 TiB used, 6.3 TiB / 7.5 TiB avail
    pgs:     0.391% pgs not active
             3866506/12872892 objects degraded (30.036%)
             2287935/12872892 objects misplaced (17.773%)
             157 active+clean
             94  active+undersized+degraded+remapped+backfill_wait
             2   active+undersized+degraded+remapped+backfilling
             2   active+remapped+backfill_wait
             1   incomplete

  io:
    recovery: 4.2 MiB/s, 13 objects/s
```
1.1.3 Two storage nodes down
1.2 Simulating a MON failure
1.3 Simulating a host outage
1.4 Simulating an RGW node failure
1.5 Simulating PG state failures
2. Performance Testing
2.1 Physical machine
| Block size | Access pattern | Data size | Threads | IOPS | Bandwidth | Object IO |
|---|---|---|---|---|---|---|
| 4K | random write | 4G | 64 | 1312 | 5252KiB/s | 1.68k op/s wr |
| 4K | random read | 4G | 64 | 54.6k | 213MiB/s | 7.46k op/s rd |
| 512K | sequential write | 4G | 64 | 16 | 8704KiB/s | 27 op/s wr |
| 512K | sequential read | 4G | 64 | 20.5k | 10.0GiB/s | 327 op/s rd |
| 1M | sequential read | 4G | 16 | 10.8k | 10.6GiB/s | 180 op/s rd |
| 4M | sequential write | 4G | 16 | 3158 | 12.3GiB/s | 57 op/s rd |
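The benchmark commands themselves are not recorded above. The IOPS and bandwidth columns match the kind of output fio produces, and the "Object IO" column matches the io: line of `ceph -s`. As a hedged sketch only, a fio job corresponding to the 4K random-write row might look like the following, where the target device /dev/rbd0 and the interpretation of "threads" as queue depth are assumptions:

```bash
# Hypothetical fio job for the 4K random-write row; /dev/rbd0 and iodepth=64
# are assumptions, not values taken from this document.
]$ fio --name=4k-randwrite --filename=/dev/rbd0 \
       --rw=randwrite --bs=4k --size=4G --iodepth=64 \
       --ioengine=libaio --direct=1 --numjobs=1 --group_reporting
# While the job runs, the per-object op/s figure can be read from the
# "io:" line of `ceph -s` on an admin node.
```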
2.2 Virtual machine
| Block size | Access pattern | Data size | Threads | IOPS | Bandwidth | Object IO |
|---|---|---|---|---|---|---|
| 4K | random write | 4G | 64 | 1427 | 5709KiB/s | 2.00k op/s wr |
| 4K | random read | 4G | 64 | 35.4k | 138MiB/s | 4.74k op/s rd |
| 4K | sequential read | 4G | 64 | 115k | 448MiB/s | 11.3k op/s rd |
| 512K | sequential write | 4G | 64 | 199 | 99.8MiB/s | 50 op/s wr |
| 512K | sequential read | 4G | 64 | 2344 | 1172MiB/s | 60 op/s rd |
| 2M | sequential read | 4G | 16 | 55 | 111MiB/s | 58 op/s wr |
| 4M | sequential write | 4G | 16 | 29 | 109MiB/s | 52 op/s wr |
2.3 SSD
| Block size | Access pattern | Data size | Threads | IOPS | Bandwidth | Object IO |
|---|---|---|---|---|---|---|
| 64K | sequential write | 2G | 32 | 1353 | 84.6MiB/s | |
| 64K | sequential read | 2G | 32 | 3930 | 246MiB/s | |
| 1M | sequential write | 2G | 32 | 142 | 143MiB/s | |
| 1M | sequential read | 2G | 32 | 258 | 258MiB/s | |
| 2M | sequential read | 4G | 16 | 129 | 260MiB/s | |
