Storage - Ceph Cluster Testing - 《运维大世界》

date: 2021-04-18
title: Ceph cluster testing # title
tags: ceph # tags
categories: Storage # category

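This article covers the tests to run against a freshly deployed Ceph cluster before it goes into production. Not every test here is mandatory, and you can decide which ones apply to you, but a performance stress test should generally always be done.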

Monitor service high-availability test

The earlier deployment guide set up monitors on three nodes, so let's now test their high availability.

# Log in to the node and stop its mon service to simulate a failure
$ systemctl stop ceph-mon@centos-20-5
$ ceph -s        # check the cluster status
  cluster:
    id:     d94fee92-ef1a-4f1f-80a5-1c7e1caf4a4a
    health: HEALTH_WARN   # the warning shows that one mon is down
            1/3 mons down, quorum centos-20-10,centos-20-6
  services:
    mon: 3 daemons, quorum centos-20-10,centos-20-6 (age 5s), out of quorum: centos-20-5  # the mon on centos-20-5 is down
    mgr: centos-20-6(active, since 2h), standbys: centos-20-5, centos-20-10
    mds: cephfs-demo:1 {0=centos-20-5=up:active} 2 up:standby
    osd: 6 osds: 6 up (since 2h), 6 in (since 3d)
    rgw: 2 daemons active (centos-20-10, centos-20-5)
  task status:
    scrub status:
        mds.centos-20-5: idle
  data:
    pools:   9 pools, 288 pgs
    objects: 252 objects, 14 MiB
    usage:   6.1 GiB used, 114 GiB / 120 GiB avail
    pgs:     288 active+clean
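# Create an RBD image as a test
$ rbd create --size 1G ceph-demo/test-demo
$ rbd -p ceph-demo ls   # the image is still created successfully
rbd-demo.img
test-demo
# Conclusion: with three mons, losing one mon does not stop the cluster from serving requests
# Stop a second mon
$ systemctl stop ceph-mon@centos-20-6
# Check the Ceph status again: the command now hangs
$ ceph -s
# Operations such as creating an RBD image fail as well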

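If you now look at the log of the mon that is still alive, you will see output like the following:

Ceph cluster testing - Figure 1

Once the two stopped mon services are back up, the Ceph cluster returns to normal. The mon services use the Paxos algorithm for elections and rely on a majority: more than half of the monitors must be up. In other words, three mons tolerate one mon being down, five mons tolerate two, seven mons tolerate three, and so on.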
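
The majority rule above is plain integer arithmetic. The following is a minimal sketch; `mon_tolerated_failures` is a hypothetical helper written for this article and is not a Ceph command.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): how many monitors may fail
# while a strict majority of the n configured monitors stays up.
mon_tolerated_failures() {
    n=$1
    # A quorum needs floor(n/2) + 1 monitors, so n - (floor(n/2) + 1) may fail.
    echo $(( n - (n / 2 + 1) ))
}

for n in 3 4 5 7; do
    echo "$n mons: quorum needs $(( n / 2 + 1 )), tolerates $(mon_tolerated_failures "$n") down"
done
```

The line for four mons shows why odd monitor counts are the usual choice: a fourth mon raises the quorum size to three without tolerating any additional failure.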

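MDS active/standby failover

If your Ceph cluster provides file storage, it runs MDS services. By default, MDS achieves high availability in an active/standby arrangement, as shown below:

Ceph cluster testing - Figure 2

The output above shows that the MDS service on node centos-20-5 is currently active, with two more nodes on standby.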

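# Stop two of the MDS services so that only one is left (run each command on the node that hosts that daemon)
$ systemctl stop ceph-mds@centos-20-5
$ systemctl stop ceph-mds@centos-20-6

# Check the cluster status
$ ceph -s
  cluster:
    id:     d94fee92-ef1a-4f1f-80a5-1c7e1caf4a4a
    health: HEALTH_WARN
            insufficient standby MDS daemons available
 # This only warns that no standby MDS is currently available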
  services:
    mon: 3 daemons, quorum centos-20-10,centos-20-5,centos-20-6 (age 13m)
    mgr: centos-20-6(active, since 3h), standbys: centos-20-5, centos-20-10
    mds: cephfs-demo:1 {0=centos-20-10=up:active}
    osd: 6 osds: 6 up (since 2h), 6 in (since 3d)
    rgw: 2 daemons active (centos-20-10, centos-20-5)

  task status:
    scrub status:
        mds.centos-20-10: idle

  data:
    pools:   9 pools, 288 pgs
    objects: 256 objects, 14 MiB
    usage:   6.1 GiB used, 114 GiB / 120 GiB avail
    pgs:     288 active+clean

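# At this point you can mount the Ceph file storage yourself and read and write to it; it still works normally.

Conclusion: because the MDS service in Ceph achieves high availability with one active daemon and multiple standbys, file storage is unaffected as long as at least one MDS service in the cluster is still healthy.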
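
To keep an eye on how many standbys remain during such a test, the `mds:` line of `ceph -s` can be parsed. This is a rough sketch only: `mds_standby_count` is a hypothetical helper, and it assumes the `mds:` line has the format shown in the outputs in this article, which may differ in other Ceph releases.

```shell
#!/bin/sh
# Hypothetical helper: print the number of standby MDS daemons found in
# `ceph -s` output read from stdin. Assumes the "mds:" line format shown
# in this article, e.g. "... {0=host=up:active} 2 up:standby".
mds_standby_count() {
    awk '$1 == "mds:" {
             for (i = 2; i <= NF; i++)
                 if ($i == "up:standby") count = $(i - 1)
         }
         END { print count + 0 }'
}

# Sample lines copied from the outputs above (before and after stopping two MDS daemons).
printf '%s\n' '    mds: cephfs-demo:1 {0=centos-20-5=up:active} 2 up:standby' | mds_standby_count
printf '%s\n' '    mds: cephfs-demo:1 {0=centos-20-10=up:active}' | mds_standby_count
```

On a live cluster the intended usage would be `ceph -s | mds_standby_count`.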

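RGW high-availability test

In the earlier article on RGW high availability for the Ceph cluster, we made RGW highly available. Because a load balancer sits in front of it, the business keeps running normally as long as at least one RGW service is still available.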

# Check the cluster status
$ ceph -s
  cluster:
    id:     d94fee92-ef1a-4f1f-80a5-1c7e1caf4a4a
    health: HEALTH_WARN
            insufficient standby MDS daemons available

  services:
    mon: 3 daemons, quorum centos-20-10,centos-20-5,centos-20-6 (age 22m)
    mgr: centos-20-6(active, since 3h), standbys: centos-20-5, centos-20-10
    mds: cephfs-demo:1 {0=centos-20-10=up:active}
    osd: 6 osds: 6 up (since 2h), 6 in (since 3d)
    rgw: 2 daemons active (centos-20-10, centos-20-5)
 # There are two rgw daemons
  task status:
    scrub status:
        mds.centos-20-10: idle


# Stop either one of them (here, the one on centos-20-10)
$ systemctl stop ceph-radosgw.target

# Check again
$ ceph -s
  cluster:
    id:     d94fee92-ef1a-4f1f-80a5-1c7e1caf4a4a
    health: HEALTH_WARN
            insufficient standby MDS daemons available

  services:
    mon: 3 daemons, quorum centos-20-10,centos-20-5,centos-20-6 (age 22m)
    mgr: centos-20-6(active, since 3h), standbys: centos-20-5, centos-20-10
    mds: cephfs-demo:1 {0=centos-20-10=up:active}
    osd: 6 osds: 6 up (since 2h), 6 in (since 3d)
    rgw: 1 daemon active (centos-20-5)   # only one left

# Verify access with s3cmd
$ grep 192.168 ~/.s3cfg      # make sure the s3 config file points at the VIP
host_base = 192.168.20.100:80
host_bucket = 192.168.20.100:80/%(bucket)s


# Create a bucket and test
$ s3cmd mb s3://s3_test
$ s3cmd ls
2021-04-16 08:04  s3://ceph-s3-bucket
2021-04-17 15:05  s3://s3_test
2021-04-17 13:43  s3://s3_vip_test
2021-04-16 08:10  s3://s3cmd-demo
2021-04-16 08:14  s3://swift-demo

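Conclusion: once RGW is made highly available, normal business use is unaffected as long as at least one rgw is healthy and the clients point at the load balancer's VIP.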

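Ceph failed-disk test

My Ceph cluster has three OSD nodes. By default Ceph stores data with a three-replica policy, and at least two replicas must be available for it to keep serving I/O. So if one OSD node goes down, the cluster is unaffected; if two go down, the cluster can no longer serve any requests. It is possible to let the cluster keep serving with only one replica left by lowering the pool's min_size setting. That is not covered here, so consult the relevant documentation if you are interested.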
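
The replica reasoning above can be written down as a small check. This is a sketch under stated assumptions: `pg_serves_io` is a hypothetical helper, and it assumes each OSD node holds exactly one replica of every placement group (a host-level failure domain), so that losing one node means losing one replica.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a placement group can still serve I/O
# after some replicas are lost. Arguments: size, min_size, lost replicas.
pg_serves_io() {
    size=$1; min_size=$2; lost=$3
    remaining=$(( size - lost ))
    if [ "$remaining" -ge "$min_size" ]; then
        echo "active ($remaining/$size replicas left)"
    else
        echo "blocked ($remaining/$size replicas left, min_size=$min_size)"
    fi
}

# The real values can be read per pool, e.g. for the ceph-demo pool used earlier:
#   ceph osd pool get ceph-demo size
#   ceph osd pool get ceph-demo min_size
pg_serves_io 3 2 1   # one OSD node down
pg_serves_io 3 2 2   # two OSD nodes down
```

With size 3 and min_size 2, the first call reports the placement group as still active and the second as blocked, matching the behaviour described above.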

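fio performance stress test

Now let's use the fio tool to run read and write tests against the file storage in Ceph.

Install fio and mount the file storage

# Assuming you already have a file system, mount it directly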
$ mount -t ceph 192.168.20.5:6789,192.168.20.6:6789,192.168.20.10:6789:/ /ceph_fs/ -o name=admin

# Install fio
$ yum -y install fio

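# fio parameters:
filename=/data/fio.img: name of the test file, usually placed in a directory on the disk under test.
direct=1: bypass the operating system's buffer cache during the test so that the results are more realistic.
iodepth=32: I/O queue depth.
rw=randwrite: test random-write I/O.
rw=randrw: test mixed random read and write I/O.
bs=16k: the block size of a single I/O is 16k.
bsrange=512-2048: like bs, but specifies a range of block sizes.
size=5g: the test file is 5g in size; I/O is issued in blocks of bs (4k if bs is not set).
numjobs=30: run 30 concurrent test jobs; matching the number of CPUs on the fio client is recommended.
runtime=1000: run for 1000 seconds; if omitted, fio keeps going until the whole file of the given size has been read or written.
ioengine=libaio: use the libaio I/O engine.
rwmixwrite=30: in mixed read/write mode, writes make up 30%.
group_reporting: report results aggregated over all jobs instead of per job.
In addition:
lockmem=1g: pin 1g of memory with mlock, which can be used to simulate a machine with less memory.
zero_buffers: initialize the I/O buffers with zeros.
nrfiles=8: number of files used by each job.
Common disk read/write test points:
1. Read=100% Random=100% rw=randread (100% random read)
2. Read=100% Sequence=100% rw=read (100% sequential read)
3. Write=100% Sequence=100% rw=write (100% sequential write)
4. Write=100% Random=100% rw=randwrite (100% random write)
5. Read=70% Sequence=100% rw=rw, rwmixread=70, rwmixwrite=30 (70% sequential read, 30% sequential write)
6. Read=70% Random=100% rw=randrw, rwmixread=70, rwmixwrite=30 (70% random read, 30% random write)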
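
The six test points differ only in rw, the block size, and the read percentage, so the commands can be generated from one template. The sketch below only prints the commands and does not run them. `fio_cmd` is a hypothetical helper; the fixed options are copied from the fio commands used in this article, and pairing 4k with random I/O and 1M with sequential I/O is an assumption taken from those same commands.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a fio command for one test point.
# Arguments: rw mode, block size, optional read percentage for mixed modes.
fio_cmd() {
    rw=$1; bs=$2; mixread=${3:-}
    cmd="fio --name=mytest --filename=/ceph_fs/fio.img --direct=1 --ioengine=libaio"
    cmd="$cmd --iodepth=64 --thread --numjobs=8 --size=20G --runtime=60 --group_reporting"
    cmd="$cmd --rw=$rw --bs=$bs"
    case $rw in
        # rwmixread only applies to mixed workloads; fio's own default is 50.
        rw|randrw) cmd="$cmd --rwmixread=${mixread:-50}" ;;
    esac
    echo "$cmd"
}

fio_cmd randread  4k      # 1. 100% random read
fio_cmd read      1M      # 2. 100% sequential read
fio_cmd write     1M      # 3. 100% sequential write
fio_cmd randwrite 4k      # 4. 100% random write
fio_cmd rw        1M 70   # 5. 70% sequential read, 30% sequential write
fio_cmd randrw    4k 70   # 6. 70% random read, 30% random write
```

Printing the commands first makes it easy to review them before pasting one into a shell on the client that has /ceph_fs mounted.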

4k random write

$ fio -filename=/ceph_fs/fio.img -direct=1 -iodepth 64 -thread -rw=randwrite -ioengine=libaio -bs=4k -size=20G -numjobs=8 -runtime=60 -group_reporting -name=mytest

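While the stress test is running, you can also go to the corresponding OSD nodes and use ceph osd perf to check latency, or use iostat to check disk I/O utilization (iostat is provided by the sysstat package).

Check OSD latency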

$ ceph osd perf
osd commit_latency(ms) apply_latency(ms) 
  5                166               166 
  4                  7                 7 
  0                 22                22 
  1                  7                 7 
  2                167               167 
  3                 10                10
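
In the output above, osd.5 and osd.2 stand out with much higher latency than the others. With more OSDs it may help to filter the list by a threshold. A minimal sketch, in which `slow_osds` is a hypothetical helper and the 100 ms default is an arbitrary choice rather than a Ceph recommendation:

```shell
#!/bin/sh
# Hypothetical helper: read `ceph osd perf` output on stdin and print the OSDs
# whose commit latency exceeds a threshold in milliseconds (default 100).
slow_osds() {
    awk -v limit="${1:-100}" \
        '$1 ~ /^[0-9]+$/ && $2 + 0 > limit { print "osd." $1, $2 "ms" }'
}

# Demo using the sample output shown above.
slow_osds 100 <<'EOF'
osd commit_latency(ms) apply_latency(ms)
  5                166               166
  4                  7                 7
  0                 22                22
  1                  7                 7
  2                167               167
  3                 10                10
EOF
```

On a live cluster the intended usage would be `ceph osd perf | slow_osds 100`.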

Check disk I/O on a node

$ iostat -x 1  # focus on the %util column, the percentage of time the device was busy
Linux 3.10.0-957.el7.x86_64 (centos-20-5)       04/18/2021  _x86_64_        (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.56    0.00    1.68    0.05    0.00   97.71

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.09    2.49    4.90   131.93   292.46   114.82     0.01    1.03    1.68    0.70   0.58   0.43
sdb               0.00     0.00    0.29   14.02    14.18   110.31    17.39     0.01    0.99    0.42    1.00   0.17   0.25
sdc               0.00     0.00    0.37   12.24    22.37    93.19    18.33     0.01    1.20    0.48    1.22   0.19   0.24
dm-0              0.00     0.00    2.03    4.99   119.66   291.01   117.09     0.01    1.09    1.99    0.72   0.59   0.41
dm-1              0.00     0.00    0.03    0.00     0.96     0.00    59.04     0.00    0.21    0.21    0.00   0.13   0.00
dm-2              0.00     0.00    0.04    0.00     0.50     0.72    56.74     0.00    0.31    0.24    2.50   0.24   0.00
dm-3              0.00     0.00    0.35   12.24    21.95    93.19    18.29     0.02    1.20    0.49    1.22   0.19   0.24
dm-4              0.00     0.00    0.28   14.02    13.76   110.31    17.35     0.01    0.99    0.43    1.00   0.17   0.25
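
Note that the first report printed by iostat covers the time since boot, so it is the later reports that reflect the load during the test. To pick out busy devices from a stream of reports, a filter like the following could be used. It is a sketch only: `busy_disks` is a hypothetical helper, the 80% default is arbitrary, and it assumes %util is the last column of the device table, as it is in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `iostat -x` output on stdin and print the devices
# whose %util (assumed to be the last column) exceeds a threshold (default 80).
busy_disks() {
    awk -v limit="${1:-80}" '
        $1 == "Device:" || $1 == "Device" { in_table = 1; next }  # header row starts a device table
        NF == 0 { in_table = 0 }                                  # blank line ends it
        in_table && $NF + 0 > limit { print $1, $NF }'
}

# Intended usage on an OSD node while fio is running (five one-second reports):
#   iostat -x 1 5 | busy_disks 80
```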

4k random read

$ fio -filename=/ceph_fs/fio.img -direct=1 -iodepth 64 -thread -rw=randread -ioengine=libaio -bs=4k -size=20G -numjobs=8 -runtime=60 -group_reporting -name=mytest
mytest: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64

4k random read/write

# This command requires the file /ceph_fs/fio.img to already exist
$ fio -filename=/ceph_fs/fio.img -direct=1 -iodepth 64 -thread -rw=randrw -rwmixread=70 -ioengine=libaio -bs=4k -size=20G -numjobs=8 -runtime=60 -group_reporting -name=mytest

# -rwmixread: reads make up 70% of the I/O, so writes make up the remaining 30%.

1M sequential write

$ fio -filename=/ceph_fs/fio.img -direct=1 -iodepth 64 -thread -rw=write -ioengine=libaio -bs=1M -size=20G -numjobs=8 -runtime=60 -group_reporting -name=mytest

RBD bench stress test

Ceph ships with a built-in benchmark command for RBD, as follows:

$ rbd help bench    
usage: rbd bench [--pool <pool>] [--namespace <namespace>] [--image <image>] 
                 [--io-size <io-size>] [--io-threads <io-threads>] 
                 [--io-total <io-total>] [--io-pattern <io-pattern>] 
                 [--rw-mix-read <rw-mix-read>] --io-type <io-type> 
                 <image-spec> 

Simple benchmark.

Positional arguments
  <image-spec>         image specification
                       (example: [<pool-name>/[<namespace>/]]<image-name>)

Optional arguments
  -p [ --pool ] arg    pool name    # pool name
  --namespace arg      namespace name
  --image arg          image name    # image name
  --io-size arg        IO size (in B/K/M/G/T) [default: 4K]   # size of each I/O
  --io-threads arg     ios in flight [default: 16]    # number of concurrent I/Os
  --io-total arg       total size for IO (in B/K/M/G/T) [default: 1G]   # total amount of I/O
  --io-pattern arg     IO pattern (rand or seq) [default: seq]   # random or sequential I/O
  --rw-mix-read arg    read proportion in readwrite (<= 100) [default: 50]    # read percentage for mixed read/write
  --io-type arg        IO type (read , write, or readwrite(rw))    # read, write, or mixed read/write

Random write test

$ rbd bench ceph-demo/rbd-demo.img --io-size 4K --io-threads 16 --io-total 200M --io-pattern rand --io-type write

Random read/write

$ rbd bench ceph-demo/rbd-demo.img --io-size 4K --io-threads 16 --io-total 200M --io-pattern rand --io-type readwrite --rw-mix-read 70

# --rw-mix-read 70: reads make up 70%

Sequential write

$ rbd bench ceph-demo/rbd-demo.img --io-size 4K --io-threads 16 --io-total 20G --io-pattern seq --io-type write
