ceph集群

ceph集群创建

可使用普通账户创建ceph集群

  1. export username="ceph-admin"
  2. export passwd="ceph-admin"
  3. export node1="node1"
  4. export node2="node2"
  5. export node3="node3"
  6. export node1_ip="192.168.122.101"
  7. export node2_ip="192.168.122.102"
  8. export node3_ip="192.168.122.103"

创建部署用户和ssh免密码登录

  1. useradd ${username}
  2. echo "${passwd}" | passwd --stdin ${username}
  3. echo "${username} ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/${username}
  4. chmod 0440 /etc/sudoers.d/${username}
  5. sudo mkdir /etc/ceph
  6. sudo chown -R ceph-admin.ceph-admin /etc/ceph
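
上面只创建了部署用户,ssh 免密登录可以参考如下示意(假设各节点均已创建该用户,且在部署节点上以 ${username} 用户执行、前面的环境变量已导出):

  1. ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
  2. for node in $node1 $node2 $node3; do ssh-copy-id ${username}@${node}; done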

升级 pip 并安装 ceph-deploy

  1. sudo yum install -y python-pip
  2. pip install --upgrade pip
  3. pip install ceph-deploy
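
安装完成后可以确认一下版本(示意):

  1. ceph-deploy --version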

部署节点

创建工作目录,部署过程中会在该目录下生成很多配置文件和日志信息

  1. mkdir my-cluster
  2. cd my-cluster
  3. ceph-deploy new $node1 $node2 $node3
  4. [ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
  5. ...
  6. [node2][INFO ] Running command: /usr/sbin/ip addr show
  7. [ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf...

编辑 ceph.conf 配置文件,添加 cluster、public 网络配置

  1. # ls
  2. ceph.conf ceph-deploy-ceph.log ceph.mon.keyring
  3. vim ceph.conf
  4. [global]
  5. fsid = 07ef58d8-3457-4cac-aa45-95166c738c16
  6. mon_initial_members = node1, node2, node3
  7. mon_host = 192.168.122.101,192.168.122.102,192.168.122.103
  8. auth_cluster_required = cephx
  9. auth_service_required = cephx
  10. auth_client_required = cephx
  11. public network = 192.168.122.0/24
  12. cluster network = 192.168.122.0/24

安装 ceph相关软件

建议使用镜像源手动安装,以替代 ceph-deploy install node1 node2;下面的命令需要在每台 node 上执行。

  1. sudo wget -O /etc/yum.repos.d/ceph.repo https://raw.githubusercontent.com/aishangwei/ceph-demo/master/ceph-deploy/ceph.repo
  2. sudo yum install -y ceph ceph-radosgw

配置初始 monitor(s)、并生成所有密钥

  1. ceph-deploy mon create-initial
  2. ls -l *.keyring
  3. -rw------- 1 root root 71 3 12 12:53 ceph.bootstrap-mds.keyring
  4. -rw------- 1 root root 71 3 12 12:53 ceph.bootstrap-mgr.keyring
  5. -rw------- 1 root root 71 3 12 12:53 ceph.bootstrap-osd.keyring
  6. -rw------- 1 root root 71 3 12 12:53 ceph.bootstrap-rgw.keyring
  7. -rw------- 1 root root 63 3 12 12:53 ceph.client.admin.keyring
  8. -rw------- 1 root root 73 3 12 12:50 ceph.mon.keyring

把配置信息拷贝到各节点

  1. ceph-deploy admin $node1 $node2 $node3

配置 osd

  1. for node in node{1..3};do ceph-deploy disk zap $node /dev/vdc;done
  2. for node in node{1..3};do ceph-deploy osd create $node --data /dev/vdc;done

部署 mgr

  1. ceph-deploy mgr create node{1..3}

开启 dashboard 模块,用于UI查看

  1. sudo ceph mgr module enable dashboard
  2. curl http://localhost:7000
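
dashboard 默认监听 7000 端口。如需指定监听地址或端口,可以参考如下示意(假设为 Luminous 版本的 dashboard 模块、active mgr 运行在 node1 上,地址和端口请按实际环境调整):

  1. sudo ceph config-key set mgr/dashboard/server_addr 192.168.122.101
  2. sudo ceph config-key set mgr/dashboard/server_port 7000
  3. sudo systemctl restart ceph-mgr@node1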

创建 ceph 块客户端用户名和认证密钥

  1. sudo ceph auth get-or-create client.rbd mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd'|tee ./ceph.client.rbd.keyring
  • 把密钥文件拷贝到客户端
  1. for node in node{1..3};do scp ceph.client.rbd.keyring /etc/ceph/ceph.conf $node:/etc/ceph/;done

创建pool

通常在创建pool之前,需要覆盖默认的pg_num,官方推荐:

  • 若少于5个OSD, 设置pg_num为128。
  • 5~10个OSD,设置pg_num为512。
  • 10~50个OSD,设置pg_num为4096。
  • 超过50个OSD,可以参考pgcalc计算。

PG和PGP数量一定要根据OSD的数量进行调整,计算公式如下,但最后算出的结果一定要接近或等于一个2的整数次幂。

Total PGs = (Total_number_of_OSD * 100) / max_replication_count
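
按照该公式可以先做一个粗略估算,再向上取最接近的 2 的整数次幂。下面是一个 shell 示意(假设 15 个 OSD、3 副本,与后文示例一致):

  1. osd_num=15; replica=3
  2. total_pgs=$(( osd_num * 100 / replica ))    # 500
  3. pg_num=1
  4. while [ $pg_num -lt $total_pgs ]; do pg_num=$(( pg_num * 2 )); done
  5. echo "total_pgs=$total_pgs pg_num=$pg_num"  # pg_num=512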

修改ceph.conf文件

  1. [ceph-admin@v31 my-cluster]$ cat ceph.conf
  2. [global]
  3. fsid = 61b3125d-1a74-4901-997e-2cb4625367ab
  4. mon_initial_members = v31, v32, v33
  5. mon_host = 192.168.4.31,192.168.4.32,192.168.4.33
  6. auth_cluster_required = cephx
  7. auth_service_required = cephx
  8. auth_client_required = cephx
  9. osd pool default pg num = 1024
  10. osd pool default pgp num = 1024
  11. [ceph-admin@v31 my-cluster]$ ceph-deploy --overwrite-conf config push v31 v32 v33
  12. [ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph-admin/.cephdeploy.conf
  13. [ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy --overwrite-conf config push v31 v32 v33
  14. [ceph_deploy.cli][INFO ] ceph-deploy options:
  15. [ceph_deploy.cli][INFO ] username : None
  16. [ceph_deploy.cli][INFO ] verbose : False
  17. [ceph_deploy.cli][INFO ] overwrite_conf : True
  18. [ceph_deploy.cli][INFO ] subcommand : push
  19. [ceph_deploy.cli][INFO ] quiet : False
  20. [ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f89fc4c9128>
  21. [ceph_deploy.cli][INFO ] cluster : ceph
  22. [ceph_deploy.cli][INFO ] client : ['v31', 'v32', 'v33']
  23. [ceph_deploy.cli][INFO ] func : <function config at 0x7f89fc6f7c08>
  24. [ceph_deploy.cli][INFO ] ceph_conf : None
  25. [ceph_deploy.cli][INFO ] default_release : False
  26. [ceph_deploy.config][DEBUG ] Pushing config to v31
  27. [v31][DEBUG ] connection detected need for sudo
  28. [v31][DEBUG ] connected to host: v31
  29. [v31][DEBUG ] detect platform information from remote host
  30. [v31][DEBUG ] detect machine type
  31. [v31][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
  32. [ceph_deploy.config][DEBUG ] Pushing config to v32
  33. [v32][DEBUG ] connection detected need for sudo
  34. [v32][DEBUG ] connected to host: v32
  35. [v32][DEBUG ] detect platform information from remote host
  36. [v32][DEBUG ] detect machine type
  37. [v32][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
  38. [ceph_deploy.config][DEBUG ] Pushing config to v33
  39. [v33][DEBUG ] connection detected need for sudo
  40. [v33][DEBUG ] connected to host: v33
  41. [v33][DEBUG ] detect platform information from remote host
  42. [v33][DEBUG ] detect machine type
  43. [v33][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
  • 请不要直接修改某个节点的/etc/ceph/ceph.conf文件,而是在部署机上修改ceph.conf,采用推送的方式更加方便安全。修改完成之后,使用下面的命令将conf文件推送到各个节点上:ceph-deploy --overwrite-conf config push v31 v32 v33 ,推送完成后需要重启各个节点的monitor服务(批量重启示例见下方):
    systemctl restart ceph-mon@{hostname}.service
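
在部署机上可以批量重启各节点的 mon 服务,下面是一个示意(假设节点主机名即为 v31、v32、v33,且已配置 ssh 免密):

  1. for node in v31 v32 v33; do ssh $node "sudo systemctl restart ceph-mon@${node}.service"; done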

例如15个OSD,副本数为3的情况下,根据公式计算的结果应该为500,最接近512,所以需要设定该pool(volumes)的pg_num和pgp_num都为512.

  1. ceph osd pool set volumes pg_num 1024
  2. ceph osd pool set volumes pgp_num 1024

ceph的pool有两种类型,一种是副本池,一种是ec池,创建时也有所区别

创建副本池

  1. ceph osd pool create testpool 128 128
  2. pool 'testpool' created

创建ec池

设置profile

  1. [root@v31 ~]# ceph osd erasure-code-profile set EC-profile k=3 m=1 ruleset-failure-domain=osd
  2. [root@v31 ~]# ceph osd erasure-code-profile get EC-profile
  3. crush-device-class=
  4. crush-failure-domain=osd
  5. crush-root=default
  6. jerasure-per-chunk-alignment=false
  7. k=3
  8. m=1
  9. plugin=jerasure
  10. technique=reed_sol_van
  11. w=8

创建pool

  1. [root@v31 ~]# ceph osd pool create ecpool 1024 1024 erasure EC-profile
  2. For better initial performance on pools expected to store a large number of objects, consider supplying the expected_num_objects parameter when creating the pool.
  3. [root@v31 ~]# ceph df
  4. GLOBAL:
  5. SIZE AVAIL RAW USED %RAW USED
  6. 8.17TiB 8.13TiB 36.3GiB 0.43
  7. POOLS:
  8. NAME ID USED %USED MAX AVAIL OBJECTS
  9. kube 14 1.55GiB 0.06 2.57TiB 612
  10. ecpool 20 0B 0 5.79TiB 0
  1. $ sudo ceph osd pool create pool-name pg_num pgp_num erasure

如:

  1. $ ceph osd pool create ecpool 12 12 erasure
  2. pool 'ecpool' created

创建 mds 和 cephfs 文件系统

创建 mds 服务

使用 CephFS 时,集群中必须有 mds 服务

  1. [ceph-admin@v31 my-cluster]$ ceph-deploy mds create v33
  2. [ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph-admin/.cephdeploy.conf
  3. [ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy mds create v33
  4. [ceph_deploy.cli][INFO ] ceph-deploy options:
  5. [ceph_deploy.cli][INFO ] username : None
  6. [ceph_deploy.cli][INFO ] verbose : False
  7. [ceph_deploy.cli][INFO ] overwrite_conf : False
  8. [ceph_deploy.cli][INFO ] subcommand : create
  9. [ceph_deploy.cli][INFO ] quiet : False
  10. [ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fbc3c5e05f0>
  11. [ceph_deploy.cli][INFO ] cluster : ceph
  12. [ceph_deploy.cli][INFO ] func : <function mds at 0x7fbc3c82eed8>
  13. [ceph_deploy.cli][INFO ] ceph_conf : None
  14. [ceph_deploy.cli][INFO ] mds : [('v33', 'v33')]
  15. [ceph_deploy.cli][INFO ] default_release : False
  16. [ceph_deploy.mds][DEBUG ] Deploying mds, cluster ceph hosts v33:v33
  17. [v33][DEBUG ] connection detected need for sudo
  18. [v33][DEBUG ] connected to host: v33
  19. [v33][DEBUG ] detect platform information from remote host
  20. [v33][DEBUG ] detect machine type
  21. [ceph_deploy.mds][INFO ] Distro info: CentOS Linux 7.6.1810 Core
  22. [ceph_deploy.mds][DEBUG ] remote host will use systemd
  23. [ceph_deploy.mds][DEBUG ] deploying mds bootstrap to v33
  24. [v33][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
  25. [v33][WARNIN] mds keyring does not exist yet, creating one
  26. [v33][DEBUG ] create a keyring file
  27. [v33][DEBUG ] create path if it doesn't exist
  28. [v33][INFO ] Running command: sudo ceph --cluster ceph --name client.bootstrap-mds --keyring /var/lib/ceph/bootstrap-mds/ceph.keyring auth get-or-create mds.v33 osd allow rwx mds allow mon allow profile mds -o /var/lib/ceph/mds/ceph-v33/keyring
  29. [v33][INFO ] Running command: sudo systemctl enable ceph-mds@v33
  30. [v33][WARNIN] Created symlink from /etc/systemd/system/ceph-mds.target.wants/ceph-mds@v33.service to /usr/lib/systemd/system/ceph-mds@.service.
  31. [v33][INFO ] Running command: sudo systemctl start ceph-mds@v33
  32. [v33][INFO ] Running command: sudo systemctl enable ceph.target

创建pool

  1. [ceph-admin@v31 my-cluster]$ ceph osd pool create cluster_data_metadata 1024 1024
  2. For better initial performance on pools expected to store a large number of objects, consider supplying the expected_num_objects parameter when creating the pool.
  3. [ceph-admin@v31 my-cluster]$ ceph osd pool create cluster_data 1024 1024
  4. For better initial performance on pools expected to store a large number of objects, consider supplying the expected_num_objects parameter when creating the pool.
  5. [ceph-admin@v31 my-cluster]$ ceph fs new cephfs cluster_data_metadata cluster_data
  6. new fs with metadata pool 11 and data pool 12
  7. [ceph-admin@v31 my-cluster]$ ceph df
  8. GLOBAL:
  9. SIZE AVAIL RAW USED %RAW USED
  10. 8.17TiB 8.14TiB 30.7GiB 0.37
  11. POOLS:
  12. NAME ID USED %USED MAX AVAIL OBJECTS
  13. cluster_data_metadata 11 0B 0 2.58TiB 0
  14. cluster_data 12 0B 0 2.58TiB 0
  15. [ceph-admin@v31 my-cluster]$ ceph mds stat
  16. cephfs-0/0/1 up
  17. [ceph-admin@v31 my-cluster]$ ceph osd pool ls
  18. cluster_data_metadata
  19. cluster_data
  20. [ceph-admin@v31 my-cluster]$ ceph fs ls
  21. name: cephfs, metadata pool: cluster_data_metadata, data pools: [cluster_data ]
  1. [ceph-admin@v31 ~]$ ceph osd pool create cluster_data_metadata 1024 1024 replicated_rule 1
  2. pool 'cluster_data_metadata' created
  3. [ceph-admin@v31 ~]$ ceph osd pool create cluster_data_data 1024 1024 replicated_rule 1
  4. Error ERANGE: pg_num 1024 size 3 would mean 9216 total pgs, which exceeds max 7500 (mon_max_pg_per_osd 250 * num_in_osds 30)
  5. [ceph-admin@v31 ~]$ ceph df
  6. GLOBAL:
  7. SIZE AVAIL RAW USED %RAW USED
  8. 8.17TiB 8.13TiB 36.4GiB 0.43
  9. POOLS:
  10. NAME ID USED %USED MAX AVAIL OBJECTS
  11. kube 14 1.57GiB 0.06 2.57TiB 614
  12. cluster_data_metadata 21 0B 0 2.57TiB 0
  13. [ceph-admin@v31 ~]$ ceph osd pool create cluster_data_data 1024 replicated_rule 1
  14. Error ERANGE: pg_num 1024 size 3 would mean 9216 total pgs, which exceeds max 7500 (mon_max_pg_per_osd 250 * num_in_osds 30)
  15. [ceph-admin@v31 ~]$ ceph osd pool create cluster_data_data 100 replicated_rule 1
  16. pool 'cluster_data_data' created
  17. [ceph-admin@v31 ~]$ ceph df
  18. GLOBAL:
  19. SIZE AVAIL RAW USED %RAW USED
  20. 8.17TiB 8.13TiB 36.4GiB 0.43
  21. POOLS:
  22. NAME ID USED %USED MAX AVAIL OBJECTS
  23. kube 14 1.57GiB 0.06 2.57TiB 614
  24. cluster_data_metadata 21 0B 0 2.57TiB 0
  25. cluster_data_data 22 0B 0 2.57TiB 0

创建osd存储池

  1. ceph osd pool create rbd 50
  2. ceph osd pool create kube 50
  3. # 为 pool 启用 application 标签,避免集群产生 health 告警
  4. ceph osd pool application enable kube mon
  5. ceph osd pool application enable rbd mon

创建用户(可选)

  1. ceph auth get-or-create client.cephfs mon 'allow r' mds 'allow r, allow rw path=/' osd 'allow rw pool=cephfs_data' -o ceph.client.cephfs.keyring
  2. scp ceph.client.cephfs.keyring <node>:/etc/ceph/

在对应的 ceph 服务器上获取 client key

  1. ceph auth get-key client.cephfs

这里可以直接使用admin账户的keyring

  1. cat ceph.client.admin.keyring
  2. [client.admin]
  3. key = AQBcs4hcKPiaChAAk12oiD79FpIjeGo1PmXFXw==

通过内核驱动挂载 Ceph FS

安装 ceph-fuse

  1. yum install ceph-fuse -y

确认 kernel 已加载 ceph 模块

  1. lsmod | grep ceph
  2. ceph 358802 0
  3. libceph 306625 1 ceph
  4. dns_resolver 13140 2 nfsv4,libceph
  5. libcrc32c 12644 4 ip_vs,libceph,nf_nat,nf_conntrack
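
若未看到 ceph 模块,可以先手动加载再确认(示意):

  1. sudo modprobe ceph
  2. lsmod | grep ceph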

创建挂载目录

  1. mkdir -p /data

挂载

  1. [ceph-admin@v31 my-cluster]$ sudo mount -t ceph v31:6789:/ /data -o name=admin,secret=AQBcs4hcKPiaChAAk12oiD79FpIjeGo1PmXFXw==
  2. [ceph-admin@v31 my-cluster]$ df -Th |grep ceph
  3. 192.168.4.31:6789:/ ceph 2.6T 0 2.6T 0% /data

写入/etc/fstab

  1. [ceph-admin@v31 my-cluster]$ cd /etc/ceph/
  2. [ceph-admin@v31 ceph]$ cp ceph.client.admin.keyring cephfs.key
  3. [ceph-admin@v31 ceph]$ vim cephfs.key
  4. AQBcs4hcKPiaChAAk12oiD79FpIjeGo1PmXFXw==
  5. echo "v31:6789:/ /data ceph name=admin,secretfile=/etc/ceph/cephfs.key,noatime,_netdev 0 0 " >>/etc/fstab

CephFS性能测试

fio

随机读测试

  1. [root@v31 ~]# fio -filename=/mnt/data/test1 -direct=1 -iodepth 1 -thread -rw=randread -ioengine=psync -bs=16k -size=10G -numjobs=10 -runtime=1000 -group_reporting -name=mytest
  2. mytest: (g=0): rw=randread, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
  3. ...
  4. fio-3.1
  5. Starting 10 threads
  6. mytest: Laying out IO file (1 file / 10240MiB)
  7. Jobs: 8 (f=8): [r(1),_(1),r(2),_(1),r(5)][99.8%][r=160MiB/s,w=0KiB/s][r=10.2k,w=0 IOPS][eta 00m:02s]
  8. mytest: (groupid=0, jobs=10): err= 0: pid=3824106: Tue Mar 26 09:13:04 2019
  9. read: IOPS=7359, BW=115MiB/s (121MB/s)(100GiB/890546msec)
  10. clat (usec): min=155, max=215229, avg=1355.08, stdev=1870.48
  11. lat (usec): min=155, max=215229, avg=1355.40, stdev=1870.48
  12. clat percentiles (usec):
  13. | 1.00th=[ 200], 5.00th=[ 217], 10.00th=[ 231], 20.00th=[ 265],
  14. | 30.00th=[ 486], 40.00th=[ 578], 50.00th=[ 660], 60.00th=[ 799],
  15. | 70.00th=[ 1037], 80.00th=[ 1893], 90.00th=[ 3982], 95.00th=[ 5080],
  16. | 99.00th=[ 7701], 99.50th=[ 9110], 99.90th=[15664], 99.95th=[19530],
  17. | 99.99th=[28705]
  18. bw ( KiB/s): min= 3040, max=28610, per=10.01%, avg=11782.72, stdev=3610.33, samples=17792
  19. iops : min= 190, max= 1788, avg=736.38, stdev=225.63, samples=17792
  20. lat (usec) : 250=16.50%, 500=14.76%, 750=25.96%, 1000=11.70%
  21. lat (msec) : 2=11.60%, 4=9.52%, 10=9.62%, 20=0.30%, 50=0.04%
  22. lat (msec) : 100=0.01%, 250=0.01%
  23. cpu : usr=0.39%, sys=1.82%, ctx=6694389, majf=0, minf=5367
  24. IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
  25. submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  26. complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  27. issued rwt: total=6553600,0,0, short=0,0,0, dropped=0,0,0
  28. latency : target=0, window=0, percentile=100.00%, depth=1
  29. Run status group 0 (all jobs):
  30. READ: bw=115MiB/s (121MB/s), 115MiB/s-115MiB/s (121MB/s-121MB/s), io=100GiB (107GB), run=890546-890546msec

顺序读测试

  1. [root@v33 ~]# fio -filename=/mnt/data/test2 -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=16k -size=10G -numjobs=30 -runtime=1000 -group_reporting -name=mytest
  2. mytest: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
  3. ...
  4. fio-3.1
  5. Starting 30 threads
  6. mytest: Laying out IO file (1 file / 10240MiB)
  7. Jobs: 30 (f=30): [R(30)][100.0%][r=138MiB/s,w=0KiB/s][r=8812,w=0 IOPS][eta 00m:00s]
  8. mytest: (groupid=0, jobs=30): err= 0: pid=411789: Tue Mar 26 09:33:03 2019
  9. read: IOPS=10.0k, BW=156MiB/s (164MB/s)(153GiB/1000005msec)
  10. clat (usec): min=141, max=38416, avg=2992.85, stdev=2478.50
  11. lat (usec): min=141, max=38416, avg=2993.14, stdev=2478.52
  12. clat percentiles (usec):
  13. | 1.00th=[ 161], 5.00th=[ 174], 10.00th=[ 188], 20.00th=[ 260],
  14. | 30.00th=[ 652], 40.00th=[ 1467], 50.00th=[ 2999], 60.00th=[ 3949],
  15. | 70.00th=[ 4490], 80.00th=[ 5342], 90.00th=[ 6325], 95.00th=[ 7111],
  16. | 99.00th=[ 8848], 99.50th=[ 9503], 99.90th=[10814], 99.95th=[11731],
  17. | 99.99th=[18482]
  18. bw ( KiB/s): min= 1472, max=47743, per=3.34%, avg=5349.53, stdev=4848.75, samples=60000
  19. iops : min= 92, max= 2983, avg=334.11, stdev=303.03, samples=60000
  20. lat (usec) : 250=19.25%, 500=7.26%, 750=5.25%, 1000=3.04%
  21. lat (msec) : 2=9.00%, 4=17.07%, 10=38.87%, 20=0.26%, 50=0.01%
  22. cpu : usr=0.17%, sys=1.04%, ctx=14529895, majf=0, minf=3600
  23. IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
  24. submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  25. complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  26. issued rwt: total=10015571,0,0, short=0,0,0, dropped=0,0,0
  27. latency : target=0, window=0, percentile=100.00%, depth=1
  28. Run status group 0 (all jobs):
  29. READ: bw=156MiB/s (164MB/s), 156MiB/s-156MiB/s (164MB/s-164MB/s), io=153GiB (164GB), run=1000005-1000005msec

随机写测试

  1. [root@v31 ~]# fio -filename=/mnt/data/test3 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=4k -size=10G -numjobs=30 -runtime=1000 -group_reporting -name=mytest_4k_10G_randwrite
  2. mytest_4k_10G_randwrite: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
  3. ...
  4. fio-3.1
  5. Starting 30 threads
  6. mytest_4k_10G_randwrite: Laying out IO file (1 file / 10240MiB)
  7. Jobs: 30 (f=30): [w(30)][100.0%][r=0KiB/s,w=11.8MiB/s][r=0,w=3009 IOPS][eta 00m:00s]
  8. mytest_4k_10G_randwrite: (groupid=0, jobs=30): err= 0: pid=3852817: Tue Mar 26 09:59:25 2019
  9. write: IOPS=3107, BW=12.1MiB/s (12.7MB/s)(11.9GiB/1000067msec)
  10. clat (usec): min=922, max=230751, avg=9651.32, stdev=16589.93
  11. lat (usec): min=923, max=230751, avg=9651.74, stdev=16589.93
  12. clat percentiles (usec):
  13. | 1.00th=[ 1188], 5.00th=[ 1319], 10.00th=[ 1418], 20.00th=[ 1565],
  14. | 30.00th=[ 1745], 40.00th=[ 1991], 50.00th=[ 2343], 60.00th=[ 3097],
  15. | 70.00th=[ 6325], 80.00th=[ 11994], 90.00th=[ 30278], 95.00th=[ 46924],
  16. | 99.00th=[ 79168], 99.50th=[ 91751], 99.90th=[121111], 99.95th=[130548],
  17. | 99.99th=[158335]
  18. bw ( KiB/s): min= 112, max= 1162, per=3.34%, avg=414.50, stdev=92.93, samples=60000
  19. iops : min= 28, max= 290, avg=103.60, stdev=23.22, samples=60000
  20. lat (usec) : 1000=0.01%
  21. lat (msec) : 2=40.50%, 4=23.80%, 10=13.13%, 20=7.97%, 50=10.27%
  22. lat (msec) : 100=4.00%, 250=0.32%
  23. cpu : usr=0.06%, sys=0.30%, ctx=3110768, majf=0, minf=141484
  24. IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
  25. submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  26. complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  27. issued rwt: total=0,3107281,0, short=0,0,0, dropped=0,0,0
  28. latency : target=0, window=0, percentile=100.00%, depth=1
  29. Run status group 0 (all jobs):
  30. WRITE: bw=12.1MiB/s (12.7MB/s), 12.1MiB/s-12.1MiB/s (12.7MB/s-12.7MB/s), io=11.9GiB (12.7GB), run=1000067-1000067msec

顺序写测试

  1. [root@v33 ~]# fio -filename=/mnt/data/test4 -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=16k -size=10G -numjobs=30 -runtime=1000 -group_reporting -name=mytest
  2. mytest: (g=0): rw=write, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
  3. ...
  4. fio-3.1
  5. Starting 30 threads
  6. mytest: Laying out IO file (1 file / 10240MiB)
  7. Jobs: 30 (f=30): [W(30)][100.0%][r=0KiB/s,w=50.3MiB/s][r=0,w=3219 IOPS][eta 00m:00s]
  8. mytest: (groupid=0, jobs=30): err= 0: pid=454215: Tue Mar 26 10:19:27 2019
  9. write: IOPS=3322, BW=51.9MiB/s (54.4MB/s)(50.7GiB/1000007msec)
  10. clat (usec): min=1130, max=121544, avg=9026.88, stdev=2132.29
  11. lat (usec): min=1131, max=121545, avg=9027.49, stdev=2132.30
  12. clat percentiles (usec):
  13. | 1.00th=[ 4047], 5.00th=[ 6325], 10.00th=[ 7308], 20.00th=[ 7963],
  14. | 30.00th=[ 8291], 40.00th=[ 8586], 50.00th=[ 8848], 60.00th=[ 9110],
  15. | 70.00th=[ 9503], 80.00th=[10028], 90.00th=[10814], 95.00th=[11600],
  16. | 99.00th=[17171], 99.50th=[20317], 99.90th=[25035], 99.95th=[26608],
  17. | 99.99th=[44303]
  18. bw ( KiB/s): min= 896, max= 3712, per=3.34%, avg=1772.81, stdev=213.20, samples=60000
  19. iops : min= 56, max= 232, avg=110.76, stdev=13.32, samples=60000
  20. lat (msec) : 2=0.08%, 4=0.88%, 10=79.28%, 20=19.23%, 50=0.53%
  21. lat (msec) : 100=0.01%, 250=0.01%
  22. cpu : usr=0.06%, sys=0.55%, ctx=3581559, majf=0, minf=4243
  23. IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
  24. submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  25. complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  26. issued rwt: total=0,3322270,0, short=0,0,0, dropped=0,0,0
  27. latency : target=0, window=0, percentile=100.00%, depth=1
  28. Run status group 0 (all jobs):
  29. WRITE: bw=51.9MiB/s (54.4MB/s), 51.9MiB/s-51.9MiB/s (54.4MB/s-54.4MB/s), io=50.7GiB (54.4GB), run=1000007-1000007msec

混合随机读写

  1. fio -filename=/data/test5 -direct=1 -iodepth 1 -thread -rw=randrw -rwmixread=70 -ioengine=psync -bs=16k -size=10G -numjobs=30 -runtime=100 -group_reporting -name=mytest -ioscheduler=noop

同步i/o(顺序写)测试

  1. [root@v31 data]# fio -filename=/mnt/data/test6 -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=4k -size=50G -numjobs=10 -runtime=1000 -group_reporting -name=mytest
  2. mytest: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
  3. ...
  4. fio-3.1
  5. Starting 10 threads
  6. mytest: Laying out IO file (1 file / 51200MiB)
  7. Jobs: 10 (f=10): [W(10)][100.0%][r=0KiB/s,w=25.6MiB/s][r=0,w=6549 IOPS][eta 00m:00s]
  8. mytest: (groupid=0, jobs=10): err= 0: pid=3883680: Tue Mar 26 10:48:08 2019
  9. write: IOPS=6180, BW=24.1MiB/s (25.3MB/s)(23.6GiB/1000001msec)
  10. clat (usec): min=825, max=176948, avg=1615.44, stdev=989.83
  11. lat (usec): min=826, max=176949, avg=1615.81, stdev=989.83
  12. clat percentiles (usec):
  13. | 1.00th=[ 1020], 5.00th=[ 1106], 10.00th=[ 1188], 20.00th=[ 1303],
  14. | 30.00th=[ 1369], 40.00th=[ 1434], 50.00th=[ 1500], 60.00th=[ 1565],
  15. | 70.00th=[ 1647], 80.00th=[ 1778], 90.00th=[ 2024], 95.00th=[ 2245],
  16. | 99.00th=[ 2933], 99.50th=[ 4817], 99.90th=[18744], 99.95th=[19268],
  17. | 99.99th=[21890]
  18. bw ( KiB/s): min= 1280, max= 3920, per=10.00%, avg=2473.24, stdev=365.21, samples=19998
  19. iops : min= 320, max= 980, avg=618.27, stdev=91.30, samples=19998
  20. lat (usec) : 1000=0.63%
  21. lat (msec) : 2=88.90%, 4=9.90%, 10=0.26%, 20=0.30%, 50=0.01%
  22. lat (msec) : 100=0.01%, 250=0.01%
  23. cpu : usr=0.27%, sys=1.59%, ctx=6286666, majf=0, minf=1148
  24. IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
  25. submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  26. complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  27. issued rwt: total=0,6180315,0, short=0,0,0, dropped=0,0,0
  28. latency : target=0, window=0, percentile=100.00%, depth=1
  29. Run status group 0 (all jobs):
  30. WRITE: bw=24.1MiB/s (25.3MB/s), 24.1MiB/s-24.1MiB/s (25.3MB/s-25.3MB/s), io=23.6GiB (25.3GB), run=1000001-1000001msec

异步i/o(顺序写)测试

  1. fio -filename=/data/test7 -direct=1 -iodepth 1 -thread -rw=write -ioengine=libaio -bs=4k -size=50G -numjobs=10 -runtime=1000 -group_reporting -name=mytest

磁盘性能测试

为了对比Ceph文件系统的性能,此处做了一个单块磁盘的性能测试;为了确保测试的真实性,单块磁盘选择为某个OSD对应的磁盘。

随机读测试-单块硬盘

  1. fio -filename=/var/lib/ceph/osd/ceph-4/disktest/dlw1 -direct=1 -iodepth 1 -thread -rw=randread -ioengine=psync -bs=16k -size=10G -numjobs=10 -runtime=1000 -group_reporting -name=mytest

rados性能测试

4M写入测试

  1. rados bench -p cluster_data_data 60 write -t 32 --no-cleanup
  2. Total time run: 60.717291
  3. Total writes made: 2238
  4. Write size: 4194304
  5. Object size: 4194304
  6. Bandwidth (MB/sec): 147.437
  7. Stddev Bandwidth: 20.1603
  8. Max bandwidth (MB/sec): 168
  9. Min bandwidth (MB/sec): 48
  10. Average IOPS: 36
  11. Stddev IOPS: 5
  12. Max IOPS: 42
  13. Min IOPS: 12
  14. Average Latency(s): 0.865663
  15. Stddev Latency(s): 0.40126
  16. Max latency(s): 3.58639
  17. Min latency(s): 0.185036

4k写入测试

  1. rados bench -p cluster_data_data 60 write -t 32 -b 4096 --no-cleanup
  2. Total time run: 60.035923
  3. Total writes made: 201042
  4. Write size: 4096
  5. Object size: 4096
  6. Bandwidth (MB/sec): 13.0808
  7. Stddev Bandwidth: 1.10742
  8. Max bandwidth (MB/sec): 17.1133
  9. Min bandwidth (MB/sec): 9.71875
  10. Average IOPS: 3348
  11. Stddev IOPS: 283
  12. Max IOPS: 4381
  13. Min IOPS: 2488
  14. Average Latency(s): 0.00955468
  15. Stddev Latency(s): 0.0164307
  16. Max latency(s): 0.335681
  17. Min latency(s): 0.00105769

4K顺序读

  1. rados bench -p cluster_data_data 60 seq -t 32 --no-cleanup
  2. Total time run: 22.129977
  3. Total reads made: 201042
  4. Read size: 4096
  5. Object size: 4096
  6. Bandwidth (MB/sec): 35.4867
  7. Average IOPS: 9084
  8. Stddev IOPS: 1278
  9. Max IOPS: 14011
  10. Min IOPS: 7578
  11. Average Latency(s): 0.0035112
  12. Max latency(s): 0.181241
  13. Min latency(s): 0.000287577
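
rados bench 带 --no-cleanup 写入的测试对象会留在 pool 中,测试结束后可以手动清理(示意,pool 名以实际为准):

  1. rados -p cluster_data_data cleanup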

删除CephFS

  1. [root@v32 ~]# ceph df
  2. GLOBAL:
  3. SIZE AVAIL RAW USED %RAW USED
  4. 8.17TiB 7.91TiB 262GiB 3.13
  5. POOLS:
  6. NAME ID USED %USED MAX AVAIL OBJECTS
  7. cluster_data_metadata 11 231MiB 0 2.49TiB 51638
  8. cluster_data 12 65.2GiB 2.49 2.49TiB 317473
  9. [root@v32 ~]# ceph osd pool delete cluster_data_metadata cluster_data_metadata --yes-i-really-really-mean-it
  10. Error EBUSY: pool 'cluster_data_metadata' is in use by CephFS
  11. [root@v32 ~]# ceph fs ls
  12. name: cephfs, metadata pool: cluster_data_metadata, data pools: [cluster_data ]
  13. [root@v32 ~]# ceph fs rm cephfs --yes-i-really-mean-it
  14. Error EINVAL: all MDS daemons must be inactive before removing filesystem
  15. [root@v33 ~]# systemctl stop ceph-mds@v33.service
  16. [root@v33 ~]# systemctl desable ceph-mds@v33.service
  17. Unknown operation 'desable'.
  18. [root@v33 ~]# systemctl disable ceph-mds@v33.service
  19. Removed symlink /etc/systemd/system/ceph-mds.target.wants/ceph-mds@v33.service.
  20. [root@v32 ~]# ceph fs rm cephfs --yes-i-really-mean-it
  21. [root@v32 ~]# ceph osd pool delete cluster_data_metadata cluster_data_metadata --yes-i-really-really-mean-it
  22. Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool
  23. [root@v32 ~]# cat /etc/ceph/ceph.conf
  24. [global]
  25. ...
  26. [mon]
  27. mon allow pool delete = true
  28. [root@v32 ~]# ceph osd pool delete cluster_data_metadata cluster_data_metadata --yes-i-really-really-mean-it
  29. pool 'cluster_data_metadata' removed
  30. [root@v32 ~]# ceph osd pool delete cluster_data cluster_data --yes-i-really-really-mean-it
  31. pool 'cluster_data' removed

CRUSH map

  1. 1、提取已有的CRUSH map ,使用-o参数,ceph将输出一个经过编译的CRUSH map 到您指定的文件
  2. ` ceph osd getcrushmap -o crushmap.txt`
  3. 2、反编译你的CRUSH map ,使用-d参数将反编译CRUSH map 到通过-o 指定的文件中
  4. `crushtool -d crushmap.txt -o crushmap-decompile`
  5. 3、使用编辑器编辑CRUSH map
  6. `vi crushmap-decompile`
  7. 4、重新编译这个新的CRUSH map
  8. `crushtool -c crushmap-decompile -o crushmap-compiled`
  9. 5、将新的CRUSH map 应用到ceph 集群中
  10. `ceph osd setcrushmap -i crushmap-compiled`

参考https://blog.csdn.net/heivy/article/details/50592244

查看pool

列出所有的pool

  1. [ceph-admin@v31 my-cluster]$ ceph df
  2. GLOBAL:
  3. SIZE AVAIL RAW USED %RAW USED
  4. 8.17TiB 8.14TiB 30.5GiB 0.37
  5. POOLS:
  6. NAME ID USED %USED MAX AVAIL OBJECTS
  7. cluster_data_metadata 2 0B 0 2.58TiB 0
  8. [ceph-admin@v31 my-cluster]$ rados lspools
  9. cluster_data_metadata

删除cluster_data_metadata pool

查看pool的详细配置信息

  1. [ceph-admin@v31 my-cluster]$ ceph osd pool ls detail
  2. pool 2 'cluster_data_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 146 flags hashpspool stripe_width 0
  1. [ceph-admin@v31 my-cluster]$ ceph osd dump|grep pool
  2. pool 2 'cluster_data_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 146 flags hashpspool stripe_width 0

查看每个pool的空间使用及IO情况

  1. [root@v32 ~]# rados df
  2. POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR
  3. kube 36B 4 0 12 0 0 0 5538 34.1MiB 142769 10.4GiB
  4. total_objects 4
  5. total_used 31.8GiB
  6. total_avail 8.14TiB
  7. total_space 8.17TiB

获取pool参数
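
可以用 ceph osd pool get/set 查看或调整某个 pool 的参数,下面是一个示意(pool 名以实际为准):

  1. ceph osd pool get kube pg_num
  2. ceph osd pool get kube size
  3. ceph osd pool set kube size 3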

查看osd分布

  1. ceph osd tree
  2. ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
  3. -1 8.16879 root default
  4. -3 2.72293 host v31
  5. 0 hdd 0.27229 osd.0 up 1.00000 1.00000
  6. 1 hdd 0.27229 osd.1 up 1.00000 1.00000
  7. 2 hdd 0.27229 osd.2 up 1.00000 1.00000
  8. 3 hdd 0.27229 osd.3 up 1.00000 1.00000
  9. 4 hdd 0.27229 osd.4 up 1.00000 1.00000
  10. 5 hdd 0.27229 osd.5 up 1.00000 1.00000
  11. 6 hdd 0.27229 osd.6 up 1.00000 1.00000
  12. 7 hdd 0.27229 osd.7 up 1.00000 1.00000
  13. 24 hdd 0.27229 osd.24 up 1.00000 1.00000
  14. 25 hdd 0.27229 osd.25 up 1.00000 1.00000
  15. -5 2.72293 host v32
  16. 8 hdd 0.27229 osd.8 up 1.00000 1.00000
  17. 9 hdd 0.27229 osd.9 up 1.00000 1.00000
  18. 10 hdd 0.27229 osd.10 up 1.00000 1.00000
  19. 11 hdd 0.27229 osd.11 up 1.00000 1.00000
  20. 12 hdd 0.27229 osd.12 up 1.00000 1.00000
  21. 13 hdd 0.27229 osd.13 up 1.00000 1.00000
  22. 14 hdd 0.27229 osd.14 up 1.00000 1.00000
  23. 15 hdd 0.27229 osd.15 up 1.00000 1.00000
  24. 27 hdd 0.27229 osd.27 up 1.00000 1.00000
  25. 29 hdd 0.27229 osd.29 up 1.00000 1.00000
  26. -7 2.72293 host v33
  27. 16 hdd 0.27229 osd.16 up 1.00000 1.00000
  28. 17 hdd 0.27229 osd.17 up 1.00000 1.00000
  29. 18 hdd 0.27229 osd.18 up 1.00000 1.00000
  30. 19 hdd 0.27229 osd.19 up 1.00000 1.00000
  31. 20 hdd 0.27229 osd.20 up 1.00000 1.00000
  32. 21 hdd 0.27229 osd.21 up 1.00000 1.00000
  33. 22 hdd 0.27229 osd.22 up 1.00000 1.00000
  34. 23 hdd 0.27229 osd.23 up 1.00000 1.00000
  35. 26 hdd 0.27229 osd.26 up 1.00000 1.00000
  36. 28 hdd 0.27229 osd.28 up 1.00000 1.00000

删除pool

sudo ceph osd pool delete {pool-name} {pool-name} --yes-i-really-really-mean-it

sudo ceph osd pool delete cluster_data_metadata cluster_data_metadata --yes-i-really-really-mean-it

如果删除pool时提示error请参考: 删除pool error的解决方法
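
除了修改 ceph.conf,也可以临时在线打开删除开关再执行删除,下面是一个示意(删除完成后建议再关闭该开关):

  1. ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'
  2. ceph osd pool delete cluster_data_metadata cluster_data_metadata --yes-i-really-really-mean-it
  3. ceph tell mon.\* injectargs '--mon-allow-pool-delete=false'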

集群添加OSD

  1. [ceph-admin@v31 my-cluster]$ ceph -s
  2. cluster:
  3. id: ffdda80f-a48a-431a-a71b-525e5f1965d9
  4. health: HEALTH_OK
  5. services:
  6. mon: 3 daemons, quorum v31,v32,v33
  7. mgr: v31(active), standbys: v32, v33
  8. osd: 24 osds: 24 up, 24 in
  9. data:
  10. pools: 0 pools, 0 pgs
  11. objects: 0 objects, 0B
  12. usage: 24.3GiB used, 6.51TiB / 6.54TiB avail
  13. pgs:
  • 补充知识:osd状态
  1. up:守护进程运行中,能够提供IO服务;
  2. down:守护进程不在运行,无法提供IO服务;
  3. in:OSD 在集群中,会承载数据;
  4. out:OSD 不在集群中,不承载数据

列出所有磁盘

  1. [root@v33 ~]# sudo ceph-disk list
  2. /dev/dm-0 other, ext4, mounted on /
  3. /dev/dm-1 other, swap
  4. /dev/dm-2 other, unknown
  5. /dev/dm-3 other, unknown
  6. /dev/dm-4 other, unknown
  7. /dev/dm-5 other, unknown
  8. /dev/dm-6 other, unknown
  9. /dev/dm-7 other, unknown
  10. /dev/dm-8 other, unknown
  11. /dev/dm-9 other, unknown
  12. /dev/sda :
  13. /dev/sda1 other, vfat, mounted on /boot/efi
  14. /dev/sda2 other, xfs, mounted on /boot
  15. /dev/sda3 other, LVM2_member
  16. /dev/sdb other, unknown
  17. /dev/sdc other, unknown
  18. /dev/sdd other, LVM2_member
  19. /dev/sde other, LVM2_member
  20. /dev/sdf other, LVM2_member
  21. /dev/sdg other, LVM2_member
  22. /dev/sdh other, LVM2_member
  23. /dev/sdi other, LVM2_member
  24. /dev/sdj other, LVM2_member
  25. /dev/sdk other, LVM2_member

添加时报错

  1. [ceph-admin@v31 my-cluster]$ ceph-deploy osd create v31 --data /dev/sdb
  2. [ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph-admin/.cephdeploy.conf
  3. [ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy osd create v31 --data /dev/sdb
  4. [ceph_deploy.cli][INFO ] ceph-deploy options:
  5. [ceph_deploy.cli][INFO ] verbose : False
  6. [ceph_deploy.cli][INFO ] bluestore : None
  7. [ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fe69d002830>
  8. [ceph_deploy.cli][INFO ] cluster : ceph
  9. [ceph_deploy.cli][INFO ] fs_type : xfs
  10. [ceph_deploy.cli][INFO ] block_wal : None
  11. [ceph_deploy.cli][INFO ] default_release : False
  12. [ceph_deploy.cli][INFO ] username : None
  13. [ceph_deploy.cli][INFO ] journal : None
  14. [ceph_deploy.cli][INFO ] subcommand : create
  15. [ceph_deploy.cli][INFO ] host : v31
  16. [ceph_deploy.cli][INFO ] filestore : None
  17. [ceph_deploy.cli][INFO ] func : <function osd at 0x7fe69d2478c0>
  18. [ceph_deploy.cli][INFO ] ceph_conf : None
  19. [ceph_deploy.cli][INFO ] zap_disk : False
  20. [ceph_deploy.cli][INFO ] data : /dev/sdb
  21. [ceph_deploy.cli][INFO ] block_db : None
  22. [ceph_deploy.cli][INFO ] dmcrypt : False
  23. [ceph_deploy.cli][INFO ] overwrite_conf : False
  24. [ceph_deploy.cli][INFO ] dmcrypt_key_dir : /etc/ceph/dmcrypt-keys
  25. [ceph_deploy.cli][INFO ] quiet : False
  26. [ceph_deploy.cli][INFO ] debug : False
  27. [ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/sdb
  28. [v31][DEBUG ] connection detected need for sudo
  29. [v31][DEBUG ] connected to host: v31
  30. [v31][DEBUG ] detect platform information from remote host
  31. [v31][DEBUG ] detect machine type
  32. [v31][DEBUG ] find the location of an executable
  33. [ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.6.1810 Core
  34. [ceph_deploy.osd][DEBUG ] Deploying osd to v31
  35. [v31][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
  36. [ceph_deploy.osd][ERROR ] RuntimeError: config file /etc/ceph/ceph.conf exists with different content; use --overwrite-conf to overwrite
  37. [ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs
  38. [ceph-admin@v31 my-cluster]$ ceph-deploy --overwrite-conf osd create v31 --data /dev/sdb
  39. [ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph-admin/.cephdeploy.conf
  40. [ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy --overwrite-conf osd create v31 --data /dev/sdb
  41. [ceph_deploy.cli][INFO ] ceph-deploy options:
  42. [ceph_deploy.cli][INFO ] verbose : False
  43. [ceph_deploy.cli][INFO ] bluestore : None
  44. [ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7ff6abd72830>
  45. [ceph_deploy.cli][INFO ] cluster : ceph
  46. [ceph_deploy.cli][INFO ] fs_type : xfs
  47. [ceph_deploy.cli][INFO ] block_wal : None
  48. [ceph_deploy.cli][INFO ] default_release : False
  49. [ceph_deploy.cli][INFO ] username : None
  50. [ceph_deploy.cli][INFO ] journal : None
  51. [ceph_deploy.cli][INFO ] subcommand : create
  52. [ceph_deploy.cli][INFO ] host : v31
  53. [ceph_deploy.cli][INFO ] filestore : None
  54. [ceph_deploy.cli][INFO ] func : <function osd at 0x7ff6abfb78c0>
  55. [ceph_deploy.cli][INFO ] ceph_conf : None
  56. [ceph_deploy.cli][INFO ] zap_disk : False
  57. [ceph_deploy.cli][INFO ] data : /dev/sdb
  58. [ceph_deploy.cli][INFO ] block_db : None
  59. [ceph_deploy.cli][INFO ] dmcrypt : False
  60. [ceph_deploy.cli][INFO ] overwrite_conf : True
  61. [ceph_deploy.cli][INFO ] dmcrypt_key_dir : /etc/ceph/dmcrypt-keys
  62. [ceph_deploy.cli][INFO ] quiet : False
  63. [ceph_deploy.cli][INFO ] debug : False
  64. [ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/sdb
  65. [v31][DEBUG ] connection detected need for sudo
  66. [v31][DEBUG ] connected to host: v31
  67. [v31][DEBUG ] detect platform information from remote host
  68. [v31][DEBUG ] detect machine type
  69. [v31][DEBUG ] find the location of an executable
  70. [ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.6.1810 Core
  71. [ceph_deploy.osd][DEBUG ] Deploying osd to v31
  72. [v31][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
  73. [v31][DEBUG ] find the location of an executable
  74. [v31][INFO ] Running command: sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
  75. [v31][WARNIN] usage: ceph-volume lvm create [-h] --data DATA [--filestore]
  76. [v31][WARNIN] [--journal JOURNAL] [--bluestore]
  77. [v31][WARNIN] [--block.db BLOCK_DB] [--block.wal BLOCK_WAL]
  78. [v31][WARNIN] [--osd-id OSD_ID] [--osd-fsid OSD_FSID]
  79. [v31][WARNIN] [--cluster-fsid CLUSTER_FSID]
  80. [v31][WARNIN] [--crush-device-class CRUSH_DEVICE_CLASS]
  81. [v31][WARNIN] [--dmcrypt] [--no-systemd]
  82. [v31][WARNIN] ceph-volume lvm create: error: GPT headers found, they must be removed on: /dev/sdb
  83. [v31][ERROR ] RuntimeError: command returned non-zero exit status: 2
  84. [ceph_deploy.osd][ERROR ] Failed to execute command: /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
  85. [ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs
  86. [ceph-admin@v31 my-cluster]$ /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
  87. --> Falling back to /tmp/ for logging. Can't use /var/log/ceph/ceph-volume.log
  88. --> [Errno 13] Permission denied: '/var/log/ceph/ceph-volume.log'
  89. stderr: error: /dev/sdb: Permission denied
  90. --> SuperUserError: This command needs to be executed with sudo or as root
  91. [ceph-admin@v31 my-cluster]$ sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
  92. usage: ceph-volume lvm create [-h] --data DATA [--filestore]
  93. [--journal JOURNAL] [--bluestore]
  94. [--block.db BLOCK_DB] [--block.wal BLOCK_WAL]
  95. [--osd-id OSD_ID] [--osd-fsid OSD_FSID]
  96. [--cluster-fsid CLUSTER_FSID]
  97. [--crush-device-class CRUSH_DEVICE_CLASS]
  98. [--dmcrypt] [--no-systemd]
  99. ceph-volume lvm create: error: GPT headers found, they must be removed on: /dev/sdb

转换为mbr

  1. [ceph-admin@v31 my-cluster]$ sudo parted -s /dev/sdb mklabel msdos
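
除了把分区表转成 msdos,也可以直接清掉磁盘上已有的 GPT/LVM 签名后再创建 OSD,下面是一个示意(请先确认该盘上的数据可以清空):

  1. sudo wipefs --all /dev/sdb
  2. sudo ceph-volume lvm zap /dev/sdb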

再次创建OSD

  1. [root@v32 ~]# /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdc
  2. Running command: /bin/ceph-authtool --gen-print-key
  3. Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new d47d7861-9a83-4879-847d-693e3aa794b6
  4. Running command: vgcreate --force --yes ceph-fad7bf25-dd60-4eff-a932-970c376af00b /dev/sdc
  5. stdout: Wiping dos signature on /dev/sdc.
  6. stdout: Physical volume "/dev/sdc" successfully created.
  7. stdout: Volume group "ceph-fad7bf25-dd60-4eff-a932-970c376af00b" successfully created
  8. Running command: lvcreate --yes -l 100%FREE -n osd-block-d47d7861-9a83-4879-847d-693e3aa794b6 ceph-fad7bf25-dd60-4eff-a932-970c376af00b
  9. stdout: Logical volume "osd-block-d47d7861-9a83-4879-847d-693e3aa794b6" created.
  10. Running command: /bin/ceph-authtool --gen-print-key
  11. Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-27
  12. Running command: restorecon /var/lib/ceph/osd/ceph-27
  13. Running command: chown -h ceph:ceph /dev/ceph-fad7bf25-dd60-4eff-a932-970c376af00b/osd-block-d47d7861-9a83-4879-847d-693e3aa794b6
  14. Running command: chown -R ceph:ceph /dev/dm-10
  15. Running command: ln -s /dev/ceph-fad7bf25-dd60-4eff-a932-970c376af00b/osd-block-d47d7861-9a83-4879-847d-693e3aa794b6 /var/lib/ceph/osd/ceph-27/block
  16. Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-27/activate.monmap
  17. stderr: got monmap epoch 2
  18. Running command: ceph-authtool /var/lib/ceph/osd/ceph-27/keyring --create-keyring --name osd.27 --add-key AQB4yohcMW8eLhAAkdQmhIavIcF+FcPjkKooSQ==
  19. stdout: creating /var/lib/ceph/osd/ceph-27/keyring
  20. stdout: added entity osd.27 auth auth(auid = 18446744073709551615 key=AQB4yohcMW8eLhAAkdQmhIavIcF+FcPjkKooSQ== with 0 caps)
  21. Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-27/keyring
  22. Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-27/
  23. Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 27 --monmap /var/lib/ceph/osd/ceph-27/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-27/ --osd-uuid d47d7861-9a83-4879-847d-693e3aa794b6 --setuser ceph --setgroup ceph
  24. --> ceph-volume lvm prepare successful for: /dev/sdc
  25. Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-27
  26. Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-fad7bf25-dd60-4eff-a932-970c376af00b/osd-block-d47d7861-9a83-4879-847d-693e3aa794b6 --path /var/lib/ceph/osd/ceph-27
  27. Running command: ln -snf /dev/ceph-fad7bf25-dd60-4eff-a932-970c376af00b/osd-block-d47d7861-9a83-4879-847d-693e3aa794b6 /var/lib/ceph/osd/ceph-27/block
  28. Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-27/block
  29. Running command: chown -R ceph:ceph /dev/dm-10
  30. Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-27
  31. Running command: systemctl enable ceph-volume@lvm-27-d47d7861-9a83-4879-847d-693e3aa794b6
  32. stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-27-d47d7861-9a83-4879-847d-693e3aa794b6.service to /usr/lib/systemd/system/ceph-volume@.service.
  33. Running command: systemctl enable --runtime ceph-osd@27
  34. stderr: Created symlink from /run/systemd/system/ceph-osd.target.wants/ceph-osd@27.service to /usr/lib/systemd/system/ceph-osd@.service.
  35. Running command: systemctl start ceph-osd@27
  36. --> ceph-volume lvm activate successful for osd ID: 27
  37. --> ceph-volume lvm create successful for: /dev/sdc
  38. [root@v32 ~]# /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdd
  39. Running command: /bin/ceph-authtool --gen-print-key
  40. Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new fade79d7-8bee-49c6-85f8-d6c141e6bd4e
  41. Running command: vgcreate --force --yes ceph-fc851010-f2c6-43f7-9c12-843d3a023a65 /dev/sdd
  42. stderr: Physical volume '/dev/sdd' is already in volume group 'ceph-00af8489-3599-427d-bae6-de1e61c4c38a'
  43. Unable to add physical volume '/dev/sdd' to volume group 'ceph-00af8489-3599-427d-bae6-de1e61c4c38a'
  44. /dev/sdd: physical volume not initialized.
  45. --> Was unable to complete a new OSD, will rollback changes
  46. --> OSD will be fully purged from the cluster, because the ID was generated
  47. Running command: ceph osd purge osd.29 --yes-i-really-mean-it
  48. stderr: purged osd.29
  49. --> RuntimeError: command returned non-zero exit status: 5
  50. ceph-deploy --overwrite-conf osd create v31 --data /dev/sdb

rbd数据查看

  1. [root@v32 ~]# rados ls -p kube
  2. rbd_id.kubernetes-dynamic-pvc-28e6ad1e-4675-11e9-8901-6602f4085af9
  3. rbd_directory
  4. rbd_info
  5. rbd_header.149046b8b4567

删除rbd

  1. [root@v32 ~]# rados -p kube rm rbd_id.kubernetes-dynamic-pvc-28e6ad1e-4675-11e9-8901-6602f4085af9 rbd_directory rbd_info rbd_header.149046b8b4567
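
如果目的是删除整个 rbd 镜像,一般直接用 rbd 命令即可,下面是一个示意(镜像名以实际为准):

  1. rbd rm kube/kubernetes-dynamic-pvc-28e6ad1e-4675-11e9-8901-6602f4085af9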

故障解决

Kubernetes使用ceph集群存储

https://akomljen.com/using-existing-ceph-cluster-for-kubernetes-persistent-storage/

创建访问 ceph kube 存储池的 kube 账户及权限

  1. ceph --cluster ceph auth get-or-create client.kube mon 'allow r' osd 'allow rwx pool=kube'
  2. [client.kube]
  3. key = AQBHTIdcSqnbMhAATB4BJJXyA8K5NFQlE815/A==

在kube-system namespace中为rbd-provisioner RBAC授权并创建pod

  1. vim rbd-provisioner.yaml
  2. kind: ClusterRole
  3. apiVersion: rbac.authorization.k8s.io/v1
  4. metadata:
  5.   name: rbd-provisioner
  6. rules:
  7.   - apiGroups: [""]
  8.     resources: ["persistentvolumes"]
  9.     verbs: ["get", "list", "watch", "create", "delete"]
  10.   - apiGroups: [""]
  11.     resources: ["persistentvolumeclaims"]
  12.     verbs: ["get", "list", "watch", "update"]
  13.   - apiGroups: ["storage.k8s.io"]
  14.     resources: ["storageclasses"]
  15.     verbs: ["get", "list", "watch"]
  16.   - apiGroups: [""]
  17.     resources: ["events"]
  18.     verbs: ["create", "update", "patch"]
  19.   - apiGroups: [""]
  20.     resources: ["services"]
  21.     resourceNames: ["kube-dns","coredns"]
  22.     verbs: ["list", "get"]
  23.   - apiGroups: [""]
  24.     resources: ["endpoints"]
  25.     verbs: ["get", "list", "watch", "create", "update", "patch"]
  26. ---
  27. kind: ClusterRoleBinding
  28. apiVersion: rbac.authorization.k8s.io/v1
  29. metadata:
  30.   name: rbd-provisioner
  31. subjects:
  32.   - kind: ServiceAccount
  33.     name: rbd-provisioner
  34.     namespace: kube-system
  35. roleRef:
  36.   kind: ClusterRole
  37.   name: rbd-provisioner
  38.   apiGroup: rbac.authorization.k8s.io
  39. ---
  40. apiVersion: rbac.authorization.k8s.io/v1beta1
  41. kind: Role
  42. metadata:
  43.   name: rbd-provisioner
  44. rules:
  45.   - apiGroups: [""]
  46.     resources: ["secrets"]
  47.     verbs: ["get"]
  48. ---
  49. apiVersion: rbac.authorization.k8s.io/v1
  50. kind: RoleBinding
  51. metadata:
  52.   name: rbd-provisioner
  53. roleRef:
  54.   apiGroup: rbac.authorization.k8s.io
  55.   kind: Role
  56.   name: rbd-provisioner
  57. subjects:
  58.   - kind: ServiceAccount
  59.     name: rbd-provisioner
  60.     namespace: kube-system
  61. ---
  62. apiVersion: v1
  63. kind: ServiceAccount
  64. metadata:
  65.   name: rbd-provisioner
  66. ---
  67. apiVersion: extensions/v1beta1
  68. kind: Deployment
  69. metadata:
  70.   name: rbd-provisioner
  71. spec:
  72.   replicas: 1
  73.   strategy:
  74.     type: Recreate
  75.   template:
  76.     metadata:
  77.       labels:
  78.         app: rbd-provisioner
  79.     spec:
  80.       containers:
  81.       - name: rbd-provisioner
  82.         image: ivano/rbd-provisioner
  83.         env:
  84.         - name: PROVISIONER_NAME
  85.           value: ceph.com/rbd
  86.       serviceAccount: rbd-provisioner
  87. kubectl -n kube-system apply -f rbd-provisioner.yaml
  • 创建 rbd-provisioner pod 时要注意使用的容器镜像中的 ceph 版本

  1. ceph -v
  2. ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)

我这里为 luminous

  1. docker history ivano/rbd-provisioner:latest|grep CEPH_VERSION
  2. 5 months ago /bin/sh -c #(nop) ENV CEPH_VERSION=luminous 0B

`rbd-provisioner` ceph存储集群授权配置

RBD卷配置器需要Ceph的管理密钥来配置存储

  1. ceph --cluster ceph auth get-key client.admin
  2. AQBDO4dcUktZLxAAwByPxao2QROhQpoWYAWsGg==

添加Ceph集群admin账户权限

使用上面的Ceph admin账户的密钥创建secret

  1. kubectl create secret generic ceph-secret \
  2.     --type="kubernetes.io/rbd" \
  3.     --from-literal=key='AQBDO4dcUktZLxAAwByPxao2QROhQpoWYAWsGg==' \
  4.     --namespace=kube-system

创建ceph存储池

  1. sudo ceph --cluster ceph osd pool create kube 1024 1024
  2. sudo ceph --cluster ceph auth get-or-create client.kube mon 'allow r' osd 'allow rwx pool=kube'
  3. sudo ceph --cluster ceph auth get-key client.kube

添加Ceph集群kube账户权限

  1. ceph --cluster ceph auth get-key client.kube
  2. AQBHTIdcSqnbMhAATB4BJJXyA8K5NFQlE815/A==

  1. kubectl create secret generic ceph-secret-kube \
  2.     --type="kubernetes.io/rbd" \
  3.     --from-literal=key='AQBHTIdcSqnbMhAATB4BJJXyA8K5NFQlE815/A==' \
  4.     --namespace=kube-system

查看secret资源

  1. kubectl get secrets -n kube-system |grep ceph
  2. ceph-secret        kubernetes.io/rbd   1   54m
  3. ceph-secret-kube   kubernetes.io/rbd   1   51m

创建 `storageClassName` 并绑定ceph集群

后续 pod 直接使用该 `storageClassName` 调用

  1. vim fast-rbd.yaml
  2. apiVersion: storage.k8s.io/v1
  3. kind: StorageClass
  4. metadata:
  5.   name: fast-rbd
  6. provisioner: ceph.com/rbd
  7. parameters:
  8.   monitors: 192.168.122.101:6789, 192.168.122.102:6789, 192.168.122.103:6789
  9.   adminId: admin
  10.   adminSecretName: ceph-secret
  11.   adminSecretNamespace: kube-system
  12.   pool: kube
  13.   userId: kube
  14.   userSecretName: ceph-secret-kube
  15.   userSecretNamespace: kube-system
  16.   imageFormat: "2"
  17.   imageFeatures: layering

  1. kubectl create -f fast-rbd.yaml
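
创建完成后可以确认 StorageClass 是否已生效(示意):

  1. kubectl get storageclass fast-rbd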

示例

创建pvc请求

  1. cat <<EOF | kubectl create -f -
  2. kind: PersistentVolumeClaim
  3. apiVersion: v1
  4. metadata:
  5.   name: myclaim
  6. spec:
  7.   accessModes:
  8.     - ReadWriteOnce
  9.   resources:
  10.     requests:
  11.       storage: 8Gi
  12.   storageClassName: fast-rbd
  13. EOF

查看是否已Bound

  1. kubectl get pvc myclaim
  2. NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
  3. myclaim   Bound    pvc-5ef52eea-44a7-11e9-9d6f-525400d7a6ef   8Gi        RWO            fast-rbd       52m

创建pod示例

  1. cat test-pod.yaml
  2. apiVersion: v1
  3. kind: Pod
  4. metadata:
  5.   name: ceph-pod1
  6. spec:
  7.   containers:
  8.   - name: ceph-busybox
  9.     image: busybox
  10.     command: ["sleep", "60000"]
  11.     volumeMounts:
  12.     - name: ceph-vol1
  13.       mountPath: /usr/share/busybox
  14.       readOnly: false
  15.   volumes:
  16.   - name: ceph-vol1
  17.     persistentVolumeClaim:
  18.       claimName: ceph-claim
  19. ---
  20. kind: PersistentVolumeClaim
  21. apiVersion: v1
  22. metadata:
  23.   name: ceph-claim
  24. spec:
  25.   accessModes:
  26.     - ReadWriteOnce
  27.   resources:
  28.     requests:
  29.       storage: 2Gi
  30.   storageClassName: fast-rbd

检查pv、pvc的创建状态,是否都已经创建;

  1. kubectl get pv
  2. NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS        CLAIM                    STORAGECLASS        REASON   AGE
  3. pvc-278c2462-448d-11e9-b632-525400804e1e   8Gi        RWO            Delete           Terminating   jx/myclaim               fast-rbd                     129m
  4. pvc-5ef52eea-44a7-11e9-9d6f-525400d7a6ef   8Gi        RWO            Delete           Bound         default/myclaim          fast-rbd                     66m
  5. pvc-d3a5095a-4225-11e9-8d3b-525400d7a6ef   8Gi        RWO            Delete           Bound         default/jenkins          nfs-dynamic-class            3d5h
  6. pvc-ed9c1211-44af-11e9-9d6f-525400d7a6ef   2Gi        RWO            Delete           Bound         default/ceph-claim       fast-rbd                     4m59s
  7. pvc-f25b4ce2-44a1-11e9-9d6f-525400d7a6ef   2Gi        RWO            Delete           Terminating   kube-system/ceph-claim   ceph-rbd                     96m

  1. kubectl get pvc
  2. NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
  3. ceph-claim   Bound    pvc-ed9c1211-44af-11e9-9d6f-525400d7a6ef   2Gi        RWO            fast-rbd            5m2s
  4. jenkins      Bound    pvc-d3a5095a-4225-11e9-8d3b-525400d7a6ef   8Gi        RWO            nfs-dynamic-class   3d5h
  5. myclaim      Bound    pvc-5ef52eea-44a7-11e9-9d6f-525400d7a6ef   8Gi        RWO            fast-rbd            66m

ceph服务器上,检查rbd镜像创建情况和镜像的信息;

  1. rbd ls --pool rbd
  2. kubernetes-dynamic-pvc-1e569f60-44a3-11e9-8e60-fa9f2d515699

  1. rbd ls --pool kube
  2. kubernetes-dynamic-pvc-2e7038b1-449d-11e9-bd3d-46e50dc4cee6
  3. kubernetes-dynamic-pvc-6038cc76-44a7-11e9-a834-029380302ed2
  4. kubernetes-dynamic-pvc-84a5d823-449e-11e9-bd3d-46e50dc4cee6
  5. kubernetes-dynamic-pvc-edb72324-44af-11e9-a834-029380302ed2

  1. rbd info kube/kubernetes-dynamic-pvc-2e7038b1-449d-11e9-bd3d-46e50dc4cee6
  2. rbd image 'kubernetes-dynamic-pvc-2e7038b1-449d-11e9-bd3d-46e50dc4cee6':
  3.     size 8GiB in 2048 objects
  4.     order 22 (4MiB objects)
  5.     block_name_prefix: rbd_data.11136b8b4567
  6.     format: 2
  7.     features: layering
  8.     flags:
  9.     create_timestamp: Tue Mar 12 16:02:30 2019

检查busybox内的文件系统挂载和使用情况,确认能正常工作;

  1. kubectl exec -it ceph-pod1 mount |grep rbd
  2. /dev/rbd0 on /usr/share/busybox type ext4 (rw,relatime,stripe=1024,data=ordered)

  1. kubectl exec -it ceph-pod1 df |grep rbd
  2. /dev/rbd0 1998672 6144 1976144 0% /usr/share/busybox