ceph集群

ceph集群创建

可使用普通账户创建ceph集群

  1. export username="ceph-admin"
  2. export passwd="ceph-admin"
  3. export node1="node1"
  4. export node2="node2"
  5. export node3="node3"
  6. export node1_ip="192.168.122.101"
  7. export node2_ip="192.168.122.102"
  8. export node3_ip="192.168.122.103"

创建部署用户和ssh免密码登录

  1. useradd ${username}
  2. echo "${passwd}" | passwd --stdin ${username}
  3. echo "${username} ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/${username}
  4. chmod 0440 /etc/sudoers.d/${username}
  5. sudo mkdir /etc/ceph
  6. sudo chown -R ceph-admin.ceph-admin /etc/ceph
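
上面只创建了部署用户,ssh 免密登录可以参考如下示意(假设各节点均已创建该用户,且在部署节点上以 ${username} 用户执行、前面的环境变量已导出):

  1. ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
  2. for node in $node1 $node2 $node3; do ssh-copy-id ${username}@${node}; done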

升级 pip 并安装 ceph-deploy

  1. sudo yum install -y python-pip
  2. pip install --upgrade pip
  3. pip install ceph-deploy
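
安装完成后可以确认一下版本(示意):

  1. ceph-deploy --version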

部署节点

创建工作目录,部署过程中会在该目录下生成很多配置文件和日志信息

  1. mkdir my-cluster
  2. cd my-cluster
  3. ceph-deploy new $node1 $node2 $node3
  4. [ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
  5. ...
  6. [node2][INFO ] Running command: /usr/sbin/ip addr show
  7. [ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf...

编辑 ceph.conf 配置文件,添加 cluster、public 网络配置

  1. # ls
  2. ceph.conf ceph-deploy-ceph.log ceph.mon.keyring
  3. vim ceph.conf
  4. [global]
  5. fsid = 07ef58d8-3457-4cac-aa45-95166c738c16
  6. mon_initial_members = node1, node2, node3
  7. mon_host = 192.168.122.101,192.168.122.102,192.168.122.103
  8. auth_cluster_required = cephx
  9. auth_service_required = cephx
  10. auth_client_required = cephx
  11. public network = 192.168.122.0/24
  12. cluster network = 192.168.122.0/24

安装 ceph相关软件

建议使用镜像源手动安装,以替代 ceph-deploy install node1 node2;下面的命令需要在每台 node 上执行。

  1. sudo wget -O /etc/yum.repos.d/ceph.repo https://raw.githubusercontent.com/aishangwei/ceph-demo/master/ceph-deploy/ceph.repo
  2. sudo yum install -y ceph ceph-radosgw

配置初始 monitor(s)、并生成所有密钥

  1. ceph-deploy mon create-initial
  2. ls -l *.keyring
  3. -rw------- 1 root root 71 3 12 12:53 ceph.bootstrap-mds.keyring
  4. -rw------- 1 root root 71 3 12 12:53 ceph.bootstrap-mgr.keyring
  5. -rw------- 1 root root 71 3 12 12:53 ceph.bootstrap-osd.keyring
  6. -rw------- 1 root root 71 3 12 12:53 ceph.bootstrap-rgw.keyring
  7. -rw------- 1 root root 63 3 12 12:53 ceph.client.admin.keyring
  8. -rw------- 1 root root 73 3 12 12:50 ceph.mon.keyring

把配置信息拷贝到各节点

  1. ceph-deploy admin $node1 $node2 $node3

配置 osd

  1. for node in node{1..3};do ceph-deploy disk zap $node /dev/vdc;done
  2. for node in node{1..3};do ceph-deploy osd create $node --data /dev/vdc;done

部署 mgr

  1. ceph-deploy mgr create node{1..3}

开启 dashboard 模块,用于UI查看

  1. sudo ceph mgr module enable dashboard
  2. curl http://localhost:7000
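
dashboard 默认监听 7000 端口。如需指定监听地址或端口,可以参考如下示意(假设为 Luminous 版本的 dashboard 模块、active mgr 运行在 node1 上,地址和端口请按实际环境调整):

  1. sudo ceph config-key set mgr/dashboard/server_addr 192.168.122.101
  2. sudo ceph config-key set mgr/dashboard/server_port 7000
  3. sudo systemctl restart ceph-mgr@node1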

创建 ceph 块客户端用户名和认证密钥

  1. sudo ceph auth get-or-create client.rbd mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd'|tee ./ceph.client.rbd.keyring
  • 把密钥文件拷贝到客户端
  1. for node in node{1..3};do scp ceph.client.rbd.keyring /etc/ceph/ceph.conf $node:/etc/ceph/;done

创建pool

通常在创建pool之前,需要覆盖默认的pg_num,官方推荐:

  • 若少于5个OSD, 设置pg_num为128。
  • 5~10个OSD,设置pg_num为512。
  • 10~50个OSD,设置pg_num为4096。
  • 超过50个OSD,可以参考pgcalc计算。

PG和PGP数量一定要根据OSD的数量进行调整,计算公式如下,但最后算出的结果一定要接近或等于一个2的整数次幂。

Total PGs = (Total_number_of_OSD * 100) / max_replication_count
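
按照该公式可以先做一个粗略估算,再向上取最接近的 2 的整数次幂。下面是一个 shell 示意(假设 15 个 OSD、3 副本,与后文示例一致):

  1. osd_num=15; replica=3
  2. total_pgs=$(( osd_num * 100 / replica ))    # 500
  3. pg_num=1
  4. while [ $pg_num -lt $total_pgs ]; do pg_num=$(( pg_num * 2 )); done
  5. echo "total_pgs=$total_pgs pg_num=$pg_num"  # pg_num=512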

修改ceph.conf文件

  1. [ceph-admin@v31 my-cluster]$ cat ceph.conf
  2. [global]
  3. fsid = 61b3125d-1a74-4901-997e-2cb4625367ab
  4. mon_initial_members = v31, v32, v33
  5. mon_host = 192.168.4.31,192.168.4.32,192.168.4.33
  6. auth_cluster_required = cephx
  7. auth_service_required = cephx
  8. auth_client_required = cephx
  9. osd pool default pg num = 1024
  10. osd pool default pgp num = 1024
  11. [ceph-admin@v31 my-cluster]$ ceph-deploy --overwrite-conf config push v31 v32 v33
  12. [ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph-admin/.cephdeploy.conf
  13. [ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy --overwrite-conf config push v31 v32 v33
  14. [ceph_deploy.cli][INFO ] ceph-deploy options:
  15. [ceph_deploy.cli][INFO ] username : None
  16. [ceph_deploy.cli][INFO ] verbose : False
  17. [ceph_deploy.cli][INFO ] overwrite_conf : True
  18. [ceph_deploy.cli][INFO ] subcommand : push
  19. [ceph_deploy.cli][INFO ] quiet : False
  20. [ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f89fc4c9128>
  21. [ceph_deploy.cli][INFO ] cluster : ceph
  22. [ceph_deploy.cli][INFO ] client : ['v31', 'v32', 'v33']
  23. [ceph_deploy.cli][INFO ] func : <function config at 0x7f89fc6f7c08>
  24. [ceph_deploy.cli][INFO ] ceph_conf : None
  25. [ceph_deploy.cli][INFO ] default_release : False
  26. [ceph_deploy.config][DEBUG ] Pushing config to v31
  27. [v31][DEBUG ] connection detected need for sudo
  28. [v31][DEBUG ] connected to host: v31
  29. [v31][DEBUG ] detect platform information from remote host
  30. [v31][DEBUG ] detect machine type
  31. [v31][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
  32. [ceph_deploy.config][DEBUG ] Pushing config to v32
  33. [v32][DEBUG ] connection detected need for sudo
  34. [v32][DEBUG ] connected to host: v32
  35. [v32][DEBUG ] detect platform information from remote host
  36. [v32][DEBUG ] detect machine type
  37. [v32][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
  38. [ceph_deploy.config][DEBUG ] Pushing config to v33
  39. [v33][DEBUG ] connection detected need for sudo
  40. [v33][DEBUG ] connected to host: v33
  41. [v33][DEBUG ] detect platform information from remote host
  42. [v33][DEBUG ] detect machine type
  43. [v33][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
  • 请不要直接修改某个节点的/etc/ceph/ceph.conf文件,而是在部署机上修改ceph.conf,采用推送的方式更加方便安全。修改完成之后,使用下面的命令将conf文件推送到各个节点上:ceph-deploy --overwrite-conf config push v31 v32 v33 ,推送完成后需要重启各个节点的monitor服务(批量重启示例见下方):
    systemctl restart ceph-mon@{hostname}.service
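
在部署机上可以批量重启各节点的 mon 服务,下面是一个示意(假设节点主机名即为 v31、v32、v33,且已配置 ssh 免密):

  1. for node in v31 v32 v33; do ssh $node "sudo systemctl restart ceph-mon@${node}.service"; done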

例如15个OSD,副本数为3的情况下,根据公式计算的结果应该为500,最接近512,所以需要设定该pool(volumes)的pg_num和pgp_num都为512.

  1. ceph osd pool set volumes pg_num 1024
  2. ceph osd pool set volumes pgp_num 1024

ceph的pool有两种类型,一种是副本池,一种是ec池,创建时也有所区别

创建副本池

  1. ceph osd pool create testpool 128 128
  2. pool 'testpool' created

创建ec池

设置profile

  1. [root@v31 ~]# ceph osd erasure-code-profile set EC-profile k=3 m=1 ruleset-failure-domain=osd
  2. [root@v31 ~]# ceph osd erasure-code-profile get EC-profile
  3. crush-device-class=
  4. crush-failure-domain=osd
  5. crush-root=default
  6. jerasure-per-chunk-alignment=false
  7. k=3
  8. m=1
  9. plugin=jerasure
  10. technique=reed_sol_van
  11. w=8

创建pool

  1. [root@v31 ~]# ceph osd pool create ecpool 1024 1024 erasure EC-profile
  2. For better initial performance on pools expected to store a large number of objects, consider supplying the expected_num_objects parameter when creating the pool.
  3. [root@v31 ~]# ceph df
  4. GLOBAL:
  5. SIZE AVAIL RAW USED %RAW USED
  6. 8.17TiB 8.13TiB 36.3GiB 0.43
  7. POOLS:
  8. NAME ID USED %USED MAX AVAIL OBJECTS
  9. kube 14 1.55GiB 0.06 2.57TiB 612
  10. ecpool 20 0B 0 5.79TiB 0
  1. $ sudo ceph osd pool create pool-name pg_num pgp_num erasure

如:

  1. $ ceph osd pool create ecpool 12 12 erasure
  2. pool 'ecpool' created

创建 mds 和 cephfs 文件系统

创建 mds 服务

使用 CephFS 时,集群中必须有 mds 服务

  1. [ceph-admin@v31 my-cluster]$ ceph-deploy mds create v33
  2. [ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph-admin/.cephdeploy.conf
  3. [ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy mds create v33
  4. [ceph_deploy.cli][INFO ] ceph-deploy options:
  5. [ceph_deploy.cli][INFO ] username : None
  6. [ceph_deploy.cli][INFO ] verbose : False
  7. [ceph_deploy.cli][INFO ] overwrite_conf : False
  8. [ceph_deploy.cli][INFO ] subcommand : create
  9. [ceph_deploy.cli][INFO ] quiet : False
  10. [ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fbc3c5e05f0>
  11. [ceph_deploy.cli][INFO ] cluster : ceph
  12. [ceph_deploy.cli][INFO ] func : <function mds at 0x7fbc3c82eed8>
  13. [ceph_deploy.cli][INFO ] ceph_conf : None
  14. [ceph_deploy.cli][INFO ] mds : [('v33', 'v33')]
  15. [ceph_deploy.cli][INFO ] default_release : False
  16. [ceph_deploy.mds][DEBUG ] Deploying mds, cluster ceph hosts v33:v33
  17. [v33][DEBUG ] connection detected need for sudo
  18. [v33][DEBUG ] connected to host: v33
  19. [v33][DEBUG ] detect platform information from remote host
  20. [v33][DEBUG ] detect machine type
  21. [ceph_deploy.mds][INFO ] Distro info: CentOS Linux 7.6.1810 Core
  22. [ceph_deploy.mds][DEBUG ] remote host will use systemd
  23. [ceph_deploy.mds][DEBUG ] deploying mds bootstrap to v33
  24. [v33][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
  25. [v33][WARNIN] mds keyring does not exist yet, creating one
  26. [v33][DEBUG ] create a keyring file
  27. [v33][DEBUG ] create path if it doesn't exist
  28. [v33][INFO ] Running command: sudo ceph --cluster ceph --name client.bootstrap-mds --keyring /var/lib/ceph/bootstrap-mds/ceph.keyring auth get-or-create mds.v33 osd allow rwx mds allow mon allow profile mds -o /var/lib/ceph/mds/ceph-v33/keyring
  29. [v33][INFO ] Running command: sudo systemctl enable ceph-mds@v33
  30. [v33][WARNIN] Created symlink from /etc/systemd/system/ceph-mds.target.wants/ceph-mds@v33.service to /usr/lib/systemd/system/ceph-mds@.service.
  31. [v33][INFO ] Running command: sudo systemctl start ceph-mds@v33
  32. [v33][INFO ] Running command: sudo systemctl enable ceph.target

创建pool

  1. [ceph-admin@v31 my-cluster]$ ceph osd pool create cluster_data_metadata 1024 1024
  2. For better initial performance on pools expected to store a large number of objects, consider supplying the expected_num_objects parameter when creating the pool.
  3. [ceph-admin@v31 my-cluster]$ ceph osd pool create cluster_data 1024 1024
  4. For better initial performance on pools expected to store a large number of objects, consider supplying the expected_num_objects parameter when creating the pool.
  5. [ceph-admin@v31 my-cluster]$ ceph fs new cephfs cluster_data_metadata cluster_data
  6. new fs with metadata pool 11 and data pool 12
  7. [ceph-admin@v31 my-cluster]$ ceph df
  8. GLOBAL:
  9. SIZE AVAIL RAW USED %RAW USED
  10. 8.17TiB 8.14TiB 30.7GiB 0.37
  11. POOLS:
  12. NAME ID USED %USED MAX AVAIL OBJECTS
  13. cluster_data_metadata 11 0B 0 2.58TiB 0
  14. cluster_data 12 0B 0 2.58TiB 0
  15. [ceph-admin@v31 my-cluster]$ ceph mds stat
  16. cephfs-0/0/1 up
  17. [ceph-admin@v31 my-cluster]$ ceph osd pool ls
  18. cluster_data_metadata
  19. cluster_data
  20. [ceph-admin@v31 my-cluster]$ ceph fs ls
  21. name: cephfs, metadata pool: cluster_data_metadata, data pools: [cluster_data ]
  1. [ceph-admin@v31 ~]$ ceph osd pool create cluster_data_metadata 1024 1024 replicated_rule 1
  2. pool 'cluster_data_metadata' created
  3. [ceph-admin@v31 ~]$ ceph osd pool create cluster_data_data 1024 1024 replicated_rule 1
  4. Error ERANGE: pg_num 1024 size 3 would mean 9216 total pgs, which exceeds max 7500 (mon_max_pg_per_osd 250 * num_in_osds 30)
  5. [ceph-admin@v31 ~]$ ceph df
  6. GLOBAL:
  7. SIZE AVAIL RAW USED %RAW USED
  8. 8.17TiB 8.13TiB 36.4GiB 0.43
  9. POOLS:
  10. NAME ID USED %USED MAX AVAIL OBJECTS
  11. kube 14 1.57GiB 0.06 2.57TiB 614
  12. cluster_data_metadata 21 0B 0 2.57TiB 0
  13. [ceph-admin@v31 ~]$ ceph osd pool create cluster_data_data 1024 replicated_rule 1
  14. Error ERANGE: pg_num 1024 size 3 would mean 9216 total pgs, which exceeds max 7500 (mon_max_pg_per_osd 250 * num_in_osds 30)
  15. [ceph-admin@v31 ~]$ ceph osd pool create cluster_data_data 100 replicated_rule 1
  16. pool 'cluster_data_data' created
  17. [ceph-admin@v31 ~]$ ceph df
  18. GLOBAL:
  19. SIZE AVAIL RAW USED %RAW USED
  20. 8.17TiB 8.13TiB 36.4GiB 0.43
  21. POOLS:
  22. NAME ID USED %USED MAX AVAIL OBJECTS
  23. kube 14 1.57GiB 0.06 2.57TiB 614
  24. cluster_data_metadata 21 0B 0 2.57TiB 0
  25. cluster_data_data 22 0B 0 2.57TiB 0

创建osd存储池

  1. ceph osd pool create rbd 50
  2. ceph osd pool create kube 50
  3. # 为 pool 启用 application 标签,避免集群产生 health 告警
  4. ceph osd pool application enable kube mon
  5. ceph osd pool application enable rbd mon

创建用户(可选)

  1. ceph auth get-or-create client.cephfs mon 'allow r' mds 'allow r, allow rw path=/' osd 'allow rw pool=cephfs_data' -o ceph.client.cephfs.keyring
  2. scp ceph.client.cephfs.keyring <node>:/etc/ceph/

在对应的 ceph 服务器上获取 client key

  1. ceph auth get-key client.cephfs

这里可以直接使用admin账户的keyring

  1. cat ceph.client.admin.keyring
  2. [client.admin]
  3. key = AQBcs4hcKPiaChAAk12oiD79FpIjeGo1PmXFXw==

通过内核驱动挂载 Ceph FS

安装 ceph-fuse

  1. yum install ceph-fuse -y

确认 kernel 已加载 ceph 模块

  1. lsmod | grep ceph
  2. ceph 358802 0
  3. libceph 306625 1 ceph
  4. dns_resolver 13140 2 nfsv4,libceph
  5. libcrc32c 12644 4 ip_vs,libceph,nf_nat,nf_conntrack
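
若未看到 ceph 模块,可以先手动加载再确认(示意):

  1. sudo modprobe ceph
  2. lsmod | grep ceph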

创建挂载目录

  1. mkdir -p /data

挂载

  1. [ceph-admin@v31 my-cluster]$ sudo mount -t ceph v31:6789:/ /data -o name=admin,secret=AQBcs4hcKPiaChAAk12oiD79FpIjeGo1PmXFXw==
  2. [ceph-admin@v31 my-cluster]$ df -Th |grep ceph
  3. 192.168.4.31:6789:/ ceph 2.6T 0 2.6T 0% /data

写入/etc/fstab

  1. [ceph-admin@v31 my-cluster]$ cd /etc/ceph/
  2. [ceph-admin@v31 ceph]$ cp ceph.client.admin.keyring cephfs.key
  3. [ceph-admin@v31 ceph]$ vim cephfs.key
  4. AQBcs4hcKPiaChAAk12oiD79FpIjeGo1PmXFXw==
  5. echo "v31:6789:/ /data ceph name=admin,secretfile=/etc/ceph/cephfs.key,noatime,_netdev 0 0 " >>/etc/fstab

CephFS性能测试

fio

随机读测试

  1. [root@v31 ~]# fio -filename=/mnt/data/test1 -direct=1 -iodepth 1 -thread -rw=randread -ioengine=psync -bs=16k -size=10G -numjobs=10 -runtime=1000 -group_reporting -name=mytest
  2. mytest: (g=0): rw=randread, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
  3. ...
  4. fio-3.1
  5. Starting 10 threads
  6. mytest: Laying out IO file (1 file / 10240MiB)
  7. Jobs: 8 (f=8): [r(1),_(1),r(2),_(1),r(5)][99.8%][r=160MiB/s,w=0KiB/s][r=10.2k,w=0 IOPS][eta 00m:02s]
  8. mytest: (groupid=0, jobs=10): err= 0: pid=3824106: Tue Mar 26 09:13:04 2019
  9. read: IOPS=7359, BW=115MiB/s (121MB/s)(100GiB/890546msec)
  10. clat (usec): min=155, max=215229, avg=1355.08, stdev=1870.48
  11. lat (usec): min=155, max=215229, avg=1355.40, stdev=1870.48
  12. clat percentiles (usec):
  13. | 1.00th=[ 200], 5.00th=[ 217], 10.00th=[ 231], 20.00th=[ 265],
  14. | 30.00th=[ 486], 40.00th=[ 578], 50.00th=[ 660], 60.00th=[ 799],
  15. | 70.00th=[ 1037], 80.00th=[ 1893], 90.00th=[ 3982], 95.00th=[ 5080],
  16. | 99.00th=[ 7701], 99.50th=[ 9110], 99.90th=[15664], 99.95th=[19530],
  17. | 99.99th=[28705]
  18. bw ( KiB/s): min= 3040, max=28610, per=10.01%, avg=11782.72, stdev=3610.33, samples=17792
  19. iops : min= 190, max= 1788, avg=736.38, stdev=225.63, samples=17792
  20. lat (usec) : 250=16.50%, 500=14.76%, 750=25.96%, 1000=11.70%
  21. lat (msec) : 2=11.60%, 4=9.52%, 10=9.62%, 20=0.30%, 50=0.04%
  22. lat (msec) : 100=0.01%, 250=0.01%
  23. cpu : usr=0.39%, sys=1.82%, ctx=6694389, majf=0, minf=5367
  24. IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
  25. submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  26. complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  27. issued rwt: total=6553600,0,0, short=0,0,0, dropped=0,0,0
  28. latency : target=0, window=0, percentile=100.00%, depth=1
  29. Run status group 0 (all jobs):
  30. READ: bw=115MiB/s (121MB/s), 115MiB/s-115MiB/s (121MB/s-121MB/s), io=100GiB (107GB), run=890546-890546msec

顺序读测试

  1. [root@v33 ~]# fio -filename=/mnt/data/test2 -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=16k -size=10G -numjobs=30 -runtime=1000 -group_reporting -name=mytest
  2. mytest: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
  3. ...
  4. fio-3.1
  5. Starting 30 threads
  6. mytest: Laying out IO file (1 file / 10240MiB)
  7. Jobs: 30 (f=30): [R(30)][100.0%][r=138MiB/s,w=0KiB/s][r=8812,w=0 IOPS][eta 00m:00s]
  8. mytest: (groupid=0, jobs=30): err= 0: pid=411789: Tue Mar 26 09:33:03 2019
  9. read: IOPS=10.0k, BW=156MiB/s (164MB/s)(153GiB/1000005msec)
  10. clat (usec): min=141, max=38416, avg=2992.85, stdev=2478.50
  11. lat (usec): min=141, max=38416, avg=2993.14, stdev=2478.52
  12. clat percentiles (usec):
  13. | 1.00th=[ 161], 5.00th=[ 174], 10.00th=[ 188], 20.00th=[ 260],
  14. | 30.00th=[ 652], 40.00th=[ 1467], 50.00th=[ 2999], 60.00th=[ 3949],
  15. | 70.00th=[ 4490], 80.00th=[ 5342], 90.00th=[ 6325], 95.00th=[ 7111],
  16. | 99.00th=[ 8848], 99.50th=[ 9503], 99.90th=[10814], 99.95th=[11731],
  17. | 99.99th=[18482]
  18. bw ( KiB/s): min= 1472, max=47743, per=3.34%, avg=5349.53, stdev=4848.75, samples=60000
  19. iops : min= 92, max= 2983, avg=334.11, stdev=303.03, samples=60000
  20. lat (usec) : 250=19.25%, 500=7.26%, 750=5.25%, 1000=3.04%
  21. lat (msec) : 2=9.00%, 4=17.07%, 10=38.87%, 20=0.26%, 50=0.01%
  22. cpu : usr=0.17%, sys=1.04%, ctx=14529895, majf=0, minf=3600
  23. IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
  24. submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  25. complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  26. issued rwt: total=10015571,0,0, short=0,0,0, dropped=0,0,0
  27. latency : target=0, window=0, percentile=100.00%, depth=1
  28. Run status group 0 (all jobs):
  29. READ: bw=156MiB/s (164MB/s), 156MiB/s-156MiB/s (164MB/s-164MB/s), io=153GiB (164GB), run=1000005-1000005msec

随机写测试

  1. [root@v31 ~]# fio -filename=/mnt/data/test3 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=4k -size=10G -numjobs=30 -runtime=1000 -group_reporting -name=mytest_4k_10G_randwrite
  2. mytest_4k_10G_randwrite: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
  3. ...
  4. fio-3.1
  5. Starting 30 threads
  6. mytest_4k_10G_randwrite: Laying out IO file (1 file / 10240MiB)
  7. Jobs: 30 (f=30): [w(30)][100.0%][r=0KiB/s,w=11.8MiB/s][r=0,w=3009 IOPS][eta 00m:00s]
  8. mytest_4k_10G_randwrite: (groupid=0, jobs=30): err= 0: pid=3852817: Tue Mar 26 09:59:25 2019
  9. write: IOPS=3107, BW=12.1MiB/s (12.7MB/s)(11.9GiB/1000067msec)
  10. clat (usec): min=922, max=230751, avg=9651.32, stdev=16589.93
  11. lat (usec): min=923, max=230751, avg=9651.74, stdev=16589.93
  12. clat percentiles (usec):
  13. | 1.00th=[ 1188], 5.00th=[ 1319], 10.00th=[ 1418], 20.00th=[ 1565],
  14. | 30.00th=[ 1745], 40.00th=[ 1991], 50.00th=[ 2343], 60.00th=[ 3097],
  15. | 70.00th=[ 6325], 80.00th=[ 11994], 90.00th=[ 30278], 95.00th=[ 46924],
  16. | 99.00th=[ 79168], 99.50th=[ 91751], 99.90th=[121111], 99.95th=[130548],
  17. | 99.99th=[158335]
  18. bw ( KiB/s): min= 112, max= 1162, per=3.34%, avg=414.50, stdev=92.93, samples=60000
  19. iops : min= 28, max= 290, avg=103.60, stdev=23.22, samples=60000
  20. lat (usec) : 1000=0.01%
  21. lat (msec) : 2=40.50%, 4=23.80%, 10=13.13%, 20=7.97%, 50=10.27%
  22. lat (msec) : 100=4.00%, 250=0.32%
  23. cpu : usr=0.06%, sys=0.30%, ctx=3110768, majf=0, minf=141484
  24. IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
  25. submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  26. complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  27. issued rwt: total=0,3107281,0, short=0,0,0, dropped=0,0,0
  28. latency : target=0, window=0, percentile=100.00%, depth=1
  29. Run status group 0 (all jobs):
  30. WRITE: bw=12.1MiB/s (12.7MB/s), 12.1MiB/s-12.1MiB/s (12.7MB/s-12.7MB/s), io=11.9GiB (12.7GB), run=1000067-1000067msec

顺序写测试

  1. [root@v33 ~]# fio -filename=/mnt/data/test4 -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=16k -size=10G -numjobs=30 -runtime=1000 -group_reporting -name=mytest
  2. mytest: (g=0): rw=write, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
  3. ...
  4. fio-3.1
  5. Starting 30 threads
  6. mytest: Laying out IO file (1 file / 10240MiB)
  7. Jobs: 30 (f=30): [W(30)][100.0%][r=0KiB/s,w=50.3MiB/s][r=0,w=3219 IOPS][eta 00m:00s]
  8. mytest: (groupid=0, jobs=30): err= 0: pid=454215: Tue Mar 26 10:19:27 2019
  9. write: IOPS=3322, BW=51.9MiB/s (54.4MB/s)(50.7GiB/1000007msec)
  10. clat (usec): min=1130, max=121544, avg=9026.88, stdev=2132.29
  11. lat (usec): min=1131, max=121545, avg=9027.49, stdev=2132.30
  12. clat percentiles (usec):
  13. | 1.00th=[ 4047], 5.00th=[ 6325], 10.00th=[ 7308], 20.00th=[ 7963],
  14. | 30.00th=[ 8291], 40.00th=[ 8586], 50.00th=[ 8848], 60.00th=[ 9110],
  15. | 70.00th=[ 9503], 80.00th=[10028], 90.00th=[10814], 95.00th=[11600],
  16. | 99.00th=[17171], 99.50th=[20317], 99.90th=[25035], 99.95th=[26608],
  17. | 99.99th=[44303]
  18. bw ( KiB/s): min= 896, max= 3712, per=3.34%, avg=1772.81, stdev=213.20, samples=60000
  19. iops : min= 56, max= 232, avg=110.76, stdev=13.32, samples=60000
  20. lat (msec) : 2=0.08%, 4=0.88%, 10=79.28%, 20=19.23%, 50=0.53%
  21. lat (msec) : 100=0.01%, 250=0.01%
  22. cpu : usr=0.06%, sys=0.55%, ctx=3581559, majf=0, minf=4243
  23. IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
  24. submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  25. complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  26. issued rwt: total=0,3322270,0, short=0,0,0, dropped=0,0,0
  27. latency : target=0, window=0, percentile=100.00%, depth=1
  28. Run status group 0 (all jobs):
  29. WRITE: bw=51.9MiB/s (54.4MB/s), 51.9MiB/s-51.9MiB/s (54.4MB/s-54.4MB/s), io=50.7GiB (54.4GB), run=1000007-1000007msec

混合随机读写

  1. fio -filename=/data/test5 -direct=1 -iodepth 1 -thread -rw=randrw -rwmixread=70 -ioengine=psync -bs=16k -size=10G -numjobs=30 -runtime=100 -group_reporting -name=mytest -ioscheduler=noop

同步i/o(顺序写)测试

  1. [root@v31 data]# fio -filename=/mnt/data/test6 -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=4k -size=50G -numjobs=10 -runtime=1000 -group_reporting -name=mytest
  2. mytest: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
  3. ...
  4. fio-3.1
  5. Starting 10 threads
  6. mytest: Laying out IO file (1 file / 51200MiB)
  7. Jobs: 10 (f=10): [W(10)][100.0%][r=0KiB/s,w=25.6MiB/s][r=0,w=6549 IOPS][eta 00m:00s]
  8. mytest: (groupid=0, jobs=10): err= 0: pid=3883680: Tue Mar 26 10:48:08 2019
  9. write: IOPS=6180, BW=24.1MiB/s (25.3MB/s)(23.6GiB/1000001msec)
  10. clat (usec): min=825, max=176948, avg=1615.44, stdev=989.83
  11. lat (usec): min=826, max=176949, avg=1615.81, stdev=989.83
  12. clat percentiles (usec):
  13. | 1.00th=[ 1020], 5.00th=[ 1106], 10.00th=[ 1188], 20.00th=[ 1303],
  14. | 30.00th=[ 1369], 40.00th=[ 1434], 50.00th=[ 1500], 60.00th=[ 1565],
  15. | 70.00th=[ 1647], 80.00th=[ 1778], 90.00th=[ 2024], 95.00th=[ 2245],
  16. | 99.00th=[ 2933], 99.50th=[ 4817], 99.90th=[18744], 99.95th=[19268],
  17. | 99.99th=[21890]
  18. bw ( KiB/s): min= 1280, max= 3920, per=10.00%, avg=2473.24, stdev=365.21, samples=19998
  19. iops : min= 320, max= 980, avg=618.27, stdev=91.30, samples=19998
  20. lat (usec) : 1000=0.63%
  21. lat (msec) : 2=88.90%, 4=9.90%, 10=0.26%, 20=0.30%, 50=0.01%
  22. lat (msec) : 100=0.01%, 250=0.01%
  23. cpu : usr=0.27%, sys=1.59%, ctx=6286666, majf=0, minf=1148
  24. IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
  25. submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  26. complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  27. issued rwt: total=0,6180315,0, short=0,0,0, dropped=0,0,0
  28. latency : target=0, window=0, percentile=100.00%, depth=1
  29. Run status group 0 (all jobs):
  30. WRITE: bw=24.1MiB/s (25.3MB/s), 24.1MiB/s-24.1MiB/s (25.3MB/s-25.3MB/s), io=23.6GiB (25.3GB), run=1000001-1000001msec

异步i/o(顺序写)测试

  1. fio -filename=/data/test7 -direct=1 -iodepth 1 -thread -rw=write -ioengine=libaio -bs=4k -size=50G -numjobs=10 -runtime=1000 -group_reporting -name=mytest

磁盘性能测试

为了对比Ceph文件系统的性能,此处做了一个单块磁盘的性能测试;为了确保测试的真实性,单块磁盘选择为某个OSD对应的磁盘。

随机读测试-单块硬盘

  1. fio -filename=/var/lib/ceph/osd/ceph-4/disktest/dlw1 -direct=1 -iodepth 1 -thread -rw=randread -ioengine=psync -bs=16k -size=10G -numjobs=10 -runtime=1000 -group_reporting -name=mytest

rados性能测试

4M写入测试

  1. rados bench -p cluster_data_data 60 write -t 32 --no-cleanup
  2. Total time run: 60.717291
  3. Total writes made: 2238
  4. Write size: 4194304
  5. Object size: 4194304
  6. Bandwidth (MB/sec): 147.437
  7. Stddev Bandwidth: 20.1603
  8. Max bandwidth (MB/sec): 168
  9. Min bandwidth (MB/sec): 48
  10. Average IOPS: 36
  11. Stddev IOPS: 5
  12. Max IOPS: 42
  13. Min IOPS: 12
  14. Average Latency(s): 0.865663
  15. Stddev Latency(s): 0.40126
  16. Max latency(s): 3.58639
  17. Min latency(s): 0.185036

4k写入测试

  1. rados bench -p cluster_data_data 60 write -t 32 -b 4096 --no-cleanup
  2. Total time run: 60.035923
  3. Total writes made: 201042
  4. Write size: 4096
  5. Object size: 4096
  6. Bandwidth (MB/sec): 13.0808
  7. Stddev Bandwidth: 1.10742
  8. Max bandwidth (MB/sec): 17.1133
  9. Min bandwidth (MB/sec): 9.71875
  10. Average IOPS: 3348
  11. Stddev IOPS: 283
  12. Max IOPS: 4381
  13. Min IOPS: 2488
  14. Average Latency(s): 0.00955468
  15. Stddev Latency(s): 0.0164307
  16. Max latency(s): 0.335681
  17. Min latency(s): 0.00105769

4K顺序读

  1. rados bench -p cluster_data_data 60 seq -t 32 --no-cleanup
  2. Total time run: 22.129977
  3. Total reads made: 201042
  4. Read size: 4096
  5. Object size: 4096
  6. Bandwidth (MB/sec): 35.4867
  7. Average IOPS: 9084
  8. Stddev IOPS: 1278
  9. Max IOPS: 14011
  10. Min IOPS: 7578
  11. Average Latency(s): 0.0035112
  12. Max latency(s): 0.181241
  13. Min latency(s): 0.000287577
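
rados bench 带 --no-cleanup 写入的测试对象会留在 pool 中,测试结束后可以手动清理(示意,pool 名以实际为准):

  1. rados -p cluster_data_data cleanup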

删除CephFS

  1. [root@v32 ~]# ceph df
  2. GLOBAL:
  3. SIZE AVAIL RAW USED %RAW USED
  4. 8.17TiB 7.91TiB 262GiB 3.13
  5. POOLS:
  6. NAME ID USED %USED MAX AVAIL OBJECTS
  7. cluster_data_metadata 11 231MiB 0 2.49TiB 51638
  8. cluster_data 12 65.2GiB 2.49 2.49TiB 317473
  9. [root@v32 ~]# ceph osd pool delete cluster_data_metadata cluster_data_metadata --yes-i-really-really-mean-it
  10. Error EBUSY: pool 'cluster_data_metadata' is in use by CephFS
  11. [root@v32 ~]# ceph fs ls
  12. name: cephfs, metadata pool: cluster_data_metadata, data pools: [cluster_data ]
  13. [root@v32 ~]# ceph fs rm cephfs --yes-i-really-mean-it
  14. Error EINVAL: all MDS daemons must be inactive before removing filesystem
  15. [root@v33 ~]# systemctl stop ceph-mds@v33.service
  16. [root@v33 ~]# systemctl desable ceph-mds@v33.service
  17. Unknown operation 'desable'.
  18. [root@v33 ~]# systemctl disable ceph-mds@v33.service
  19. Removed symlink /etc/systemd/system/ceph-mds.target.wants/ceph-mds@v33.service.
  20. [root@v32 ~]# ceph fs rm cephfs --yes-i-really-mean-it
  21. [root@v32 ~]# ceph osd pool delete cluster_data_metadata cluster_data_metadata --yes-i-really-really-mean-it
  22. Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool
  23. [root@v32 ~]# cat /etc/ceph/ceph.conf
  24. [global]
  25. ...
  26. [mon]
  27. mon allow pool delete = true
  28. [root@v32 ~]# ceph osd pool delete cluster_data_metadata cluster_data_metadata --yes-i-really-really-mean-it
  29. pool 'cluster_data_metadata' removed
  30. [root@v32 ~]# ceph osd pool delete cluster_data cluster_data --yes-i-really-really-mean-it
  31. pool 'cluster_data' removed

CRUSH map

  1. 1、提取已有的CRUSH map ,使用-o参数,ceph将输出一个经过编译的CRUSH map 到您指定的文件
  2. ` ceph osd getcrushmap -o crushmap.txt`
  3. 2、反编译你的CRUSH map ,使用-d参数将反编译CRUSH map 到通过-o 指定的文件中
  4. `crushtool -d crushmap.txt -o crushmap-decompile`
  5. 3、使用编辑器编辑CRUSH map
  6. `vi crushmap-decompile`
  7. 4、重新编译这个新的CRUSH map
  8. `crushtool -c crushmap-decompile -o crushmap-compiled`
  9. 5、将新的CRUSH map 应用到ceph 集群中
  10. `ceph osd setcrushmap -i crushmap-compiled`

参考https://blog.csdn.net/heivy/article/details/50592244

查看pool

列出所有的pool

  1. [ceph-admin@v31 my-cluster]$ ceph df
  2. GLOBAL:
  3. SIZE AVAIL RAW USED %RAW USED
  4. 8.17TiB 8.14TiB 30.5GiB 0.37
  5. POOLS:
  6. NAME ID USED %USED MAX AVAIL OBJECTS
  7. cluster_data_metadata 2 0B 0 2.58TiB 0
  8. [ceph-admin@v31 my-cluster]$ rados lspools
  9. cluster_data_metadata

删除cluster_data_metadata pool

查看pool的详细配置信息

  1. [ceph-admin@v31 my-cluster]$ ceph osd pool ls detail
  2. pool 2 'cluster_data_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 146 flags hashpspool stripe_width 0
  1. [ceph-admin@v31 my-cluster]$ ceph osd dump|grep pool
  2. pool 2 'cluster_data_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 146 flags hashpspool stripe_width 0

查看每个pool的空间使用及IO情况

  1. [root@v32 ~]# rados df
  2. POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR
  3. kube 36B 4 0 12 0 0 0 5538 34.1MiB 142769 10.4GiB
  4. total_objects 4
  5. total_used 31.8GiB
  6. total_avail 8.14TiB
  7. total_space 8.17TiB

获取pool参数
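
可以用 ceph osd pool get/set 查看或调整某个 pool 的参数,下面是一个示意(pool 名以实际为准):

  1. ceph osd pool get kube pg_num
  2. ceph osd pool get kube size
  3. ceph osd pool set kube size 3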

查看osd分布

  1. ceph osd tree
  2. ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
  3. -1 8.16879 root default
  4. -3 2.72293 host v31
  5. 0 hdd 0.27229 osd.0 up 1.00000 1.00000
  6. 1 hdd 0.27229 osd.1 up 1.00000 1.00000
  7. 2 hdd 0.27229 osd.2 up 1.00000 1.00000
  8. 3 hdd 0.27229 osd.3 up 1.00000 1.00000
  9. 4 hdd 0.27229 osd.4 up 1.00000 1.00000
  10. 5 hdd 0.27229 osd.5 up 1.00000 1.00000
  11. 6 hdd 0.27229 osd.6 up 1.00000 1.00000
  12. 7 hdd 0.27229 osd.7 up 1.00000 1.00000
  13. 24 hdd 0.27229 osd.24 up 1.00000 1.00000
  14. 25 hdd 0.27229 osd.25 up 1.00000 1.00000
  15. -5 2.72293 host v32
  16. 8 hdd 0.27229 osd.8 up 1.00000 1.00000
  17. 9 hdd 0.27229 osd.9 up 1.00000 1.00000
  18. 10 hdd 0.27229 osd.10 up 1.00000 1.00000
  19. 11 hdd 0.27229 osd.11 up 1.00000 1.00000
  20. 12 hdd 0.27229 osd.12 up 1.00000 1.00000
  21. 13 hdd 0.27229 osd.13 up 1.00000 1.00000
  22. 14 hdd 0.27229 osd.14 up 1.00000 1.00000
  23. 15 hdd 0.27229 osd.15 up 1.00000 1.00000
  24. 27 hdd 0.27229 osd.27 up 1.00000 1.00000
  25. 29 hdd 0.27229 osd.29 up 1.00000 1.00000
  26. -7 2.72293 host v33
  27. 16 hdd 0.27229 osd.16 up 1.00000 1.00000
  28. 17 hdd 0.27229 osd.17 up 1.00000 1.00000
  29. 18 hdd 0.27229 osd.18 up 1.00000 1.00000
  30. 19 hdd 0.27229 osd.19 up 1.00000 1.00000
  31. 20 hdd 0.27229 osd.20 up 1.00000 1.00000
  32. 21 hdd 0.27229 osd.21 up 1.00000 1.00000
  33. 22 hdd 0.27229 osd.22 up 1.00000 1.00000
  34. 23 hdd 0.27229 osd.23 up 1.00000 1.00000
  35. 26 hdd 0.27229 osd.26 up 1.00000 1.00000
  36. 28 hdd 0.27229 osd.28 up 1.00000 1.00000

删除pool

sudo ceph osd pool delete {pool-name} {pool-name} --yes-i-really-really-mean-it

sudo ceph osd pool delete cluster_data_metadata cluster_data_metadata --yes-i-really-really-mean-it

如果删除pool时提示error请参考: 删除pool error的解决方法
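
除了修改 ceph.conf,也可以临时在线打开删除开关再执行删除,下面是一个示意(删除完成后建议再关闭该开关):

  1. ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'
  2. ceph osd pool delete cluster_data_metadata cluster_data_metadata --yes-i-really-really-mean-it
  3. ceph tell mon.\* injectargs '--mon-allow-pool-delete=false'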

集群添加OSD

  1. [ceph-admin@v31 my-cluster]$ ceph -s
  2. cluster:
  3. id: ffdda80f-a48a-431a-a71b-525e5f1965d9
  4. health: HEALTH_OK
  5. services:
  6. mon: 3 daemons, quorum v31,v32,v33
  7. mgr: v31(active), standbys: v32, v33
  8. osd: 24 osds: 24 up, 24 in
  9. data:
  10. pools: 0 pools, 0 pgs
  11. objects: 0 objects, 0B
  12. usage: 24.3GiB used, 6.51TiB / 6.54TiB avail
  13. pgs:
  • 补充知识:osd状态
  1. up:守护进程运行中,能够提供IO服务;
  2. down:守护进程不在运行,无法提供IO服务;
  3. in:OSD 在集群中,会承载数据;
  4. out:OSD 不在集群中,不承载数据

列出所有磁盘

  1. [root@v33 ~]# sudo ceph-disk list
  2. /dev/dm-0 other, ext4, mounted on /
  3. /dev/dm-1 other, swap
  4. /dev/dm-2 other, unknown
  5. /dev/dm-3 other, unknown
  6. /dev/dm-4 other, unknown
  7. /dev/dm-5 other, unknown
  8. /dev/dm-6 other, unknown
  9. /dev/dm-7 other, unknown
  10. /dev/dm-8 other, unknown
  11. /dev/dm-9 other, unknown
  12. /dev/sda :
  13. /dev/sda1 other, vfat, mounted on /boot/efi
  14. /dev/sda2 other, xfs, mounted on /boot
  15. /dev/sda3 other, LVM2_member
  16. /dev/sdb other, unknown
  17. /dev/sdc other, unknown
  18. /dev/sdd other, LVM2_member
  19. /dev/sde other, LVM2_member
  20. /dev/sdf other, LVM2_member
  21. /dev/sdg other, LVM2_member
  22. /dev/sdh other, LVM2_member
  23. /dev/sdi other, LVM2_member
  24. /dev/sdj other, LVM2_member
  25. /dev/sdk other, LVM2_member

添加时报错

  1. [ceph-admin@v31 my-cluster]$ ceph-deploy osd create v31 --data /dev/sdb
  2. [ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph-admin/.cephdeploy.conf
  3. [ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy osd create v31 --data /dev/sdb
  4. [ceph_deploy.cli][INFO ] ceph-deploy options:
  5. [ceph_deploy.cli][INFO ] verbose : False
  6. [ceph_deploy.cli][INFO ] bluestore : None
  7. [ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fe69d002830>
  8. [ceph_deploy.cli][INFO ] cluster : ceph
  9. [ceph_deploy.cli][INFO ] fs_type : xfs
  10. [ceph_deploy.cli][INFO ] block_wal : None
  11. [ceph_deploy.cli][INFO ] default_release : False
  12. [ceph_deploy.cli][INFO ] username : None
  13. [ceph_deploy.cli][INFO ] journal : None
  14. [ceph_deploy.cli][INFO ] subcommand : create
  15. [ceph_deploy.cli][INFO ] host : v31
  16. [ceph_deploy.cli][INFO ] filestore : None
  17. [ceph_deploy.cli][INFO ] func : <function osd at 0x7fe69d2478c0>
  18. [ceph_deploy.cli][INFO ] ceph_conf : None
  19. [ceph_deploy.cli][INFO ] zap_disk : False
  20. [ceph_deploy.cli][INFO ] data : /dev/sdb
  21. [ceph_deploy.cli][INFO ] block_db : None
  22. [ceph_deploy.cli][INFO ] dmcrypt : False
  23. [ceph_deploy.cli][INFO ] overwrite_conf : False
  24. [ceph_deploy.cli][INFO ] dmcrypt_key_dir : /etc/ceph/dmcrypt-keys
  25. [ceph_deploy.cli][INFO ] quiet : False
  26. [ceph_deploy.cli][INFO ] debug : False
  27. [ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/sdb
  28. [v31][DEBUG ] connection detected need for sudo
  29. [v31][DEBUG ] connected to host: v31
  30. [v31][DEBUG ] detect platform information from remote host
  31. [v31][DEBUG ] detect machine type
  32. [v31][DEBUG ] find the location of an executable
  33. [ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.6.1810 Core
  34. [ceph_deploy.osd][DEBUG ] Deploying osd to v31
  35. [v31][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
  36. [ceph_deploy.osd][ERROR ] RuntimeError: config file /etc/ceph/ceph.conf exists with different content; use --overwrite-conf to overwrite
  37. [ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs
  38. [ceph-admin@v31 my-cluster]$ ceph-deploy --overwrite-conf osd create v31 --data /dev/sdb
  39. [ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph-admin/.cephdeploy.conf
  40. [ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy --overwrite-conf osd create v31 --data /dev/sdb
  41. [ceph_deploy.cli][INFO ] ceph-deploy options:
  42. [ceph_deploy.cli][INFO ] verbose : False
  43. [ceph_deploy.cli][INFO ] bluestore : None
  44. [ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7ff6abd72830>
  45. [ceph_deploy.cli][INFO ] cluster : ceph
  46. [ceph_deploy.cli][INFO ] fs_type : xfs
  47. [ceph_deploy.cli][INFO ] block_wal : None
  48. [ceph_deploy.cli][INFO ] default_release : False
  49. [ceph_deploy.cli][INFO ] username : None
  50. [ceph_deploy.cli][INFO ] journal : None
  51. [ceph_deploy.cli][INFO ] subcommand : create
  52. [ceph_deploy.cli][INFO ] host : v31
  53. [ceph_deploy.cli][INFO ] filestore : None
  54. [ceph_deploy.cli][INFO ] func : <function osd at 0x7ff6abfb78c0>
  55. [ceph_deploy.cli][INFO ] ceph_conf : None
  56. [ceph_deploy.cli][INFO ] zap_disk : False
  57. [ceph_deploy.cli][INFO ] data : /dev/sdb
  58. [ceph_deploy.cli][INFO ] block_db : None
  59. [ceph_deploy.cli][INFO ] dmcrypt : False
  60. [ceph_deploy.cli][INFO ] overwrite_conf : True
  61. [ceph_deploy.cli][INFO ] dmcrypt_key_dir : /etc/ceph/dmcrypt-keys
  62. [ceph_deploy.cli][INFO ] quiet : False
  63. [ceph_deploy.cli][INFO ] debug : False
  64. [ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/sdb
  65. [v31][DEBUG ] connection detected need for sudo
  66. [v31][DEBUG ] connected to host: v31
  67. [v31][DEBUG ] detect platform information from remote host
  68. [v31][DEBUG ] detect machine type
  69. [v31][DEBUG ] find the location of an executable
  70. [ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.6.1810 Core
  71. [ceph_deploy.osd][DEBUG ] Deploying osd to v31
  72. [v31][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
  73. [v31][DEBUG ] find the location of an executable
  74. [v31][INFO ] Running command: sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
  75. [v31][WARNIN] usage: ceph-volume lvm create [-h] --data DATA [--filestore]
  76. [v31][WARNIN] [--journal JOURNAL] [--bluestore]
  77. [v31][WARNIN] [--block.db BLOCK_DB] [--block.wal BLOCK_WAL]
  78. [v31][WARNIN] [--osd-id OSD_ID] [--osd-fsid OSD_FSID]
  79. [v31][WARNIN] [--cluster-fsid CLUSTER_FSID]
  80. [v31][WARNIN] [--crush-device-class CRUSH_DEVICE_CLASS]
  81. [v31][WARNIN] [--dmcrypt] [--no-systemd]
  82. [v31][WARNIN] ceph-volume lvm create: error: GPT headers found, they must be removed on: /dev/sdb
  83. [v31][ERROR ] RuntimeError: command returned non-zero exit status: 2
  84. [ceph_deploy.osd][ERROR ] Failed to execute command: /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
  85. [ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs
  86. [ceph-admin@v31 my-cluster]$ /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
  87. --> Falling back to /tmp/ for logging. Can't use /var/log/ceph/ceph-volume.log
  88. --> [Errno 13] Permission denied: '/var/log/ceph/ceph-volume.log'
  89. stderr: error: /dev/sdb: Permission denied
  90. --> SuperUserError: This command needs to be executed with sudo or as root
  91. [ceph-admin@v31 my-cluster]$ sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
  92. usage: ceph-volume lvm create [-h] --data DATA [--filestore]
  93. [--journal JOURNAL] [--bluestore]
  94. [--block.db BLOCK_DB] [--block.wal BLOCK_WAL]
  95. [--osd-id OSD_ID] [--osd-fsid OSD_FSID]
  96. [--cluster-fsid CLUSTER_FSID]
  97. [--crush-device-class CRUSH_DEVICE_CLASS]
  98. [--dmcrypt] [--no-systemd]
  99. ceph-volume lvm create: error: GPT headers found, they must be removed on: /dev/sdb

转换为mbr

  1. [ceph-admin@v31 my-cluster]$ sudo parted -s /dev/sdb mklabel msdos
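
除了把分区表转成 msdos,也可以直接清掉磁盘上已有的 GPT/LVM 签名后再创建 OSD,下面是一个示意(请先确认该盘上的数据可以清空):

  1. sudo wipefs --all /dev/sdb
  2. sudo ceph-volume lvm zap /dev/sdb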

再次创建OSD

  1. [root@v32 ~]# /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdc
  2. Running command: /bin/ceph-authtool --gen-print-key
  3. Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new d47d7861-9a83-4879-847d-693e3aa794b6
  4. Running command: vgcreate --force --yes ceph-fad7bf25-dd60-4eff-a932-970c376af00b /dev/sdc
  5. stdout: Wiping dos signature on /dev/sdc.
  6. stdout: Physical volume "/dev/sdc" successfully created.
  7. stdout: Volume group "ceph-fad7bf25-dd60-4eff-a932-970c376af00b" successfully created
  8. Running command: lvcreate --yes -l 100%FREE -n osd-block-d47d7861-9a83-4879-847d-693e3aa794b6 ceph-fad7bf25-dd60-4eff-a932-970c376af00b
  9. stdout: Logical volume "osd-block-d47d7861-9a83-4879-847d-693e3aa794b6" created.
  10. Running command: /bin/ceph-authtool --gen-print-key
  11. Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-27
  12. Running command: restorecon /var/lib/ceph/osd/ceph-27
  13. Running command: chown -h ceph:ceph /dev/ceph-fad7bf25-dd60-4eff-a932-970c376af00b/osd-block-d47d7861-9a83-4879-847d-693e3aa794b6
  14. Running command: chown -R ceph:ceph /dev/dm-10
  15. Running command: ln -s /dev/ceph-fad7bf25-dd60-4eff-a932-970c376af00b/osd-block-d47d7861-9a83-4879-847d-693e3aa794b6 /var/lib/ceph/osd/ceph-27/block
  16. Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-27/activate.monmap
  17. stderr: got monmap epoch 2
  18. Running command: ceph-authtool /var/lib/ceph/osd/ceph-27/keyring --create-keyring --name osd.27 --add-key AQB4yohcMW8eLhAAkdQmhIavIcF+FcPjkKooSQ==
  19. stdout: creating /var/lib/ceph/osd/ceph-27/keyring
  20. stdout: added entity osd.27 auth auth(auid = 18446744073709551615 key=AQB4yohcMW8eLhAAkdQmhIavIcF+FcPjkKooSQ== with 0 caps)
  21. Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-27/keyring
  22. Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-27/
  23. Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 27 --monmap /var/lib/ceph/osd/ceph-27/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-27/ --osd-uuid d47d7861-9a83-4879-847d-693e3aa794b6 --setuser ceph --setgroup ceph
  24. --> ceph-volume lvm prepare successful for: /dev/sdc
  25. Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-27
  26. Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-fad7bf25-dd60-4eff-a932-970c376af00b/osd-block-d47d7861-9a83-4879-847d-693e3aa794b6 --path /var/lib/ceph/osd/ceph-27
  27. Running command: ln -snf /dev/ceph-fad7bf25-dd60-4eff-a932-970c376af00b/osd-block-d47d7861-9a83-4879-847d-693e3aa794b6 /var/lib/ceph/osd/ceph-27/block
  28. Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-27/block
  29. Running command: chown -R ceph:ceph /dev/dm-10
  30. Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-27
  31. Running command: systemctl enable ceph-volume@lvm-27-d47d7861-9a83-4879-847d-693e3aa794b6
  32. stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-27-d47d7861-9a83-4879-847d-693e3aa794b6.service to /usr/lib/systemd/system/ceph-volume@.service.
  33. Running command: systemctl enable --runtime ceph-osd@27
  34. stderr: Created symlink from /run/systemd/system/ceph-osd.target.wants/ceph-osd@27.service to /usr/lib/systemd/system/ceph-osd@.service.
  35. Running command: systemctl start ceph-osd@27
  36. --> ceph-volume lvm activate successful for osd ID: 27
  37. --> ceph-volume lvm create successful for: /dev/sdc
  38. [root@v32 ~]# /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdd
  39. Running command: /bin/ceph-authtool --gen-print-key
  40. Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new fade79d7-8bee-49c6-85f8-d6c141e6bd4e
  41. Running command: vgcreate --force --yes ceph-fc851010-f2c6-43f7-9c12-843d3a023a65 /dev/sdd
  42. stderr: Physical volume '/dev/sdd' is already in volume group 'ceph-00af8489-3599-427d-bae6-de1e61c4c38a'
  43. Unable to add physical volume '/dev/sdd' to volume group 'ceph-00af8489-3599-427d-bae6-de1e61c4c38a'
  44. /dev/sdd: physical volume not initialized.
  45. --> Was unable to complete a new OSD, will rollback changes
  46. --> OSD will be fully purged from the cluster, because the ID was generated
  47. Running command: ceph osd purge osd.29 --yes-i-really-mean-it
  48. stderr: purged osd.29
  49. --> RuntimeError: command returned non-zero exit status: 5
  50. ceph-deploy --overwrite-conf osd create v31 --data /dev/sdb

rbd数据查看

  1. [root@v32 ~]# rados ls -p kube
  2. rbd_id.kubernetes-dynamic-pvc-28e6ad1e-4675-11e9-8901-6602f4085af9
  3. rbd_directory
  4. rbd_info
  5. rbd_header.149046b8b4567

删除rbd

  1. [root@v32 ~]# rados -p kube rm rbd_id.kubernetes-dynamic-pvc-28e6ad1e-4675-11e9-8901-6602f4085af9 rbd_directory rbd_info rbd_header.149046b8b4567
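
如果目的是删除整个 rbd 镜像,一般直接用 rbd 命令即可,下面是一个示意(镜像名以实际为准):

  1. rbd rm kube/kubernetes-dynamic-pvc-28e6ad1e-4675-11e9-8901-6602f4085af9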

故障解决

Kubernetes使用ceph集群存储

https://akomljen.com/using-existing-ceph-cluster-for-kubernetes-persistent-storage/

创建访问 ceph kube 存储池的 kube 账户及权限

  1. ceph --cluster ceph auth get-or-create client.kube mon 'allow r' osd 'allow rwx pool=kube'
  2. [client.kube]
  3. key = AQBHTIdcSqnbMhAATB4BJJXyA8K5NFQlE815/A==

在kube-system namespace中为rbd-provisioner RBAC授权并创建pod

  1. vim rbd-provisioner.yaml
  2. kind: ClusterRole
  3. apiVersion: rbac.authorization.k8s.io/v1
  4. metadata:
  5.   name: rbd-provisioner
  6. rules:
  7.   - apiGroups: [""]
  8.     resources: ["persistentvolumes"]
  9.     verbs: ["get", "list", "watch", "create", "delete"]
  10.   - apiGroups: [""]
  11.     resources: ["persistentvolumeclaims"]
  12.     verbs: ["get", "list", "watch", "update"]
  13.   - apiGroups: ["storage.k8s.io"]
  14.     resources: ["storageclasses"]
  15.     verbs: ["get", "list", "watch"]
  16.   - apiGroups: [""]
  17.     resources: ["events"]
  18.     verbs: ["create", "update", "patch"]
  19.   - apiGroups: [""]
  20.     resources: ["services"]
  21.     resourceNames: ["kube-dns","coredns"]
  22.     verbs: ["list", "get"]
  23.   - apiGroups: [""]
  24.     resources: ["endpoints"]
  25.     verbs: ["get", "list", "watch", "create", "update", "patch"]
  26. ---
  27. kind: ClusterRoleBinding
  28. apiVersion: rbac.authorization.k8s.io/v1
  29. metadata:
  30.   name: rbd-provisioner
  31. subjects:
  32.   - kind: ServiceAccount
  33.     name: rbd-provisioner
  34.     namespace: kube-system
  35. roleRef:
  36.   kind: ClusterRole
  37.   name: rbd-provisioner
  38.   apiGroup: rbac.authorization.k8s.io
  39. ---
  40. apiVersion: rbac.authorization.k8s.io/v1beta1
  41. kind: Role
  42. metadata:
  43.   name: rbd-provisioner
  44. rules:
  45.   - apiGroups: [""]
  46.     resources: ["secrets"]
  47.     verbs: ["get"]
  48. ---
  49. apiVersion: rbac.authorization.k8s.io/v1
  50. kind: RoleBinding
  51. metadata:
  52.   name: rbd-provisioner
  53. roleRef:
  54.   apiGroup: rbac.authorization.k8s.io
  55.   kind: Role
  56.   name: rbd-provisioner
  57. subjects:
  58.   - kind: ServiceAccount
  59.     name: rbd-provisioner
  60.     namespace: kube-system
  61. ---
  62. apiVersion: v1
  63. kind: ServiceAccount
  64. metadata:
  65.   name: rbd-provisioner
  66. ---
  67. apiVersion: extensions/v1beta1
  68. kind: Deployment
  69. metadata:
  70.   name: rbd-provisioner
  71. spec:
  72.   replicas: 1
  73.   strategy:
  74.     type: Recreate
  75.   template:
  76.     metadata:
  77.       labels:
  78.         app: rbd-provisioner
  79.     spec:
  80.       containers:
  81.       - name: rbd-provisioner
  82.         image: ivano/rbd-provisioner
  83.         env:
  84.         - name: PROVISIONER_NAME
  85.           value: ceph.com/rbd
  86.       serviceAccount: rbd-provisioner
  87. kubectl -n kube-system apply -f rbd-provisioner.yaml
  • 创建 rbd-provisioner pod 时要注意使用的容器镜像中的 ceph 版本

  1. ceph -v
  2. ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)

我这里为 luminous

  1. docker history ivano/rbd-provisioner:latest|grep CEPH_VERSION
  2. 5 months ago /bin/sh -c #(nop) ENV CEPH_VERSION=luminous 0B

`rbd-provisioner` ceph存储集群授权配置

RBD卷配置器需要Ceph的管理密钥来配置存储

  1. ceph --cluster ceph auth get-key client.admin
  2. AQBDO4dcUktZLxAAwByPxao2QROhQpoWYAWsGg==

添加Ceph集群admin账户权限

使用上面的Ceph admin账户的密钥创建secret

  1. kubectl create secret generic ceph-secret \
  2.     --type="kubernetes.io/rbd" \
  3.     --from-literal=key='AQBDO4dcUktZLxAAwByPxao2QROhQpoWYAWsGg==' \
  4.     --namespace=kube-system

创建ceph存储池

  1. sudo ceph --cluster ceph osd pool create kube 1024 1024
  2. sudo ceph --cluster ceph auth get-or-create client.kube mon 'allow r' osd 'allow rwx pool=kube'
  3. sudo ceph --cluster ceph auth get-key client.kube

添加Ceph集群kube账户权限

  1. ceph --cluster ceph auth get-key client.kube
  2. AQBHTIdcSqnbMhAATB4BJJXyA8K5NFQlE815/A==

  1. kubectl create secret generic ceph-secret-kube \
  2.     --type="kubernetes.io/rbd" \
  3.     --from-literal=key='AQBHTIdcSqnbMhAATB4BJJXyA8K5NFQlE815/A==' \
  4.     --namespace=kube-system

查看secret资源

  1. kubectl get secrets -n kube-system |grep ceph
  2. ceph-secret        kubernetes.io/rbd   1   54m
  3. ceph-secret-kube   kubernetes.io/rbd   1   51m

创建 `storageClassName` 并绑定ceph集群

后续 pod 直接使用该 `storageClassName` 调用

  1. vim fast-rbd.yaml
  2. apiVersion: storage.k8s.io/v1
  3. kind: StorageClass
  4. metadata:
  5.   name: fast-rbd
  6. provisioner: ceph.com/rbd
  7. parameters:
  8.   monitors: 192.168.122.101:6789, 192.168.122.102:6789, 192.168.122.103:6789
  9.   adminId: admin
  10.   adminSecretName: ceph-secret
  11.   adminSecretNamespace: kube-system
  12.   pool: kube
  13.   userId: kube
  14.   userSecretName: ceph-secret-kube
  15.   userSecretNamespace: kube-system
  16.   imageFormat: "2"
  17.   imageFeatures: layering

  1. kubectl create -f fast-rbd.yaml
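
创建完成后可以确认 StorageClass 是否已生效(示意):

  1. kubectl get storageclass fast-rbd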

示例

创建pvc请求

  1. cat <<EOF | kubectl create -f -
  2. kind: PersistentVolumeClaim
  3. apiVersion: v1
  4. metadata:
  5.   name: myclaim
  6. spec:
  7.   accessModes:
  8.     - ReadWriteOnce
  9.   resources:
  10.     requests:
  11.       storage: 8Gi
  12.   storageClassName: fast-rbd
  13. EOF

查看是否已Bound

  1. kubectl get pvc myclaim
  2. NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
  3. myclaim   Bound    pvc-5ef52eea-44a7-11e9-9d6f-525400d7a6ef   8Gi        RWO            fast-rbd       52m

创建pod示例

  1. cat test-pod.yaml
  2. apiVersion: v1
  3. kind: Pod
  4. metadata:
  5.   name: ceph-pod1
  6. spec:
  7.   containers:
  8.   - name: ceph-busybox
  9.     image: busybox
  10.     command: ["sleep", "60000"]
  11.     volumeMounts:
  12.     - name: ceph-vol1
  13.       mountPath: /usr/share/busybox
  14.       readOnly: false
  15.   volumes:
  16.   - name: ceph-vol1
  17.     persistentVolumeClaim:
  18.       claimName: ceph-claim
  19. ---
  20. kind: PersistentVolumeClaim
  21. apiVersion: v1
  22. metadata:
  23.   name: ceph-claim
  24. spec:
  25.   accessModes:
  26.     - ReadWriteOnce
  27.   resources:
  28.     requests:
  29.       storage: 2Gi
  30.   storageClassName: fast-rbd

检查pv、pvc的创建状态,是否都已经创建;

  1. kubectl get pv
  2. NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS        CLAIM                    STORAGECLASS        REASON   AGE
  3. pvc-278c2462-448d-11e9-b632-525400804e1e   8Gi        RWO            Delete           Terminating   jx/myclaim               fast-rbd                     129m
  4. pvc-5ef52eea-44a7-11e9-9d6f-525400d7a6ef   8Gi        RWO            Delete           Bound         default/myclaim          fast-rbd                     66m
  5. pvc-d3a5095a-4225-11e9-8d3b-525400d7a6ef   8Gi        RWO            Delete           Bound         default/jenkins          nfs-dynamic-class            3d5h
  6. pvc-ed9c1211-44af-11e9-9d6f-525400d7a6ef   2Gi        RWO            Delete           Bound         default/ceph-claim       fast-rbd                     4m59s
  7. pvc-f25b4ce2-44a1-11e9-9d6f-525400d7a6ef   2Gi        RWO            Delete           Terminating   kube-system/ceph-claim   ceph-rbd                     96m

  1. kubectl get pvc
  2. NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
  3. ceph-claim   Bound    pvc-ed9c1211-44af-11e9-9d6f-525400d7a6ef   2Gi        RWO            fast-rbd            5m2s
  4. jenkins      Bound    pvc-d3a5095a-4225-11e9-8d3b-525400d7a6ef   8Gi        RWO            nfs-dynamic-class   3d5h
  5. myclaim      Bound    pvc-5ef52eea-44a7-11e9-9d6f-525400d7a6ef   8Gi        RWO            fast-rbd            66m

ceph服务器上,检查rbd镜像创建情况和镜像的信息;

  1. rbd ls --pool rbd
  2. kubernetes-dynamic-pvc-1e569f60-44a3-11e9-8e60-fa9f2d515699

  1. rbd ls --pool kube
  2. kubernetes-dynamic-pvc-2e7038b1-449d-11e9-bd3d-46e50dc4cee6
  3. kubernetes-dynamic-pvc-6038cc76-44a7-11e9-a834-029380302ed2
  4. kubernetes-dynamic-pvc-84a5d823-449e-11e9-bd3d-46e50dc4cee6
  5. kubernetes-dynamic-pvc-edb72324-44af-11e9-a834-029380302ed2

  1. rbd info kube/kubernetes-dynamic-pvc-2e7038b1-449d-11e9-bd3d-46e50dc4cee6
  2. rbd image 'kubernetes-dynamic-pvc-2e7038b1-449d-11e9-bd3d-46e50dc4cee6':
  3.     size 8GiB in 2048 objects
  4.     order 22 (4MiB objects)
  5.     block_name_prefix: rbd_data.11136b8b4567
  6.     format: 2
  7.     features: layering
  8.     flags:
  9.     create_timestamp: Tue Mar 12 16:02:30 2019

检查busybox内的文件系统挂载和使用情况,确认能正常工作;

  1. kubectl exec -it ceph-pod1 mount |grep rbd
  2. /dev/rbd0 on /usr/share/busybox type ext4 (rw,relatime,stripe=1024,data=ordered)

  1. kubectl exec -it ceph-pod1 df |grep rbd
  2. /dev/rbd0 1998672 6144 1976144 0% /usr/share/busybox