Ceph cluster
Creating a Ceph cluster
A regular (non-root) user can be used to deploy the Ceph cluster; the following variables are used in the examples below.
export username="ceph-admin"
export passwd="ceph-admin"
export node1="node1"
export node2="node2"
export node3="node3"
export node1_ip="192.168.122.101"
export node2_ip="192.168.122.102"
export node3_ip="192.168.122.103"
Create the deployment user and set up passwordless SSH login
useradd ${username}
echo "${passwd}" | passwd --stdin ${username}
echo "${username} ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/${username}
chmod 0440 /etc/sudoers.d/${username}
sudo mkdir /etc/ceph
sudo chown -R ceph-admin.ceph-admin /etc/ceph
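The passwordless SSH setup itself is not shown above; a minimal sketch, run on the deployment node as the new user (assuming the node1/node2/node3 hostnames resolve):
# generate a key pair and copy it to every node so ceph-deploy can log in without a password
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for node in $node1 $node2 $node3; do ssh-copy-id ${username}@${node}; done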
Install ceph-deploy and upgrade pip
sudo yum install -y python-pip
pip install --upgrade pip
pip install ceph-deploy
Deployment node
Create a working directory on the deployment node; the deployment generates a lot of files in it.
mkdir my-cluster
cd my-cluster
ceph-deploy new $node1 $node2 $node3
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
...
[node2][INFO ] Running command: /usr/sbin/ip addr show
[ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf...
Edit the ceph.conf configuration file and add the cluster and public networks
# ls
ceph.conf ceph-deploy-ceph.log ceph.mon.keyring
vim ceph.conf
[global]
fsid = 07ef58d8-3457-4cac-aa45-95166c738c16
mon_initial_members = node1, node2, node3
mon_host = 192.168.122.101,192.168.122.102,192.168.122.103
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 192.168.122.0/24
cluster network = 192.168.122.0/24
Install the Ceph packages
It is recommended to use a mirror repository instead of `ceph-deploy install node1 node2 node3`; the commands below must then be run on every node.
sudo wget -O /etc/yum.repos.d/ceph.repo https://raw.githubusercontent.com/aishangwei/ceph-demo/master/ceph-deploy/ceph.repo
sudo yum install -y ceph ceph-radosgw
Configure the initial monitor(s) and gather all keys
ceph-deploy mon create-initial
ls -l *.keyring
-rw------- 1 root root 71 3月 12 12:53 ceph.bootstrap-mds.keyring
-rw------- 1 root root 71 3月 12 12:53 ceph.bootstrap-mgr.keyring
-rw------- 1 root root 71 3月 12 12:53 ceph.bootstrap-osd.keyring
-rw------- 1 root root 71 3月 12 12:53 ceph.bootstrap-rgw.keyring
-rw------- 1 root root 63 3月 12 12:53 ceph.client.admin.keyring
-rw------- 1 root root 73 3月 12 12:50 ceph.mon.keyring
Copy the configuration and admin keyring to each node
ceph-deploy admin $node1 $node2 $node3
Configure the OSDs
for node in node{1..3};do ceph-deploy disk zap $node /dev/vdc;done
for node in node{1..3};do ceph-deploy osd create $node --data /dev/vdc;done
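Optionally, the newly created OSDs can be checked with the standard status commands:
ceph -s
ceph osd tree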
Deploy the mgr daemons
ceph-deploy mgr create node{1..3}
Enable the dashboard module to view the cluster through a web UI
sudo ceph mgr module enable dashboard
curl http://localhost:7000
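The dashboard is served by the active mgr, so the curl/browser target has to be that host; which node that is can be checked as follows (ACTIVE_MGR_HOST is a placeholder):
ceph mgr services
curl http://ACTIVE_MGR_HOST:7000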
Create a Ceph block-device client user and its authentication key
sudo ceph auth get-or-create client.rbd mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd'|tee ./ceph.client.rbd.keyring
- Copy the key file to the clients
for node in node{1..3};do scp ceph.client.rbd.keyring /etc/ceph/ceph.conf $node:/etc/ceph/;done
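As a quick, hypothetical sanity check of the new client.rbd user (it assumes the rbd pool already exists; in these notes it is created further below):
# run on a client; the keyring copied to /etc/ceph/ is picked up automatically for --id rbd
rbd --id rbd ls rbd
rbd --id rbd create rbd/test-img --size 1024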
Create pools
Before creating a pool, the default pg_num usually needs to be overridden. The official recommendation:
- Fewer than 5 OSDs: set pg_num to 128.
- 5 to 10 OSDs: set pg_num to 512.
- 10 to 50 OSDs: set pg_num to 4096.
- More than 50 OSDs: use pgcalc to work out a value.
The PG and PGP counts must be sized according to the number of OSDs. The formula is below; the final value should be rounded to (or close to) a power of two.
Total PGs = (Total_number_of_OSD * 100) / max_replication_count
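A quick worked check of the formula for this cluster (30 OSDs, 3 replicas), just as a sketch:
# (30 * 100) / 3 = 1000, which rounds up to the nearest power of two, 1024
echo $(( 30 * 100 / 3 ))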
Edit the ceph.conf file
[ceph-admin@v31 my-cluster]$ cat ceph.conf
[global]
fsid = 61b3125d-1a74-4901-997e-2cb4625367ab
mon_initial_members = v31, v32, v33
mon_host = 192.168.4.31,192.168.4.32,192.168.4.33
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd pool default pg num = 1024
osd pool default pgp num = 1024
[ceph-admin@v31 my-cluster]$ ceph-deploy --overwrite-conf config push v31 v32 v33
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph-admin/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy --overwrite-conf config push v31 v32 v33
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] overwrite_conf : True
[ceph_deploy.cli][INFO ] subcommand : push
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f89fc4c9128>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] client : ['v31', 'v32', 'v33']
[ceph_deploy.cli][INFO ] func : <function config at 0x7f89fc6f7c08>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.config][DEBUG ] Pushing config to v31
[v31][DEBUG ] connection detected need for sudo
[v31][DEBUG ] connected to host: v31
[v31][DEBUG ] detect platform information from remote host
[v31][DEBUG ] detect machine type
[v31][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.config][DEBUG ] Pushing config to v32
[v32][DEBUG ] connection detected need for sudo
[v32][DEBUG ] connected to host: v32
[v32][DEBUG ] detect platform information from remote host
[v32][DEBUG ] detect machine type
[v32][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.config][DEBUG ] Pushing config to v33
[v33][DEBUG ] connection detected need for sudo
[v33][DEBUG ] connected to host: v33
[v33][DEBUG ] detect platform information from remote host
[v33][DEBUG ] detect machine type
[v33][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
- Do not edit /etc/ceph/ceph.conf on an individual node directly. Instead, edit ceph.conf on the deployment host and push it out; that is both easier and safer. After the change, push the conf file to all nodes with: ceph-deploy --overwrite-conf config push v31 v32 v33
- Then restart the monitor service on each node: systemctl restart ceph-mon@{hostname}.service
For example, with 15 OSDs and a replica count of 3 the formula gives 500; the nearest power of two is 512, so pg_num and pgp_num of that pool (volumes) should both be set to 512.
ceph osd pool set volumes pg_num 512
ceph osd pool set volumes pgp_num 512
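The change can be verified afterwards with:
ceph osd pool get volumes pg_num
ceph osd pool get volumes pgp_num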
Ceph pools come in two types, replicated pools and erasure-coded (EC) pools, and they are created slightly differently.
Create a replicated pool
ceph osd pool create testpool 128 128
pool 'testpool' created
Create an EC pool
Set up an erasure-code profile
[root@v31 ~]# ceph osd erasure-code-profile set EC-profile k=3 m=1 ruleset-failure-domain=osd
[root@v31 ~]# ceph osd erasure-code-profile get EC-profile
crush-device-class=
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=3
m=1
plugin=jerasure
technique=reed_sol_van
w=8
Create the pool
[root@v31 ~]# ceph osd pool create ecpool 1024 1024 erasure EC-profile
For better initial performance on pools expected to store a large number of objects, consider supplying the expected_num_objects parameter when creating the pool.
[root@v31 ~]# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
8.17TiB 8.13TiB 36.3GiB 0.43
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
kube 14 1.55GiB 0.06 2.57TiB 612
ecpool 20 0B 0 5.79TiB 0
$ sudo ceph osd pool create pool-name pg_num pgp_num erasure
For example:
$ ceph osd pool create ecpool 12 12 erasure
pool 'ecpool' created
Create the MDS and the Ceph file system
Create the MDS service
A cluster that serves CephFS must have an MDS service.
[ceph-admin@v31 my-cluster]$ ceph-deploy mds create v33
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph-admin/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy mds create v33
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] subcommand : create
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fbc3c5e05f0>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] func : <function mds at 0x7fbc3c82eed8>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] mds : [('v33', 'v33')]
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.mds][DEBUG ] Deploying mds, cluster ceph hosts v33:v33
[v33][DEBUG ] connection detected need for sudo
[v33][DEBUG ] connected to host: v33
[v33][DEBUG ] detect platform information from remote host
[v33][DEBUG ] detect machine type
[ceph_deploy.mds][INFO ] Distro info: CentOS Linux 7.6.1810 Core
[ceph_deploy.mds][DEBUG ] remote host will use systemd
[ceph_deploy.mds][DEBUG ] deploying mds bootstrap to v33
[v33][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[v33][WARNIN] mds keyring does not exist yet, creating one
[v33][DEBUG ] create a keyring file
[v33][DEBUG ] create path if it doesn't exist
[v33][INFO ] Running command: sudo ceph --cluster ceph --name client.bootstrap-mds --keyring /var/lib/ceph/bootstrap-mds/ceph.keyring auth get-or-create mds.v33 osd allow rwx mds allow mon allow profile mds -o /var/lib/ceph/mds/ceph-v33/keyring
[v33][INFO ] Running command: sudo systemctl enable ceph-mds@v33
[v33][WARNIN] Created symlink from /etc/systemd/system/ceph-mds.target.wants/ceph-mds@v33.service to /usr/lib/systemd/system/ceph-mds@.service.
[v33][INFO ] Running command: sudo systemctl start ceph-mds@v33
[v33][INFO ] Running command: sudo systemctl enable ceph.target
Create the pools
[ceph-admin@v31 my-cluster]$ ceph osd pool create cluster_data_metadata 1024 1024
For better initial performance on pools expected to store a large number of objects, consider supplying the expected_num_objects parameter when creating the pool.
[ceph-admin@v31 my-cluster]$ ceph osd pool create cluster_data 1024 1024
For better initial performance on pools expected to store a large number of objects, consider supplying the expected_num_objects parameter when creating the pool.
[ceph-admin@v31 my-cluster]$ ceph fs new cephfs cluster_data_metadata cluster_data
new fs with metadata pool 11 and data pool 12
[ceph-admin@v31 my-cluster]$ ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
8.17TiB 8.14TiB 30.7GiB 0.37
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
cluster_data_metadata 11 0B 0 2.58TiB 0
cluster_data 12 0B 0 2.58TiB 0
[ceph-admin@v31 my-cluster]$ ceph mds stat
cephfs-0/0/1 up
[ceph-admin@v31 my-cluster]$ ceph osd pool ls
cluster_data_metadata
cluster_data
[ceph-admin@v31 my-cluster]$ ceph fs ls
name: cephfs, metadata pool: cluster_data_metadata, data pools: [cluster_data ]
[ceph-admin@v31 ~]$ ceph osd pool create cluster_data_metadata 1024 1024 replicated_rule 1
pool 'cluster_data_metadata' created
[ceph-admin@v31 ~]$ ceph osd pool create cluster_data_data 1024 1024 replicated_rule 1
Error ERANGE: pg_num 1024 size 3 would mean 9216 total pgs, which exceeds max 7500 (mon_max_pg_per_osd 250 * num_in_osds 30)
[ceph-admin@v31 ~]$ ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
8.17TiB 8.13TiB 36.4GiB 0.43
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
kube 14 1.57GiB 0.06 2.57TiB 614
cluster_data_metadata 21 0B 0 2.57TiB 0
[ceph-admin@v31 ~]$ ceph osd pool create cluster_data_data 1024 replicated_rule 1
Error ERANGE: pg_num 1024 size 3 would mean 9216 total pgs, which exceeds max 7500 (mon_max_pg_per_osd 250 * num_in_osds 30)
[ceph-admin@v31 ~]$ ceph osd pool create cluster_data_data 100 replicated_rule 1
pool 'cluster_data_data' created
[ceph-admin@v31 ~]$ ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
8.17TiB 8.13TiB 36.4GiB 0.43
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
kube 14 1.57GiB 0.06 2.57TiB 614
cluster_data_metadata 21 0B 0 2.57TiB 0
cluster_data_data 22 0B 0 2.57TiB 0
Create OSD storage pools
ceph osd pool create rbd 50
ceph osd pool create kube 50
# enable the application tag on the pools
ceph osd pool application enable kube mon
ceph osd pool application enable rbd mon
Create a user (optional)
ceph auth get-or-create client.cephfs mon 'allow r' mds 'allow r, allow rw path=/' osd 'allow rw pool=cephfs_data' -o ceph.client.cephfs.keyring
scp ceph.client.cephfs.keyring <node>:/etc/ceph/
Retrieve the client key on the corresponding Ceph server
ceph auth get-key client.cephfs
Here the admin account's keyring can also be used directly
cat ceph.client.admin.keyring
[client.admin]
key = AQBcs4hcKPiaChAAk12oiD79FpIjeGo1PmXFXw==
Mount CephFS via the kernel driver
Install ceph-fuse
yum install ceph-fuse -y
Confirm the kernel has loaded the ceph module
lsmod | grep ceph
ceph 358802 0
libceph 306625 1 ceph
dns_resolver 13140 2 nfsv4,libceph
libcrc32c 12644 4 ip_vs,libceph,nf_nat,nf_conntrack
Create the mount directory
mkdir -p /data
Mount
[ceph-admin@v31 my-cluster]$ sudo mount -t ceph v31:6789:/ /data -o name=admin,secret=AQBcs4hcKPiaChAAk12oiD79FpIjeGo1PmXFXw==
[ceph-admin@v31 my-cluster]$ df -Th |grep ceph
192.168.4.31:6789:/ ceph 2.6T 0 2.6T 0% /data
Add an entry to /etc/fstab (the secret file must contain only the key string)
[ceph-admin@v31 my-cluster]$ cd /etc/ceph/
[ceph-admin@v31 ceph]$ cp ceph.client.admin.keyring cephfs.key
[ceph-admin@v31 ceph]$ vim cephfs.key
AQBcs4hcKPiaChAAk12oiD79FpIjeGo1PmXFXw==
echo "v31:6789:/ /data ceph name=admin,secretfile=/etc/ceph/cephfs.key,noatime,_netdev 0 0 " >>/etc/fstab
CephFS performance testing
fio
Random read test
[root@v31 ~]# fio -filename=/mnt/data/test1 -direct=1 -iodepth 1 -thread -rw=randread -ioengine=psync -bs=16k -size=10G -numjobs=10 -runtime=1000 -group_reporting -name=mytest
mytest: (g=0): rw=randread, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
...
fio-3.1
Starting 10 threads
mytest: Laying out IO file (1 file / 10240MiB)
Jobs: 8 (f=8): [r(1),_(1),r(2),_(1),r(5)][99.8%][r=160MiB/s,w=0KiB/s][r=10.2k,w=0 IOPS][eta 00m:02s]
mytest: (groupid=0, jobs=10): err= 0: pid=3824106: Tue Mar 26 09:13:04 2019
read: IOPS=7359, BW=115MiB/s (121MB/s)(100GiB/890546msec)
clat (usec): min=155, max=215229, avg=1355.08, stdev=1870.48
lat (usec): min=155, max=215229, avg=1355.40, stdev=1870.48
clat percentiles (usec):
| 1.00th=[ 200], 5.00th=[ 217], 10.00th=[ 231], 20.00th=[ 265],
| 30.00th=[ 486], 40.00th=[ 578], 50.00th=[ 660], 60.00th=[ 799],
| 70.00th=[ 1037], 80.00th=[ 1893], 90.00th=[ 3982], 95.00th=[ 5080],
| 99.00th=[ 7701], 99.50th=[ 9110], 99.90th=[15664], 99.95th=[19530],
| 99.99th=[28705]
bw ( KiB/s): min= 3040, max=28610, per=10.01%, avg=11782.72, stdev=3610.33, samples=17792
iops : min= 190, max= 1788, avg=736.38, stdev=225.63, samples=17792
lat (usec) : 250=16.50%, 500=14.76%, 750=25.96%, 1000=11.70%
lat (msec) : 2=11.60%, 4=9.52%, 10=9.62%, 20=0.30%, 50=0.04%
lat (msec) : 100=0.01%, 250=0.01%
cpu : usr=0.39%, sys=1.82%, ctx=6694389, majf=0, minf=5367
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=6553600,0,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=115MiB/s (121MB/s), 115MiB/s-115MiB/s (121MB/s-121MB/s), io=100GiB (107GB), run=890546-890546msec
Sequential read test
[root@v33 ~]# fio -filename=/mnt/data/test2 -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=16k -size=10G -numjobs=30 -runtime=1000 -group_reporting -name=mytest
mytest: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
...
fio-3.1
Starting 30 threads
mytest: Laying out IO file (1 file / 10240MiB)
Jobs: 30 (f=30): [R(30)][100.0%][r=138MiB/s,w=0KiB/s][r=8812,w=0 IOPS][eta 00m:00s]
mytest: (groupid=0, jobs=30): err= 0: pid=411789: Tue Mar 26 09:33:03 2019
read: IOPS=10.0k, BW=156MiB/s (164MB/s)(153GiB/1000005msec)
clat (usec): min=141, max=38416, avg=2992.85, stdev=2478.50
lat (usec): min=141, max=38416, avg=2993.14, stdev=2478.52
clat percentiles (usec):
| 1.00th=[ 161], 5.00th=[ 174], 10.00th=[ 188], 20.00th=[ 260],
| 30.00th=[ 652], 40.00th=[ 1467], 50.00th=[ 2999], 60.00th=[ 3949],
| 70.00th=[ 4490], 80.00th=[ 5342], 90.00th=[ 6325], 95.00th=[ 7111],
| 99.00th=[ 8848], 99.50th=[ 9503], 99.90th=[10814], 99.95th=[11731],
| 99.99th=[18482]
bw ( KiB/s): min= 1472, max=47743, per=3.34%, avg=5349.53, stdev=4848.75, samples=60000
iops : min= 92, max= 2983, avg=334.11, stdev=303.03, samples=60000
lat (usec) : 250=19.25%, 500=7.26%, 750=5.25%, 1000=3.04%
lat (msec) : 2=9.00%, 4=17.07%, 10=38.87%, 20=0.26%, 50=0.01%
cpu : usr=0.17%, sys=1.04%, ctx=14529895, majf=0, minf=3600
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=10015571,0,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=156MiB/s (164MB/s), 156MiB/s-156MiB/s (164MB/s-164MB/s), io=153GiB (164GB), run=1000005-1000005msec
Random write test
[root@v31 ~]# fio -filename=/mnt/data/test3 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=4k -size=10G -numjobs=30 -runtime=1000 -group_reporting -name=mytest_4k_10G_randwrite
mytest_4k_10G_randwrite: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
...
fio-3.1
Starting 30 threads
mytest_4k_10G_randwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 30 (f=30): [w(30)][100.0%][r=0KiB/s,w=11.8MiB/s][r=0,w=3009 IOPS][eta 00m:00s]
mytest_4k_10G_randwrite: (groupid=0, jobs=30): err= 0: pid=3852817: Tue Mar 26 09:59:25 2019
write: IOPS=3107, BW=12.1MiB/s (12.7MB/s)(11.9GiB/1000067msec)
clat (usec): min=922, max=230751, avg=9651.32, stdev=16589.93
lat (usec): min=923, max=230751, avg=9651.74, stdev=16589.93
clat percentiles (usec):
| 1.00th=[ 1188], 5.00th=[ 1319], 10.00th=[ 1418], 20.00th=[ 1565],
| 30.00th=[ 1745], 40.00th=[ 1991], 50.00th=[ 2343], 60.00th=[ 3097],
| 70.00th=[ 6325], 80.00th=[ 11994], 90.00th=[ 30278], 95.00th=[ 46924],
| 99.00th=[ 79168], 99.50th=[ 91751], 99.90th=[121111], 99.95th=[130548],
| 99.99th=[158335]
bw ( KiB/s): min= 112, max= 1162, per=3.34%, avg=414.50, stdev=92.93, samples=60000
iops : min= 28, max= 290, avg=103.60, stdev=23.22, samples=60000
lat (usec) : 1000=0.01%
lat (msec) : 2=40.50%, 4=23.80%, 10=13.13%, 20=7.97%, 50=10.27%
lat (msec) : 100=4.00%, 250=0.32%
cpu : usr=0.06%, sys=0.30%, ctx=3110768, majf=0, minf=141484
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,3107281,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=12.1MiB/s (12.7MB/s), 12.1MiB/s-12.1MiB/s (12.7MB/s-12.7MB/s), io=11.9GiB (12.7GB), run=1000067-1000067msec
Sequential write test
[root@v33 ~]# fio -filename=/mnt/data/test4 -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=16k -size=10G -numjobs=30 -runtime=1000 -group_reporting -name=mytest
mytest: (g=0): rw=write, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
...
fio-3.1
Starting 30 threads
mytest: Laying out IO file (1 file / 10240MiB)
Jobs: 30 (f=30): [W(30)][100.0%][r=0KiB/s,w=50.3MiB/s][r=0,w=3219 IOPS][eta 00m:00s]
mytest: (groupid=0, jobs=30): err= 0: pid=454215: Tue Mar 26 10:19:27 2019
write: IOPS=3322, BW=51.9MiB/s (54.4MB/s)(50.7GiB/1000007msec)
clat (usec): min=1130, max=121544, avg=9026.88, stdev=2132.29
lat (usec): min=1131, max=121545, avg=9027.49, stdev=2132.30
clat percentiles (usec):
| 1.00th=[ 4047], 5.00th=[ 6325], 10.00th=[ 7308], 20.00th=[ 7963],
| 30.00th=[ 8291], 40.00th=[ 8586], 50.00th=[ 8848], 60.00th=[ 9110],
| 70.00th=[ 9503], 80.00th=[10028], 90.00th=[10814], 95.00th=[11600],
| 99.00th=[17171], 99.50th=[20317], 99.90th=[25035], 99.95th=[26608],
| 99.99th=[44303]
bw ( KiB/s): min= 896, max= 3712, per=3.34%, avg=1772.81, stdev=213.20, samples=60000
iops : min= 56, max= 232, avg=110.76, stdev=13.32, samples=60000
lat (msec) : 2=0.08%, 4=0.88%, 10=79.28%, 20=19.23%, 50=0.53%
lat (msec) : 100=0.01%, 250=0.01%
cpu : usr=0.06%, sys=0.55%, ctx=3581559, majf=0, minf=4243
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,3322270,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=51.9MiB/s (54.4MB/s), 51.9MiB/s-51.9MiB/s (54.4MB/s-54.4MB/s), io=50.7GiB (54.4GB), run=1000007-1000007msec
Mixed random read/write test
fio -filename=/data/test5 -direct=1 -iodepth 1 -thread -rw=randrw -rwmixread=70 -ioengine=psync -bs=16k -size=10G -numjobs=30 -runtime=100 -group_reporting -name=mytest -ioscheduler=noop
Synchronous I/O (sequential write) test
[root@v31 data]# fio -filename=/mnt/data/test6 -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=4k -size=50G -numjobs=10 -runtime=1000 -group_reporting -name=mytest
mytest: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
...
fio-3.1
Starting 10 threads
mytest: Laying out IO file (1 file / 51200MiB)
Jobs: 10 (f=10): [W(10)][100.0%][r=0KiB/s,w=25.6MiB/s][r=0,w=6549 IOPS][eta 00m:00s]
mytest: (groupid=0, jobs=10): err= 0: pid=3883680: Tue Mar 26 10:48:08 2019
write: IOPS=6180, BW=24.1MiB/s (25.3MB/s)(23.6GiB/1000001msec)
clat (usec): min=825, max=176948, avg=1615.44, stdev=989.83
lat (usec): min=826, max=176949, avg=1615.81, stdev=989.83
clat percentiles (usec):
| 1.00th=[ 1020], 5.00th=[ 1106], 10.00th=[ 1188], 20.00th=[ 1303],
| 30.00th=[ 1369], 40.00th=[ 1434], 50.00th=[ 1500], 60.00th=[ 1565],
| 70.00th=[ 1647], 80.00th=[ 1778], 90.00th=[ 2024], 95.00th=[ 2245],
| 99.00th=[ 2933], 99.50th=[ 4817], 99.90th=[18744], 99.95th=[19268],
| 99.99th=[21890]
bw ( KiB/s): min= 1280, max= 3920, per=10.00%, avg=2473.24, stdev=365.21, samples=19998
iops : min= 320, max= 980, avg=618.27, stdev=91.30, samples=19998
lat (usec) : 1000=0.63%
lat (msec) : 2=88.90%, 4=9.90%, 10=0.26%, 20=0.30%, 50=0.01%
lat (msec) : 100=0.01%, 250=0.01%
cpu : usr=0.27%, sys=1.59%, ctx=6286666, majf=0, minf=1148
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,6180315,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=24.1MiB/s (25.3MB/s), 24.1MiB/s-24.1MiB/s (25.3MB/s-25.3MB/s), io=23.6GiB (25.3GB), run=1000001-1000001msec
Asynchronous I/O (sequential write) test
fio -filename=/data/test7 -direct=1 -iodepth 1 -thread -rw=write -ioengine=libaio -bs=4k -size=50G -numjobs=10 -runtime=1000 -group_reporting -name=mytest
Disk performance test
To have a baseline for the CephFS numbers, a single-disk test was also run. To keep the comparison realistic, the disk chosen is one that backs an OSD.
Random read test on a single disk
fio -filename=/var/lib/ceph/osd/ceph-4/disktest/dlw1 -direct=1 -iodepth 1 -thread -rw=randread -ioengine=psync -bs=16k -size=10G -numjobs=10 -runtime=1000 -group_reporting -name=mytest
RADOS performance testing
4M write test
rados bench -p cluster_data_data 60 write -t 32 --no-cleanup
Total time run: 60.717291
Total writes made: 2238
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 147.437
Stddev Bandwidth: 20.1603
Max bandwidth (MB/sec): 168
Min bandwidth (MB/sec): 48
Average IOPS: 36
Stddev IOPS: 5
Max IOPS: 42
Min IOPS: 12
Average Latency(s): 0.865663
Stddev Latency(s): 0.40126
Max latency(s): 3.58639
Min latency(s): 0.185036
4K write test
rados bench -p cluster_data_data 60 write -t 32 -b 4096 --no-cleanup
Total time run: 60.035923
Total writes made: 201042
Write size: 4096
Object size: 4096
Bandwidth (MB/sec): 13.0808
Stddev Bandwidth: 1.10742
Max bandwidth (MB/sec): 17.1133
Min bandwidth (MB/sec): 9.71875
Average IOPS: 3348
Stddev IOPS: 283
Max IOPS: 4381
Min IOPS: 2488
Average Latency(s): 0.00955468
Stddev Latency(s): 0.0164307
Max latency(s): 0.335681
Min latency(s): 0.00105769
Sequential read test (reads back the objects written by the previous benchmark)
rados bench -p cluster_data_data 60 seq -t 32 --no-cleanup
Total time run: 22.129977
Total reads made: 201042
Read size: 4096
Object size: 4096
Bandwidth (MB/sec): 35.4867
Average IOPS: 9084
Stddev IOPS: 1278
Max IOPS: 14011
Min IOPS: 7578
Average Latency(s): 0.0035112
Max latency(s): 0.181241
Min latency(s): 0.000287577
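Because the benchmarks were run with --no-cleanup, the benchmark objects remain in the pool; they can be removed afterwards:
rados -p cluster_data_data cleanup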
Delete CephFS
[root@v32 ~]# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
8.17TiB 7.91TiB 262GiB 3.13
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
cluster_data_metadata 11 231MiB 0 2.49TiB 51638
cluster_data 12 65.2GiB 2.49 2.49TiB 317473
[root@v32 ~]# ceph osd pool delete cluster_data_metadata cluster_data_metadata --yes-i-really-really-mean-it
Error EBUSY: pool 'cluster_data_metadata' is in use by CephFS
[root@v32 ~]# ceph fs ls
name: cephfs, metadata pool: cluster_data_metadata, data pools: [cluster_data ]
[root@v32 ~]# ceph fs rm cephfs --yes-i-really-mean-it
Error EINVAL: all MDS daemons must be inactive before removing filesystem
[root@v33 ~]# systemctl stop ceph-mds@v33.service
[root@v33 ~]# systemctl desable ceph-mds@v33.service
Unknown operation 'desable'.
[root@v33 ~]# systemctl disable ceph-mds@v33.service
Removed symlink /etc/systemd/system/ceph-mds.target.wants/ceph-mds@v33.service.
[root@v32 ~]# ceph fs rm cephfs --yes-i-really-mean-it
[root@v32 ~]# ceph osd pool delete cluster_data_metadata cluster_data_metadata --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool
[root@v32 ~]# cat /etc/ceph/ceph.conf
[global]
...
[mon]
mon allow pool delete = true
[root@v32 ~]# ceph osd pool delete cluster_data_metadata cluster_data_metadata --yes-i-really-really-mean-it
pool 'cluster_data_metadata' removed
[root@v32 ~]# ceph osd pool delete cluster_data cluster_data --yes-i-really-really-mean-it
pool 'cluster_data' removed
CRUSH map
1. Extract the existing CRUSH map. With -o, ceph writes the compiled CRUSH map to the file you specify:
`ceph osd getcrushmap -o crushmap.txt`
2. Decompile the CRUSH map. With -d, crushtool decompiles it into the file given by -o:
`crushtool -d crushmap.txt -o crushmap-decompile`
3. Edit the CRUSH map with an editor:
`vi crushmap-decompile`
4. Recompile the new CRUSH map:
`crushtool -c crushmap-decompile -o crushmap-compiled`
5. Apply the new CRUSH map to the Ceph cluster:
`ceph osd setcrushmap -i crushmap-compiled`
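Before applying a hand-edited map, it can also be sanity-checked offline with crushtool, for example:
crushtool --test -i crushmap-compiled --rule 0 --num-rep 3 --show-mappings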
Reference: https://blog.csdn.net/heivy/article/details/50592244
Inspect pools
List all pools
[ceph-admin@v31 my-cluster]$ ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
8.17TiB 8.14TiB 30.5GiB 0.37
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
cluster_data_metadata 2 0B 0 2.58TiB 0
[ceph-admin@v31 my-cluster]$ rados lspools
cluster_data_metadata
Delete the cluster_data_metadata pool
View the pool's detailed configuration
[ceph-admin@v31 my-cluster]$ ceph osd pool ls detail
pool 2 'cluster_data_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 146 flags hashpspool stripe_width 0
[ceph-admin@v31 my-cluster]$ ceph osd dump|grep pool
pool 2 'cluster_data_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 146 flags hashpspool stripe_width 0
View per-pool space usage and I/O
[root@v32 ~]# rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR
kube 36B 4 0 12 0 0 0 5538 34.1MiB 142769 10.4GiB
total_objects 4
total_used 31.8GiB
total_avail 8.14TiB
total_space 8.17TiB
Get pool parameters
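A minimal sketch of reading pool parameters with `ceph osd pool get`, using the cluster_data_metadata pool as the example:
ceph osd pool get cluster_data_metadata size
ceph osd pool get cluster_data_metadata pg_num
ceph osd pool get cluster_data_metadata all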
View the OSD distribution
ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 8.16879 root default
-3 2.72293 host v31
0 hdd 0.27229 osd.0 up 1.00000 1.00000
1 hdd 0.27229 osd.1 up 1.00000 1.00000
2 hdd 0.27229 osd.2 up 1.00000 1.00000
3 hdd 0.27229 osd.3 up 1.00000 1.00000
4 hdd 0.27229 osd.4 up 1.00000 1.00000
5 hdd 0.27229 osd.5 up 1.00000 1.00000
6 hdd 0.27229 osd.6 up 1.00000 1.00000
7 hdd 0.27229 osd.7 up 1.00000 1.00000
24 hdd 0.27229 osd.24 up 1.00000 1.00000
25 hdd 0.27229 osd.25 up 1.00000 1.00000
-5 2.72293 host v32
8 hdd 0.27229 osd.8 up 1.00000 1.00000
9 hdd 0.27229 osd.9 up 1.00000 1.00000
10 hdd 0.27229 osd.10 up 1.00000 1.00000
11 hdd 0.27229 osd.11 up 1.00000 1.00000
12 hdd 0.27229 osd.12 up 1.00000 1.00000
13 hdd 0.27229 osd.13 up 1.00000 1.00000
14 hdd 0.27229 osd.14 up 1.00000 1.00000
15 hdd 0.27229 osd.15 up 1.00000 1.00000
27 hdd 0.27229 osd.27 up 1.00000 1.00000
29 hdd 0.27229 osd.29 up 1.00000 1.00000
-7 2.72293 host v33
16 hdd 0.27229 osd.16 up 1.00000 1.00000
17 hdd 0.27229 osd.17 up 1.00000 1.00000
18 hdd 0.27229 osd.18 up 1.00000 1.00000
19 hdd 0.27229 osd.19 up 1.00000 1.00000
20 hdd 0.27229 osd.20 up 1.00000 1.00000
21 hdd 0.27229 osd.21 up 1.00000 1.00000
22 hdd 0.27229 osd.22 up 1.00000 1.00000
23 hdd 0.27229 osd.23 up 1.00000 1.00000
26 hdd 0.27229 osd.26 up 1.00000 1.00000
28 hdd 0.27229 osd.28 up 1.00000 1.00000
Delete a pool
sudo ceph osd pool delete {pool-name} {pool-name} --yes-i-really-really-mean-it
sudo ceph osd pool delete cluster_data_metadata cluster_data_metadata --yes-i-really-really-mean-it
If an error is reported when deleting a pool, see the fix for pool-deletion errors (mon_allow_pool_delete) shown in the Delete CephFS section above.
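Besides setting mon allow pool delete in ceph.conf, the option can also be toggled at runtime (it lasts until the monitors restart), for example:
ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'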
Add OSDs to the cluster
[ceph-admin@v31 my-cluster]$ ceph -s
cluster:
id: ffdda80f-a48a-431a-a71b-525e5f1965d9
health: HEALTH_OK
services:
mon: 3 daemons, quorum v31,v32,v33
mgr: v31(active), standbys: v32, v33
osd: 24 osds: 24 up, 24 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0B
usage: 24.3GiB used, 6.51TiB / 6.54TiB avail
pgs:
- Background: OSD states
up: the daemon is running and can serve I/O;
down: the daemon is not running and cannot serve I/O;
in: the OSD holds data;
out: the OSD holds no data
List all disks
[root@v33 ~]# sudo ceph-disk list
/dev/dm-0 other, ext4, mounted on /
/dev/dm-1 other, swap
/dev/dm-2 other, unknown
/dev/dm-3 other, unknown
/dev/dm-4 other, unknown
/dev/dm-5 other, unknown
/dev/dm-6 other, unknown
/dev/dm-7 other, unknown
/dev/dm-8 other, unknown
/dev/dm-9 other, unknown
/dev/sda :
/dev/sda1 other, vfat, mounted on /boot/efi
/dev/sda2 other, xfs, mounted on /boot
/dev/sda3 other, LVM2_member
/dev/sdb other, unknown
/dev/sdc other, unknown
/dev/sdd other, LVM2_member
/dev/sde other, LVM2_member
/dev/sdf other, LVM2_member
/dev/sdg other, LVM2_member
/dev/sdh other, LVM2_member
/dev/sdi other, LVM2_member
/dev/sdj other, LVM2_member
/dev/sdk other, LVM2_member
Errors when adding the OSD
[ceph-admin@v31 my-cluster]$ ceph-deploy osd create v31 --data /dev/sdb
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph-admin/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy osd create v31 --data /dev/sdb
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] bluestore : None
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fe69d002830>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] fs_type : xfs
[ceph_deploy.cli][INFO ] block_wal : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] journal : None
[ceph_deploy.cli][INFO ] subcommand : create
[ceph_deploy.cli][INFO ] host : v31
[ceph_deploy.cli][INFO ] filestore : None
[ceph_deploy.cli][INFO ] func : <function osd at 0x7fe69d2478c0>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] zap_disk : False
[ceph_deploy.cli][INFO ] data : /dev/sdb
[ceph_deploy.cli][INFO ] block_db : None
[ceph_deploy.cli][INFO ] dmcrypt : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] dmcrypt_key_dir : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] debug : False
[ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/sdb
[v31][DEBUG ] connection detected need for sudo
[v31][DEBUG ] connected to host: v31
[v31][DEBUG ] detect platform information from remote host
[v31][DEBUG ] detect machine type
[v31][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.6.1810 Core
[ceph_deploy.osd][DEBUG ] Deploying osd to v31
[v31][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.osd][ERROR ] RuntimeError: config file /etc/ceph/ceph.conf exists with different content; use --overwrite-conf to overwrite
[ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs
[ceph-admin@v31 my-cluster]$ ceph-deploy --overwrite-conf osd create v31 --data /dev/sdb
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph-admin/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy --overwrite-conf osd create v31 --data /dev/sdb
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] bluestore : None
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7ff6abd72830>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] fs_type : xfs
[ceph_deploy.cli][INFO ] block_wal : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] journal : None
[ceph_deploy.cli][INFO ] subcommand : create
[ceph_deploy.cli][INFO ] host : v31
[ceph_deploy.cli][INFO ] filestore : None
[ceph_deploy.cli][INFO ] func : <function osd at 0x7ff6abfb78c0>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] zap_disk : False
[ceph_deploy.cli][INFO ] data : /dev/sdb
[ceph_deploy.cli][INFO ] block_db : None
[ceph_deploy.cli][INFO ] dmcrypt : False
[ceph_deploy.cli][INFO ] overwrite_conf : True
[ceph_deploy.cli][INFO ] dmcrypt_key_dir : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] debug : False
[ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/sdb
[v31][DEBUG ] connection detected need for sudo
[v31][DEBUG ] connected to host: v31
[v31][DEBUG ] detect platform information from remote host
[v31][DEBUG ] detect machine type
[v31][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.6.1810 Core
[ceph_deploy.osd][DEBUG ] Deploying osd to v31
[v31][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[v31][DEBUG ] find the location of an executable
[v31][INFO ] Running command: sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
[v31][WARNIN] usage: ceph-volume lvm create [-h] --data DATA [--filestore]
[v31][WARNIN] [--journal JOURNAL] [--bluestore]
[v31][WARNIN] [--block.db BLOCK_DB] [--block.wal BLOCK_WAL]
[v31][WARNIN] [--osd-id OSD_ID] [--osd-fsid OSD_FSID]
[v31][WARNIN] [--cluster-fsid CLUSTER_FSID]
[v31][WARNIN] [--crush-device-class CRUSH_DEVICE_CLASS]
[v31][WARNIN] [--dmcrypt] [--no-systemd]
[v31][WARNIN] ceph-volume lvm create: error: GPT headers found, they must be removed on: /dev/sdb
[v31][ERROR ] RuntimeError: command returned non-zero exit status: 2
[ceph_deploy.osd][ERROR ] Failed to execute command: /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
[ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs
[ceph-admin@v31 my-cluster]$ /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
--> Falling back to /tmp/ for logging. Can't use /var/log/ceph/ceph-volume.log
--> [Errno 13] Permission denied: '/var/log/ceph/ceph-volume.log'
stderr: error: /dev/sdb: Permission denied
--> SuperUserError: This command needs to be executed with sudo or as root
[ceph-admin@v31 my-cluster]$ sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
usage: ceph-volume lvm create [-h] --data DATA [--filestore]
[--journal JOURNAL] [--bluestore]
[--block.db BLOCK_DB] [--block.wal BLOCK_WAL]
[--osd-id OSD_ID] [--osd-fsid OSD_FSID]
[--cluster-fsid CLUSTER_FSID]
[--crush-device-class CRUSH_DEVICE_CLASS]
[--dmcrypt] [--no-systemd]
ceph-volume lvm create: error: GPT headers found, they must be removed on: /dev/sdb
Convert the partition table to MBR
[ceph-admin@v31 my-cluster]$ sudo parted -s /dev/sdb mklabel msdos
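An alternative sketch for wiping the old GPT label is to zap the disk the same way as during the initial deployment:
ceph-deploy disk zap v31 /dev/sdb
# or directly on the node:
sudo ceph-volume lvm zap /dev/sdb --destroy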
Then create the OSD again
[root@v32 ~]# /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdc
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new d47d7861-9a83-4879-847d-693e3aa794b6
Running command: vgcreate --force --yes ceph-fad7bf25-dd60-4eff-a932-970c376af00b /dev/sdc
stdout: Wiping dos signature on /dev/sdc.
stdout: Physical volume "/dev/sdc" successfully created.
stdout: Volume group "ceph-fad7bf25-dd60-4eff-a932-970c376af00b" successfully created
Running command: lvcreate --yes -l 100%FREE -n osd-block-d47d7861-9a83-4879-847d-693e3aa794b6 ceph-fad7bf25-dd60-4eff-a932-970c376af00b
stdout: Logical volume "osd-block-d47d7861-9a83-4879-847d-693e3aa794b6" created.
Running command: /bin/ceph-authtool --gen-print-key
Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-27
Running command: restorecon /var/lib/ceph/osd/ceph-27
Running command: chown -h ceph:ceph /dev/ceph-fad7bf25-dd60-4eff-a932-970c376af00b/osd-block-d47d7861-9a83-4879-847d-693e3aa794b6
Running command: chown -R ceph:ceph /dev/dm-10
Running command: ln -s /dev/ceph-fad7bf25-dd60-4eff-a932-970c376af00b/osd-block-d47d7861-9a83-4879-847d-693e3aa794b6 /var/lib/ceph/osd/ceph-27/block
Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-27/activate.monmap
stderr: got monmap epoch 2
Running command: ceph-authtool /var/lib/ceph/osd/ceph-27/keyring --create-keyring --name osd.27 --add-key AQB4yohcMW8eLhAAkdQmhIavIcF+FcPjkKooSQ==
stdout: creating /var/lib/ceph/osd/ceph-27/keyring
stdout: added entity osd.27 auth auth(auid = 18446744073709551615 key=AQB4yohcMW8eLhAAkdQmhIavIcF+FcPjkKooSQ== with 0 caps)
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-27/keyring
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-27/
Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 27 --monmap /var/lib/ceph/osd/ceph-27/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-27/ --osd-uuid d47d7861-9a83-4879-847d-693e3aa794b6 --setuser ceph --setgroup ceph
--> ceph-volume lvm prepare successful for: /dev/sdc
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-27
Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-fad7bf25-dd60-4eff-a932-970c376af00b/osd-block-d47d7861-9a83-4879-847d-693e3aa794b6 --path /var/lib/ceph/osd/ceph-27
Running command: ln -snf /dev/ceph-fad7bf25-dd60-4eff-a932-970c376af00b/osd-block-d47d7861-9a83-4879-847d-693e3aa794b6 /var/lib/ceph/osd/ceph-27/block
Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-27/block
Running command: chown -R ceph:ceph /dev/dm-10
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-27
Running command: systemctl enable ceph-volume@lvm-27-d47d7861-9a83-4879-847d-693e3aa794b6
stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-27-d47d7861-9a83-4879-847d-693e3aa794b6.service to /usr/lib/systemd/system/ceph-volume@.service.
Running command: systemctl enable --runtime ceph-osd@27
stderr: Created symlink from /run/systemd/system/ceph-osd.target.wants/ceph-osd@27.service to /usr/lib/systemd/system/ceph-osd@.service.
Running command: systemctl start ceph-osd@27
--> ceph-volume lvm activate successful for osd ID: 27
--> ceph-volume lvm create successful for: /dev/sdc
[root@v32 ~]# /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdd
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new fade79d7-8bee-49c6-85f8-d6c141e6bd4e
Running command: vgcreate --force --yes ceph-fc851010-f2c6-43f7-9c12-843d3a023a65 /dev/sdd
stderr: Physical volume '/dev/sdd' is already in volume group 'ceph-00af8489-3599-427d-bae6-de1e61c4c38a'
Unable to add physical volume '/dev/sdd' to volume group 'ceph-00af8489-3599-427d-bae6-de1e61c4c38a'
/dev/sdd: physical volume not initialized.
--> Was unable to complete a new OSD, will rollback changes
--> OSD will be fully purged from the cluster, because the ID was generated
Running command: ceph osd purge osd.29 --yes-i-really-mean-it
stderr: purged osd.29
--> RuntimeError: command returned non-zero exit status: 5
ceph-deploy --overwrite-conf osd create v31 --data /dev/sdb
Inspect RBD data
[root@v32 ~]# rados ls -p kube
rbd_id.kubernetes-dynamic-pvc-28e6ad1e-4675-11e9-8901-6602f4085af9
rbd_directory
rbd_info
rbd_header.149046b8b4567
Delete RBD objects
[root@v32 ~]# rados -p kube rm rbd_id.kubernetes-dynamic-pvc-28e6ad1e-4675-11e9-8901-6602f4085af9 rbd_directory rbd_info rbd_header.149046b8b4567
Troubleshooting
Using the Ceph cluster for Kubernetes storage
https://akomljen.com/using-existing-ceph-cluster-for-kubernetes-persistent-storage/
Create the access permissions for the kube account on the Ceph kube storage pool
ceph --cluster ceph auth get-or-create client.kube mon 'allow r' osd 'allow rwx pool=kube'
[client.kube]
key = AQBHTIdcSqnbMhAATB4BJJXyA8K5NFQlE815/A==
Grant RBAC permissions to rbd-provisioner in the kube-system namespace and create the pod
vim rbd-provisioner.yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rbd-provisioner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services"]
    resourceNames: ["kube-dns","coredns"]
    verbs: ["list", "get"]
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rbd-provisioner
subjects:
  - kind: ServiceAccount
    name: rbd-provisioner
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: rbd-provisioner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: rbd-provisioner
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rbd-provisioner
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: rbd-provisioner
subjects:
- kind: ServiceAccount
  name: rbd-provisioner
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rbd-provisioner
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: rbd-provisioner
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: rbd-provisioner
    spec:
      containers:
      - name: rbd-provisioner
        image: ivano/rbd-provisioner
        env:
        - name: PROVISIONER_NAME
          value: ceph.com/rbd
      serviceAccount: rbd-provisioner
kubectl -n kube-system apply -f rbd-provisioner.yaml
- When creating the rbd-provisioner pod, make sure the Ceph version inside the container image matches the cluster's version:
ceph -v
ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)    # this cluster runs luminous
docker history ivano/rbd-provisioner:latest|grep CEPH_VERSION
<a name="bfa7d08b"></a>
## `rbd-provisioner` ceph存储集群授权配置
RBD卷配置器需要Ceph的管理密钥来配置存储
ceph --cluster ceph auth get-key client.admin
AQBDO4dcUktZLxAAwByPxao2QROhQpoWYAWsGg==
<a name="05bf04db"></a>
## 添加Ceph集群admin账户权限
使用上面的Ceph admin账户的密钥创建secret
kubectl create secret generic ceph-secret \ —type=”kubernetes.io/rbd” \ —from-literal=key=’AQBDO4dcUktZLxAAwByPxao2QROhQpoWYAWsGg==’ \ —namespace=kube-system
<a name="66d1a225"></a>
## 创建ceph 存储池
sudo ceph —cluster ceph osd pool create kube 1024 1024 sudo ceph —cluster ceph auth get-or-create client.kube mon ‘allow r’ osd ‘allow rwx pool=kube’ sudo ceph —cluster ceph auth get-key client.kube
<a name="af0b934a"></a>
## 添加Ceph集群kube账户权限
ceph —cluster ceph auth get-key client.kube AQBHTIdcSqnbMhAATB4BJJXyA8K5NFQlE815/A==
kubectl create secret generic ceph-secret-kube \ —type=”kubernetes.io/rbd” \ —from-literal=key=’AQBHTIdcSqnbMhAATB4BJJXyA8K5NFQlE815/A==’ \ —namespace=kube-system
<a name="85166765"></a>
### 查看secret资源
kubectl get secrets -n kube-system |grep ceph ceph-secret kubernetes.io/rbd 1 54m ceph-secret-kube kubernetes.io/rbd 1 51m
<a name="b1f3ed87"></a>
## 创建`storageClassName` 并绑定ceph集群节点
后续pod调用直接使用`storageClassName`调用
vim fast-rbd.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-rbd
provisioner: ceph.com/rbd
parameters:
  monitors: 192.168.122.101:6789, 192.168.122.102:6789, 192.168.122.103:6789
  adminId: admin
  adminSecretName: ceph-secret
  adminSecretNamespace: kube-system
  pool: kube
  userId: kube
  userSecretName: ceph-secret-kube
  userSecretNamespace: kube-system
  imageFormat: "2"
  imageFeatures: layering
kubectl create -f fast-rbd.yaml
<a name="1a63ac23"></a>
## 示例
<a name="e28198f4"></a>
### 创建pvc请求
cat <<EOF | kubectl create -f -
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: fast-rbd
EOF
Check whether the claim is bound
kubectl get pvc myclaim
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
myclaim   Bound    pvc-5ef52eea-44a7-11e9-9d6f-525400d7a6ef   8Gi        RWO            fast-rbd       52m
<a name="6b232ac3"></a>
### 创建pod示例
cat test-pod.yaml apiVersion: v1 kind: Pod metadata: name: ceph-pod1 spec: containers:
- name: ceph-busybox
image: busybox
command: [“sleep”, “60000”]
volumeMounts:
- name: ceph-vol1 mountPath: /usr/share/busybox readOnly: false volumes:
- name: ceph-vol1 persistentVolumeClaim: claimName: ceph-claim
kind: PersistentVolumeClaim apiVersion: v1 metadata: name: ceph-claim spec: accessModes:
- ReadWriteOnce
resources: requests: storage: 2Gi storageClassName: fast-rbd
Check that the PV and PVC have both been created:
kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS        CLAIM                    STORAGECLASS        REASON   AGE
pvc-278c2462-448d-11e9-b632-525400804e1e   8Gi        RWO            Delete           Terminating   jx/myclaim               fast-rbd                     129m
pvc-5ef52eea-44a7-11e9-9d6f-525400d7a6ef   8Gi        RWO            Delete           Bound         default/myclaim          fast-rbd                     66m
pvc-d3a5095a-4225-11e9-8d3b-525400d7a6ef   8Gi        RWO            Delete           Bound         default/jenkins          nfs-dynamic-class            3d5h
pvc-ed9c1211-44af-11e9-9d6f-525400d7a6ef   2Gi        RWO            Delete           Bound         default/ceph-claim       fast-rbd                     4m59s
pvc-f25b4ce2-44a1-11e9-9d6f-525400d7a6ef   2Gi        RWO            Delete           Terminating   kube-system/ceph-claim   ceph-rbd                     96m
kubectl get pvc
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
ceph-claim   Bound    pvc-ed9c1211-44af-11e9-9d6f-525400d7a6ef   2Gi        RWO            fast-rbd            5m2s
jenkins      Bound    pvc-d3a5095a-4225-11e9-8d3b-525400d7a6ef   8Gi        RWO            nfs-dynamic-class   3d5h
myclaim      Bound    pvc-5ef52eea-44a7-11e9-9d6f-525400d7a6ef   8Gi        RWO            fast-rbd            66m
On the Ceph server, check that the RBD images were created and inspect their details:
rbd ls --pool rbd
kubernetes-dynamic-pvc-1e569f60-44a3-11e9-8e60-fa9f2d515699
rbd ls --pool kube
kubernetes-dynamic-pvc-2e7038b1-449d-11e9-bd3d-46e50dc4cee6
kubernetes-dynamic-pvc-6038cc76-44a7-11e9-a834-029380302ed2
kubernetes-dynamic-pvc-84a5d823-449e-11e9-bd3d-46e50dc4cee6
kubernetes-dynamic-pvc-edb72324-44af-11e9-a834-029380302ed2
rbd info kube/kubernetes-dynamic-pvc-2e7038b1-449d-11e9-bd3d-46e50dc4cee6
rbd image 'kubernetes-dynamic-pvc-2e7038b1-449d-11e9-bd3d-46e50dc4cee6':
    size 8GiB in 2048 objects
    order 22 (4MiB objects)
    block_name_prefix: rbd_data.11136b8b4567
    format: 2
    features: layering
    flags:
    create_timestamp: Tue Mar 12 16:02:30 2019
Inside the busybox pod, check the filesystem mount and usage to confirm it works:
kubectl exec -it ceph-pod1 mount |grep rbd
/dev/rbd0 on /usr/share/busybox type ext4 (rw,relatime,stripe=1024,data=ordered)
kubectl exec -it ceph-pod1 df |grep rbd
/dev/rbd0   1998672   6144   1976144   0% /usr/share/busybox