1. CephFS File Storage Overview
RBD characteristics: an RBD block device can be consumed by only a single VM or a single pod; it cannot be attached to multiple VMs or pods at the same time. When several VMs or pods need shared access to the same storage, use CephFS instead. What is CephFS? It is Ceph's network-attached storage (NAS) layer: a shared filesystem that multiple clients access concurrently. Typical NAS-style services include:
- EFS
- NAS
- CFS
CephFS features:
- POSIX-compliant semantics
- Separates metadata from data
- Dynamic rebalancing
- Subdirectory snapshots
- Configurable striping
- Kernel driver support
- FUSE support
- NFS/CIFS deployable
- Use with Hadoop (replace HDFS)
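As a quick illustration of the kernel driver support listed above, a Linux client can mount CephFS directly; a minimal sketch, where the monitor address, user name, and secret file are placeholders rather than values from this cluster:
mount -t ceph 192.168.0.10:6789:/ /mnt/cephfs \
  -o name=admin,secretfile=/etc/ceph/admin.secret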
2. MDS Architecture

Most file storage systems locate data through a metadata service: clients first query the metadata (metadata), then use it to reach the actual data. In Ceph this metadata service is provided by the MDS. It ships with built-in high availability: it can run as a single active daemon with a standby (active-standby) or with multiple active daemons (multi-active), and it exchanges journals to keep metadata consistent.
3. MDS and Filesystem Deployment
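The filesystem.yaml applied below defines a CephFilesystem resource. A minimal sketch, modeled on the upstream Rook example (the replica counts are assumptions, not values read from this cluster):
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:            # becomes the pool myfs-metadata
    replicated:
      size: 3
  dataPools:               # becomes the pool myfs-data0
    - replicated:
        size: 3
  metadataServer:
    activeCount: 1         # one active MDS
    activeStandby: true    # plus one hot standby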
[root@master1 ceph]# kubectl apply -f filesystem.yaml
cephfilesystem.ceph.rook.io/myfs created
An MDS deployment creates two pools, a metadata pool and a data pool, each of which can use either replication or erasure coding. The metadata service itself is deployed active-standby, so after deployment two pods are created to host the MDS daemons.
[root@rook-ceph-tools-54fc95f4f4-mg67d /]# ceph -s
  cluster:
    id:     2d792034-41f1-4ce2-bdc0-3951bc09cab0
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 33h)
    mgr: a(active, since 2d)
    mds: 1/1 daemons up, 1 hot standby   # filesystem myfs: 1 active MDS + 1 hot standby
    osd: 4 osds: 4 up (since 5h), 4 in (since 5h)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 97 pgs
    objects: 461 objects, 1.3 GiB
    usage:   4.9 GiB used, 1.5 TiB / 1.5 TiB avail
    pgs:     97 active+clean

  io:
    client: 853 B/s rd, 1 op/s rd, 0 op/s wr
The CephFS filesystem:
[root@rook-ceph-tools-54fc95f4f4-mg67d /]# ceph fs ls
name: myfs, metadata pool: myfs-metadata, data pools: [myfs-data0 ]
Pool information:
[root@rook-ceph-tools-54fc95f4f4-mg67d /]# ceph osd lspools
1 device_health_metrics
2 replicapool
3 myfs-metadata
4 myfs-data0
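If you need to confirm how a pool was created (replicated vs. erasure-coded, and the replica count), the toolbox can query it; for example, using the pool names listed above:
ceph osd pool get myfs-metadata size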
4. MDS High Availability
MDS also supports multi-active deployment: a single filesystem can have several metadata server groups, each of which is itself deployed active-standby. Edit filesystem.yaml accordingly:
activeCount: 2 # set to 2 for two active MDS daemons (multi-active)
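In the CephFilesystem resource this field sits under spec.metadataServer; a sketch with the assumed surrounding fields:
  metadataServer:
    activeCount: 2      # two active MDS ranks (multi-active)
    activeStandby: true # each active rank gets its own hot standby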
After re-applying the manifest, two additional pods are created automatically:
rook-ceph-mds-myfs-a-74d49d66fd-wtj7j 1/1 Running 0 19h
rook-ceph-mds-myfs-b-6d4f7fb5c4-kkrhj 1/1 Running 0 19h
rook-ceph-mds-myfs-c-7c998c8f8-s77tz 1/1 Running 0 72s
rook-ceph-mds-myfs-d-66c7b868b-rrz8z 1/1 Running 0 70s
The Ceph status now looks like this:
[root@rook-ceph-tools-54fc95f4f4-mg67d /]# ceph -s
  cluster:
    id:     2d792034-41f1-4ce2-bdc0-3951bc09cab0
    health: HEALTH_WARN
            clock skew detected on mon.e

  services:
    mon: 4 daemons, quorum a,b,c,e (age 6m)
    mgr: a(active, since 2d)
    mds: 2/2 daemons up, 2 hot standby
    osd: 5 osds: 5 up (since 4m), 5 in (since 4m)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 97 pgs
    objects: 480 objects, 1.3 GiB
    usage:   5.8 GiB used, 1.7 TiB / 1.7 TiB avail
    pgs:     97 active+clean

  io:
    client: 2.1 KiB/s rd, 3 op/s rd, 0 op/s wr
5. Advanced MDS Scheduling
The MDS exposes a scheduling mechanism through placement, supporting node affinity, pod affinity, pod anti-affinity, tolerations, and topology spread constraints. Start with node scheduling: label the eligible nodes with ceph-mds=enabled so the affinity rule below can select them.
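For example, labeling a node (the node name here is illustrative; repeat for every node that should be eligible):
kubectl label node node1 ceph-mds=enabled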
nodeAffinity: # node affinity scheduling
  requiredDuringSchedulingIgnoredDuringExecution: # hard requirement
    nodeSelectorTerms:
    - matchExpressions:
      - key: ceph-mds
        operator: In
        values:
        - enabled
Node affinity only pins pods to particular nodes; if all MDS pods land on the same node, high availability is lost. Pod anti-affinity solves this by keeping the MDS pods apart:
podAntiAffinity: # pod anti-affinity scheduling
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: app
        operator: In
        values:
        - rook-ceph-mds
    topologyKey: kubernetes.io/hostname # required by the API; spreads MDS pods across hosts
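Both stanzas go under spec.metadataServer.placement in the CephFilesystem resource; a condensed sketch of the nesting, where "..." stands for the rules shown above:
  metadataServer:
    activeCount: 2
    activeStandby: true
    placement:          # scheduling rules for the MDS pods
      nodeAffinity:
        ...             # the hard node-affinity rule above
      podAntiAffinity:
        ...             # the hard anti-affinity rule above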
After re-applying, the requirement is met: each MDS pod runs on a different node.
[root@master1 ceph]# kubectl get pods -n rook-ceph -o wide | grep myfs
rook-ceph-mds-myfs-a-7dfbcb66c4-qxkjl 1/1 Running 0 9m37s 10.244.3.126 node2
rook-ceph-mds-myfs-b-58dbcdb858-52xg5 1/1 Running 0 7m35s 10.244.1.110 master2
rook-ceph-mds-myfs-c-5c95f76bf4-cdm4d 1/1 Running 0 6m56s 10.244.4.12 node3
rook-ceph-mds-myfs-d-89854ddfd-5d4g7 1/1 Running 0 4m4s 10.244.2.90 node1
6. Deploying the CephFS StorageClass
Rook installs the CephFS CSI driver by default, so all that remains is to create a StorageClass that consumes it:
[root@master1 cephfs]# cat storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com # driver:namespace:operator
parameters:
  clusterID: rook-ceph # namespace:cluster
  fsName: myfs
  pool: myfs-data0
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph # namespace:cluster
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph # namespace:cluster
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph # namespace:cluster
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
After creating it with kubectl apply -f storageclass.yaml, the StorageClass backed by rook-ceph.cephfs.csi.ceph.com shows up:
[root@master1 cephfs]# kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
course-nfs-storage fuseim.pri/ifs Delete Immediate false 30h
rook-ceph-block rook-ceph.rbd.csi.ceph.com Delete Immediate true 2d5h
rook-cephfs rook-ceph.cephfs.csi.ceph.com Delete Immediate true 3m25s
7. Consuming CephFS from Containers
As an example, the kube-registry manifest below requests a ReadWriteMany PVC from the rook-cephfs StorageClass and mounts it into three registry replicas:
[root@master1 cephfs]# cat kube-registry.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc
  namespace: kube-system
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-cephfs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-registry
  namespace: kube-system
  labels:
    k8s-app: kube-registry
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 3
  selector:
    matchLabels:
      k8s-app: kube-registry
  template:
    metadata:
      labels:
        k8s-app: kube-registry
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: registry
        image: registry:2
        imagePullPolicy: Always
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
        env:
        # Configuration reference: https://docs.docker.com/registry/configuration/
        - name: REGISTRY_HTTP_ADDR
          value: :5000
        - name: REGISTRY_HTTP_SECRET
          value: "Ple4seCh4ngeThisN0tAVerySecretV4lue"
        - name: REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY
          value: /var/lib/registry
        volumeMounts:
        - name: image-store
          mountPath: /var/lib/registry
        ports:
        - containerPort: 5000
          name: registry
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /
            port: registry
        readinessProbe:
          httpGet:
            path: /
            port: registry
      volumes:
      - name: image-store
        persistentVolumeClaim:
          claimName: cephfs-pvc
          readOnly: false
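Apply the manifest (file name as shown above):
kubectl apply -f kube-registry.yaml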
Once provisioning succeeds, the PV and PVC are created automatically; the PVC lives in the kube-system namespace:
[root@master1 cephfs]# kubectl get pvc -n kube-system
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
cephfs-pvc Bound pvc-442c546c-e893-4401-9d6f-5b168b3e92f5 1Gi RWX rook-cephfs 103s
[root@master1 cephfs]# kubectl get pv pvc-442c546c-e893-4401-9d6f-5b168b3e92f5
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-442c546c-e893-4401-9d6f-5b168b3e92f5 1Gi RWX Delete Bound kube-system/cephfs-pvc rook-cephfs 2m12s
8. Verifying the Image Registry
kube-registry runs as three pods, which need to be exposed to other consumers through a Service:
[root@master1 cephfs]# kubectl get pods -l k8s-app=kube-registry -n kube-system
NAME READY STATUS RESTARTS AGE
kube-registry-66d4c7bf47-bsjz6 1/1 Running 0 18m
kube-registry-66d4c7bf47-ns5g5 1/1 Running 0 18m
kube-registry-66d4c7bf47-xn5s4 1/1 Running 0 18m
kubectl expose -n kube-system deployment kube-registry --port=5000
[root@master1 cephfs]# kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10
kube-registry ClusterIP 10.110.33.213
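To sanity-check that the Service answers, the Docker registry's HTTP API can be queried from any node that reaches the ClusterIP (an empty repository list is expected at this point):
curl http://10.110.33.213:5000/v2/_catalog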
Modify the Docker configuration to add the private registry to insecure-registries; by default Docker rejects registries that do not serve HTTPS, so the address must be whitelisted:
[root@master1 cephfs]# cat /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "insecure-registries": [ "10.110.33.213:5000" ]
}
After the change, restart the service with systemctl restart docker, then verify the push path end to end.
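The push below assumes a local image has already been tagged with the registry address; for example, reusing the rook/ceph image present on the nodes (an assumption):
docker tag rook/ceph:v1.5.5 10.110.33.213:5000/rook/ceph:v1.5.5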
[root@node-1 cephfs]# docker image push 10.110.33.213:5000/rook/ceph:v1.5.5
