  • Pay-as-you-go instances from any public cloud can be used for this experiment.
  • This guide builds a single-master cluster, which is not suitable for production use.
  • All node machines need Internet access.

Chapter 1: Host Planning

Role     IP address        hostname     OS                                    Spec                                Device
Master   192.168.65.100    k8s-master   CentOS 7.9 (infrastructure server)    4-core CPU, 8 GB RAM, 40 GB disk    -
Node1    192.168.65.101    k8s-node1    CentOS 7.9 (infrastructure server)    8-core CPU, 16 GB RAM, 40 GB disk   /dev/sdb 100 GB
Node2    192.168.65.102    k8s-node2    CentOS 7.9 (infrastructure server)    8-core CPU, 16 GB RAM, 40 GB disk   /dev/sdb 100 GB
Node3    192.168.65.103    k8s-node3    CentOS 7.9 (infrastructure server)    8-core CPU, 16 GB RAM, 40 GB disk   /dev/sdb 100 GB

Chapter 2: Installing Kubernetes

2.1 Version Compatibility Between Kubernetes and Docker

1.PNG

According to the documentation, Docker v20.10 corresponds to Kubernetes v1.21.

2.2 Prerequisites

  • If you are using virtual machines, all of them must be able to reach each other; the simplest way is to turn off the firewall (a quick check follows below):
    systemctl stop firewalld
    systemctl disable firewalld
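  • Optionally, confirm the firewall is really stopped and disabled. A minimal check, assuming firewalld is the only firewall in use on the machine:
    firewall-cmd --state          # prints "not running" once firewalld is stopped
    systemctl is-enabled firewalld  # should print "disabled"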

2.gif

2.3 Preparation

2.3.1 Upgrade the System Kernel

  • Check the current OS release:
    cat /etc/redhat-release

3.gif

  • Check the current kernel version:
    uname -sr

The default 3.10.0 kernel is too old, so upgrade it.

4.gif

  • Enable the ELRepo repository on CentOS 7.x:
    rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
    rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-4.el7.elrepo.noarch.rpm

5.gif

  • List the available kernel packages (optional):
    yum --disablerepo="*" --enablerepo="elrepo-kernel" list available

6.gif

  • Install the latest mainline kernel:
    yum -y --enablerepo=elrepo-kernel install kernel-ml

7.gif

  • Set the default kernel:
    vim /etc/default/grub

    GRUB_TIMEOUT=5
    GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
    GRUB_DEFAULT=0 # change this line; it was originally "saved" (0 selects the first, newest kernel entry)
    GRUB_DISABLE_SUBMENU=true
    GRUB_TERMINAL_OUTPUT="console"
    GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet"
    GRUB_DISABLE_RECOVERY="true"

8.gif

  • Regenerate the GRUB configuration:
    grub2-mkconfig -o /boot/grub2/grub.cfg

9.gif

  • Reboot the system:
    reboot

10.gif

  • Check the kernel version again:
    uname -sr

11.gif

2.3.2 Set Hostnames

  • Command:
    hostnamectl set-hostname <hostname>
  • Example:
    # 192.168.65.100
    hostnamectl set-hostname k8s-master

    # 192.168.65.101
    hostnamectl set-hostname k8s-node1

    # 192.168.65.102
    hostnamectl set-hostname k8s-node2

    # 192.168.65.103
    hostnamectl set-hostname k8s-node3

12.gif

2.3.3 Hostname Resolution

  • To make it easy for the cluster nodes to reach each other by name, configure hostname resolution; in production an internal DNS server is recommended (a quick check follows below):
    cat >> /etc/hosts << EOF
    127.0.0.1 $(hostname)
    192.168.65.100 k8s-master
    192.168.65.101 k8s-node1
    192.168.65.102 k8s-node2
    192.168.65.103 k8s-node3
    EOF
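  • Optionally, verify from each machine that every node name resolves and is reachable. A minimal sketch using the hostnames planned above:
    for h in k8s-master k8s-node1 k8s-node2 k8s-node3; do
      ping -c 1 -W 1 "$h" > /dev/null && echo "$h ok" || echo "$h unreachable"
    done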

13.gif

2.3.4 Time Synchronization

  • Kubernetes requires the clocks of all cluster nodes to be in sync, so set up time synchronization on every node:
    yum install ntpdate -y
    ntpdate time.windows.com

14.gif

2.3.5 Disable SELinux

  • Check whether SELinux is enabled:
    getenforce
  • Disable SELinux permanently (requires a reboot):
    sed -i 's/enforcing/disabled/' /etc/selinux/config
  • Disable SELinux for the current session only (lost after reboot):
    setenforce 0

15.gif

2.3.6 Disable the swap Partition

  • Disable swap permanently (requires a reboot; see the check below):
    sed -ri 's/.*swap.*/#&/' /etc/fstab
  • Disable swap for the current session only (lost after reboot):
    swapoff -a
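  • Optionally, confirm that no swap is active; both commands below should report zero or empty swap:
    swapon --show   # no output means no active swap devices
    free -h         # the "Swap:" line should show 0B total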

16.gif

2.3.7 Pass Bridged IPv4 Traffic to the iptables Chains

  • Modify the /etc/sysctl.conf file:
    # If the settings already exist, modify them
    sed -i "s#^net.ipv4.ip_forward.*#net.ipv4.ip_forward=1#g" /etc/sysctl.conf
    sed -i "s#^net.bridge.bridge-nf-call-ip6tables.*#net.bridge.bridge-nf-call-ip6tables=1#g" /etc/sysctl.conf
    sed -i "s#^net.bridge.bridge-nf-call-iptables.*#net.bridge.bridge-nf-call-iptables=1#g" /etc/sysctl.conf
    sed -i "s#^net.ipv6.conf.all.disable_ipv6.*#net.ipv6.conf.all.disable_ipv6=1#g" /etc/sysctl.conf
    sed -i "s#^net.ipv6.conf.default.disable_ipv6.*#net.ipv6.conf.default.disable_ipv6=1#g" /etc/sysctl.conf
    sed -i "s#^net.ipv6.conf.lo.disable_ipv6.*#net.ipv6.conf.lo.disable_ipv6=1#g" /etc/sysctl.conf
    sed -i "s#^net.ipv6.conf.all.forwarding.*#net.ipv6.conf.all.forwarding=1#g" /etc/sysctl.conf

    # If they do not exist yet, append them
    echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
    echo "net.bridge.bridge-nf-call-ip6tables = 1" >> /etc/sysctl.conf
    echo "net.bridge.bridge-nf-call-iptables = 1" >> /etc/sysctl.conf
    echo "net.ipv6.conf.all.disable_ipv6 = 1" >> /etc/sysctl.conf
    echo "net.ipv6.conf.default.disable_ipv6 = 1" >> /etc/sysctl.conf
    echo "net.ipv6.conf.lo.disable_ipv6 = 1" >> /etc/sysctl.conf
    echo "net.ipv6.conf.all.forwarding = 1" >> /etc/sysctl.conf
  • Load the br_netfilter module:
    modprobe br_netfilter
  • Apply the settings (they live in /etc/sysctl.conf, so they remain in effect after a reboot):
    sysctl -p

17.gif

2.3.8 Enable ipvs

  • In Kubernetes, a Service can be proxied in two modes: one based on iptables and one based on ipvs.
  • ipvs performs better than iptables, but using it requires loading the ipvs kernel modules manually.
  • Install ipset and ipvsadm on all machines:
    yum -y install ipset ipvsadm
  • Run the following script on all machines:
    cat > /etc/sysconfig/modules/ipvs.modules <<EOF
    #!/bin/bash
    modprobe -- ip_vs
    modprobe -- ip_vs_rr
    modprobe -- ip_vs_wrr
    modprobe -- ip_vs_sh
    modprobe -- nf_conntrack
    EOF
  • Make it executable, run it, and check that the modules are loaded (on kernels 4.19 and later, including the mainline kernel installed above, the module is nf_conntrack rather than nf_conntrack_ipv4):
    chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack

18.gif

2.3.9 Reboot

  • Reboot all machines:
    reboot

20.gif

2.4 Install Docker

Docker needs to be installed on all machines.

  • Remove old versions:
    yum remove docker \
                docker-client \
                docker-client-latest \
                docker-common \
                docker-latest \
                docker-latest-logrotate \
                docker-logrotate \
                docker-engine

21.gif

  • Install gcc tooling with yum:
    yum -y install gcc
    yum -y install gcc-c++

22.gif

  • Install the required packages:
    yum -y install yum-utils

23.gif

  • Configure the stable repository (Alibaba Cloud mirror):
    yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

24.gif

  • Refresh the yum package index:
    yum makecache fast

25.gif

  • List the Docker versions available in the repository (optional):
    yum list docker-ce --showduplicates | sort -r

26.gif

  • Install the specified Docker version (v20.10):
    yum -y install docker-ce-3:20.10.8-3.el7.x86_64 docker-ce-cli-3:20.10.8-3.el7.x86_64 containerd.io

27.gif

  • Start Docker:
    # Start Docker
    systemctl start docker

    # Enable it to start on boot
    systemctl enable docker

28.gif

  • Verify that Docker was installed successfully:
    docker version

29.gif

  • Configure the Alibaba Cloud registry mirror (and switch Docker to the systemd cgroup driver; a quick check follows below):
    sudo mkdir -p /etc/docker

    sudo tee /etc/docker/daemon.json <<-'EOF'
    {
      "exec-opts": ["native.cgroupdriver=systemd"],
      "registry-mirrors": ["https://du3ia00u.mirror.aliyuncs.com"],
      "live-restore": true,
      "log-driver": "json-file",
      "log-opts": {"max-size": "500m", "max-file": "3"},
      "max-concurrent-downloads": 10,
      "max-concurrent-uploads": 5,
      "storage-driver": "overlay2"
    }
    EOF

    sudo systemctl daemon-reload
    sudo systemctl restart docker
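  • Optionally, confirm that Docker is now using the systemd cgroup driver. A minimal check; the Go-template output of docker info is assumed to be available in this Docker version:
    docker info --format '{{.CgroupDriver}}'   # should print: systemd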

30.gif

2.5 Add the Alibaba Cloud Kubernetes YUM Repository

  • The official Kubernetes package repository is hosted abroad and is very slow from mainland China, so switch to the Alibaba Cloud mirror (run on all machines):
    cat > /etc/yum.repos.d/kubernetes.repo << EOF
    [kubernetes]
    name=Kubernetes
    baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
    enabled=1
    gpgcheck=0
    repo_gpgcheck=0
    gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
    EOF

31.gif

2.6 Install kubelet, kubeadm and kubectl

  • Kubernetes architecture diagram:

32.PNG

  • Install kubelet, kubeadm and kubectl (run on all machines):
    yum install -y kubelet-1.21.10 kubeadm-1.21.10 kubectl-1.21.10

33.gif

  • To keep the cgroup driver used by kubelet consistent with the one used by Docker, edit /etc/sysconfig/kubelet as follows (run on all machines):
    vim /etc/sysconfig/kubelet

    # Modify the file so it contains:
    KUBELET_EXTRA_ARGS="--cgroup-driver=systemd"
    KUBE_PROXY_MODE="ipvs"

34.gif

  • Just enable kubelet to start on boot; it cannot start yet because no configuration has been generated, and it will come up automatically once the cluster is initialized:
    systemctl enable kubelet

35.gif

2.7 Check the Images Required by Kubernetes (Optional)

  • List the images required to install Kubernetes:
    kubeadm config images list

36.gif

2.8 Pull the Images Required by Kubernetes

  • Pull the required images with Docker on all machines:
    docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.21.10
    docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.21.10
    docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.21.10
    docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.21.10
    docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.4.1
    docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.4.13-0
    docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.8.0

37.gif

  • Re-tag the coredns image (kubeadm v1.21 expects it under the coredns/coredns path, which the mirror does not provide):
    docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.8.0 registry.cn-hangzhou.aliyuncs.com/google_containers/coredns/coredns:v1.8.0

38.gif

2.9 Deploy the Kubernetes Master Node

  • Deploy the Kubernetes master on the k8s-master machine (192.168.65.100):
    # The default image registry k8s.gcr.io is unreachable from mainland China, so point kubeadm at the Alibaba Cloud mirror
    kubeadm init \
      --apiserver-advertise-address=192.168.65.100 \
      --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers \
      --kubernetes-version=v1.21.10 \
      --service-cidr=10.96.0.0/16 \
      --pod-network-cidr=10.244.0.0/16

Notes:

  • apiserver-advertise-address must be an IP address that is actually configured on the host.
  • apiserver-advertise-address, service-cidr and pod-network-cidr must not overlap with one another (a quick sanity check follows below).
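  • Optionally, check up front that the advertise address is bound to a local interface and sits outside both CIDRs. A minimal sketch using iproute2; adjust the address for your own environment:
    ip -4 addr show | grep 192.168.65.100   # the address must appear on a local interface
    # 192.168.65.0/24 lies outside 10.96.0.0/16 and 10.244.0.0/16, so the three ranges do not overlap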

39.gif

  • Output:
    Your Kubernetes control-plane has initialized successfully!
    To start using your cluster, you need to run the following as a regular user:
      mkdir -p $HOME/.kube
      sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
      sudo chown $(id -u):$(id -g) $HOME/.kube/config
    Alternatively, if you are the root user, you can run:
      export KUBECONFIG=/etc/kubernetes/admin.conf
    You should now deploy a pod network to the cluster.
    Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
      https://kubernetes.io/docs/concepts/cluster-administration/addons/
    Then you can join any number of worker nodes by running the following on each as root:
    kubeadm join 192.168.65.100:6443 --token 5oqv3n.4n2ak6e1y4h35cra \
      --discovery-token-ca-cert-hash sha256:d82d66af9a8b1ef328501eb082235c65627be53918cb910501e088a78c766425
  • Follow the instructions in the output and run the following on k8s-master (192.168.65.100):
    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config

    # If you are the root user, you can also run:
    export KUBECONFIG=/etc/kubernetes/admin.conf

40.gif

  • The default token is valid for 24 hours; once it expires it can no longer be used, and a new one can be created with the following commands:
    kubeadm token create --print-join-command

    # Generate a token that never expires
    kubeadm token create --ttl 0 --print-join-command
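  • If you only need the --discovery-token-ca-cert-hash value, it can be recomputed from the cluster CA certificate. A sketch following the standard kubeadm documentation; run it on the master:
    openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
      | openssl rsa -pubin -outform der 2>/dev/null \
      | openssl dgst -sha256 -hex | sed 's/^.* //'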

2.10 Deploy the Kubernetes Worker Nodes

  • Following the instructions in the kubeadm init output, run the following on k8s-node1 (192.168.65.101), k8s-node2 (192.168.65.102) and k8s-node3 (192.168.65.103):
    kubeadm join 192.168.65.100:6443 --token 5oqv3n.4n2ak6e1y4h35cra \
      --discovery-token-ca-cert-hash sha256:d82d66af9a8b1ef328501eb082235c65627be53918cb910501e088a78c766425

41.gif

2.11 Deploy the Network Plugin

    kubectl apply -f https://projectcalico.docs.tigera.io/v3.19/manifests/calico.yaml

Note: see here for why Calico v3.19 is used.

42.gif

  • Watch the progress of the CNI network plugin deployment:
    watch -n 1 kubectl get pod -n kube-system

43.gif

    kubectl get pods -n kube-system

44.PNG

2.12 Check the Node Status

  • Check the node status on the Master (192.168.65.100) node:
    kubectl get nodes

45.PNG

2.13 Set kube-proxy to ipvs Mode

  • Switch kube-proxy to ipvs mode on the Master (192.168.65.100) node:
    kubectl edit cm kube-proxy -n kube-system
    apiVersion: v1
    data:
      config.conf: |-
        apiVersion: kubeproxy.config.k8s.io/v1alpha1
        bindAddress: 0.0.0.0
        bindAddressHardFail: false
        clientConnection:
          acceptContentTypes: ""
          burst: 0
          contentType: ""
          kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
          qps: 0
        clusterCIDR: 10.244.0.0/16
        configSyncPeriod: 0s
        conntrack:
          maxPerCore: null
          min: null
          tcpCloseWaitTimeout: null
          tcpEstablishedTimeout: null
        detectLocalMode: ""
        enableProfiling: false
        healthzBindAddress: ""
        hostnameOverride: ""
        iptables:
          masqueradeAll: false
          masqueradeBit: null
          minSyncPeriod: 0s
          syncPeriod: 0s
        ipvs:
          excludeCIDRs: null
          minSyncPeriod: 0s
          scheduler: ""
          strictARP: false
          syncPeriod: 0s
          tcpFinTimeout: 0s
          tcpTimeout: 0s
          udpTimeout: 0s
        kind: KubeProxyConfiguration
        metricsBindAddress: ""
        mode: "ipvs" # change this; it is "" by default
        nodePortAddresses: null
    ...

46.gif

  • Delete the existing kube-proxy Pods so that the cluster recreates them with the new configuration:
    kubectl delete pod -l k8s-app=kube-proxy -n kube-system
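  • Optionally, confirm on any node that IPVS virtual servers have been programmed, using the ipvsadm tool installed earlier (a minimal check):
    ipvsadm -Ln | head -n 20   # IPVS entries for the Service network (10.96.0.0/16) should be listed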

47.gif

Chapter 3: Prerequisites for Installing KubeSphere

3.1 Install metrics-server (v0.6.1)

  • Install metrics-server on the Master (192.168.65.100) node:
    vi k8s-metrics.yaml
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      labels:
        k8s-app: metrics-server
      name: metrics-server
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      labels:
        k8s-app: metrics-server
        rbac.authorization.k8s.io/aggregate-to-admin: "true"
        rbac.authorization.k8s.io/aggregate-to-edit: "true"
        rbac.authorization.k8s.io/aggregate-to-view: "true"
      name: system:aggregated-metrics-reader
    rules:
    - apiGroups:
      - metrics.k8s.io
      resources:
      - pods
      - nodes
      verbs:
      - get
      - list
      - watch
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      labels:
        k8s-app: metrics-server
      name: system:metrics-server
    rules:
    - apiGroups:
      - ""
      resources:
      - nodes/metrics
      verbs:
      - get
    - apiGroups:
      - ""
      resources:
      - pods
      - nodes
      verbs:
      - get
      - list
      - watch
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      labels:
        k8s-app: metrics-server
      name: metrics-server-auth-reader
      namespace: kube-system
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: extension-apiserver-authentication-reader
    subjects:
    - kind: ServiceAccount
      name: metrics-server
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      labels:
        k8s-app: metrics-server
      name: metrics-server:system:auth-delegator
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: system:auth-delegator
    subjects:
    - kind: ServiceAccount
      name: metrics-server
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      labels:
        k8s-app: metrics-server
      name: system:metrics-server
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: system:metrics-server
    subjects:
    - kind: ServiceAccount
      name: metrics-server
      namespace: kube-system
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        k8s-app: metrics-server
      name: metrics-server
      namespace: kube-system
    spec:
      ports:
      - name: https
        port: 443
        protocol: TCP
        targetPort: https
      selector:
        k8s-app: metrics-server
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        k8s-app: metrics-server
      name: metrics-server
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          k8s-app: metrics-server
      strategy:
        rollingUpdate:
          maxUnavailable: 0
      template:
        metadata:
          labels:
            k8s-app: metrics-server
        spec:
          containers:
          - args:
            - --cert-dir=/tmp
            - --secure-port=4443
            - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
            - --kubelet-use-node-status-port
            - --metric-resolution=15s
            - --kubelet-insecure-tls # skip TLS verification when talking to the kubelets
            image: bitnami/metrics-server:0.6.1 # k8s.gcr.io/metrics-server/metrics-server:v0.6.1
            imagePullPolicy: IfNotPresent
            livenessProbe:
              failureThreshold: 3
              httpGet:
                path: /livez
                port: https
                scheme: HTTPS
              periodSeconds: 10
            name: metrics-server
            ports:
            - containerPort: 4443
              name: https
              protocol: TCP
            readinessProbe:
              failureThreshold: 3
              httpGet:
                path: /readyz
                port: https
                scheme: HTTPS
              initialDelaySeconds: 20
              periodSeconds: 10
            resources:
              requests:
                cpu: 100m
                memory: 200Mi
            securityContext:
              allowPrivilegeEscalation: false
              readOnlyRootFilesystem: true
              runAsNonRoot: true
              runAsUser: 1000
            volumeMounts:
            - mountPath: /tmp
              name: tmp-dir
          nodeSelector:
            kubernetes.io/os: linux
          priorityClassName: system-cluster-critical
          serviceAccountName: metrics-server
          volumes:
          - emptyDir: {}
            name: tmp-dir
    ---
    apiVersion: apiregistration.k8s.io/v1
    kind: APIService
    metadata:
      labels:
        k8s-app: metrics-server
      name: v1beta1.metrics.k8s.io
    spec:
      group: metrics.k8s.io
      groupPriorityMinimum: 100
      insecureSkipTLSVerify: true
      service:
        name: metrics-server
        namespace: kube-system
      version: v1beta1
      versionPriority: 100
    kubectl apply -f k8s-metrics.yaml

49.gif

  • Verify that the installation succeeded:
    kubectl top nodes --use-protocol-buffers
    kubectl top pods --use-protocol-buffers

50.gif

3.2 Install the GlusterFS Storage System

Note:

  • NFS storage is not recommended in production (especially on Kubernetes 1.20 or later); it can cause "failed to obtain lock" and "input/output error" problems that leave Pods in CrashLoopBackOff, and some applications, such as Prometheus, are not compatible with NFS.
  • NFS storage is fine for development and testing.

3.2.1 Add an Empty Disk in VMware

  • In VMware, add a disk to the k8s-node1, k8s-node2 and k8s-node3 nodes; only k8s-node1 is shown here, and the other nodes are done the same way:

51.gif

  • After adding the disks, reboot k8s-node1, k8s-node2 and k8s-node3 so the machines can recognize the new disks:
    reboot

52.gif

  • Check the newly added disk:
    lsblk -f

53.gif

Note: on cloud providers the disk must be zeroed first. If lsblk -f shows the disk as vdc, for example, run dd if=/dev/zero of=/dev/vdc bs=1M status=progress.

3.2.2 Install GlusterFS on All Node Machines

  • Configure the yum repository on all Node machines:
    yum -y install centos-release-gluster

54.gif

  • Install the GlusterFS server on all Node machines:
    yum -y install glusterfs glusterfs-server glusterfs-fuse

55.gif

  • Verify the installation:
    glusterfs -V

56.gif

  • Start the glusterd service and enable it on boot:
    systemctl enable glusterd && systemctl start glusterd

57.gif

  • Check that the glusterd service started successfully:
    systemctl status glusterd

58.gif

3.2.3 Create the GlusterFS Cluster

  • From any node, add the other nodes to form a GlusterFS cluster; k8s-node1 is used here:
    gluster peer probe k8s-node2
    gluster peer probe k8s-node3

59.gif

  • Verify that all nodes in the cluster are connected:
    gluster peer status

60.gif

3.2.4 Set Up Passwordless SSH Between the Node Machines

  • Run the following on the k8s-node1 node:
    ssh-keygen
    ssh-copy-id root@k8s-node2
    ssh-copy-id root@k8s-node3

48.gif

3.2.5 Deploy Heketi

GlusterFS itself does not provide an API, so install Heketi to manage the lifecycle of GlusterFS volumes through a RESTful API that Kubernetes can call. This lets the Kubernetes cluster provision GlusterFS volumes dynamically. Heketi v7.0.0 is installed in this example.

  • Download Heketi on the k8s-node1 node (if the network is too slow, use the attached heketi.zip):
    wget https://github.com/heketi/heketi/releases/download/v7.0.0/heketi-v7.0.0.linux.amd64.tar.gz

61.gif

  • Extract Heketi on the k8s-node1 node:
    tar -zxvf heketi-v7.0.0.linux.amd64.tar.gz
    cd heketi
    cp heketi /usr/bin
    cp heketi-cli /usr/bin

62.gif

  • Create the Heketi service file on the k8s-node1 node:
    vi /lib/systemd/system/heketi.service

    [Unit]
    Description=Heketi Server
    [Service]
    Type=simple
    WorkingDirectory=/var/lib/heketi
    ExecStart=/usr/bin/heketi --config=/etc/heketi/heketi.json
    Restart=on-failure
    StandardOutput=syslog
    StandardError=syslog
    [Install]
    WantedBy=multi-user.target

63.gif

  • Create the Heketi directories on the k8s-node1 node:
    mkdir -p /var/lib/heketi
    mkdir -p /etc/heketi

64.gif

  • Create the JSON file that configures Heketi on the k8s-node1 node:
    vi /etc/heketi/heketi.json
    {
      "_port_comment": "Heketi Server Port Number",
      "port": "8080",
      "_use_auth": "Enable JWT authorization. Please enable for deployment",
      "use_auth": false,
      "_jwt": "Private keys for access",
      "jwt": {
        "_admin": "Admin has access to all APIs",
        "admin": {
          "key": "123456"
        },
        "_user": "User only has access to /volumes endpoint",
        "user": {
          "key": "123456"
        }
      },
      "_glusterfs_comment": "GlusterFS Configuration",
      "glusterfs": {
        "_executor_comment": [
          "Execute plugin. Possible choices: mock, ssh",
          "mock: This setting is used for testing and development.",
          "      It will not send commands to any node.",
          "ssh:  This setting will notify Heketi to ssh to the nodes.",
          "      It will need the values in sshexec to be configured.",
          "kubernetes: Communicate with GlusterFS containers over",
          "            Kubernetes exec api."
        ],
        "executor": "ssh",
        "_sshexec_comment": "SSH username and private key file information",
        "sshexec": {
          "keyfile": "/root/.ssh/id_rsa",
          "user": "root"
        },
        "_kubeexec_comment": "Kubernetes configuration",
        "kubeexec": {
          "host": "https://kubernetes.host:8443",
          "cert": "/path/to/crt.file",
          "insecure": false,
          "user": "kubernetes username",
          "password": "password for kubernetes user",
          "namespace": "Kubernetes namespace",
          "fstab": "Optional: Specify fstab file on node. Default is /etc/fstab"
        },
        "_db_comment": "Database file name",
        "db": "/var/lib/heketi/heketi.db",
        "brick_max_size_gb": 1024,
        "brick_min_size_gb": 1,
        "max_bricks_per_volume": 33,
        "_loglevel_comment": [
          "Set log level. Choices are:",
          "  none, critical, error, warning, info, debug",
          "Default is warning"
        ],
        "loglevel": "debug"
      }
    }

When GlusterFS is installed as the storage type for a KubeSphere cluster, the admin account and its secret value must be provided.

65.gif

  • Start Heketi on the k8s-node1 node:
    systemctl start heketi

66.gif

  • Check the Heketi status on the k8s-node1 node:
    systemctl status heketi

67.gif

  • Enable Heketi to start on boot on the k8s-node1 node:
    systemctl enable heketi

68.gif

  • On the k8s-node1 node, create a topology file for Heketi that describes the clusters, nodes and disks to be added to Heketi:
    vi /etc/heketi/topology.json
    {
      "clusters": [
        {
          "nodes": [
            {
              "node": {
                "hostnames": {
                  "manage": [
                    "192.168.65.101"
                  ],
                  "storage": [
                    "192.168.65.101"
                  ]
                },
                "zone": 1
              },
              "devices": [
                "/dev/sdb"
              ]
            },
            {
              "node": {
                "hostnames": {
                  "manage": [
                    "192.168.65.102"
                  ],
                  "storage": [
                    "192.168.65.102"
                  ]
                },
                "zone": 1
              },
              "devices": [
                "/dev/sdb"
              ]
            },
            {
              "node": {
                "hostnames": {
                  "manage": [
                    "192.168.65.103"
                  ],
                  "storage": [
                    "192.168.65.103"
                  ]
                },
                "zone": 1
              },
              "devices": [
                "/dev/sdb"
              ]
            }
          ]
        }
      ]
    }
  • Replace the IP addresses above with your own.
  • List your own disk names under devices.

69.gif

  • Load the topology file into Heketi on the k8s-node1 node:
    export HEKETI_CLI_SERVER=http://localhost:8080
    heketi-cli topology load --json=/etc/heketi/topology.json

70.gif

The output shows the cluster ID and the node IDs.

  • Check the cluster information on the k8s-node1 node:
    heketi-cli cluster info 3df3bf32cf1dd7c047f46725facd814c # Use your own cluster ID.
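  • Optionally, make sure Heketi can actually provision storage before wiring it into Kubernetes. A minimal sketch; the size and replica count below are arbitrary test values:
    heketi-cli volume create --size=1 --replica=3   # create a throwaway 1 GB replica-3 volume
    heketi-cli volume list                          # note the volume ID printed here
    heketi-cli volume delete <volume-id>            # clean up the test volume (replace the placeholder with the real ID)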

71.PNG

3.2.6 Create a Kubernetes StorageClass Backed by GlusterFS

  • Create the StorageClass on the k8s-master node (a test PVC follows below):
    vi glusterfs-sc.yaml
    apiVersion: v1
    kind: Secret
    metadata:
      name: heketi-secret
      namespace: kube-system
    type: kubernetes.io/glusterfs
    data:
      key: "MTIzNDU2" # Replace with your own key, Base64-encoded: echo -n "123456" | base64
    ---
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      annotations:
        storageclass.beta.kubernetes.io/is-default-class: "true"
        storageclass.kubesphere.io/supported-access-modes: '["ReadWriteOnce","ReadOnlyMany","ReadWriteMany"]'
      name: glusterfs
    parameters:
      clusterid: "3df3bf32cf1dd7c047f46725facd814c" # Replace with your own GlusterFS cluster ID.
      gidMax: "50000"
      gidMin: "40000"
      restauthenabled: "true"
      resturl: "http://192.168.65.101:8080" # Gluster REST / Heketi service URL that provisions gluster volumes on demand. Replace with your own URL.
      restuser: admin
      secretName: heketi-secret
      secretNamespace: kube-system
      volumetype: "replicate:3" # Replace with your own volume type.
    provisioner: kubernetes.io/glusterfs
    reclaimPolicy: Delete
    volumeBindingMode: Immediate
    allowVolumeExpansion: true
    kubectl apply -f glusterfs-sc.yaml
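  • Optionally, verify dynamic provisioning with a throwaway PVC before moving on. A minimal sketch; the PVC name and size are arbitrary:
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: glusterfs-test-pvc
    spec:
      accessModes:
      - ReadWriteMany
      storageClassName: glusterfs
      resources:
        requests:
          storage: 1Gi
    EOF

    kubectl get pvc glusterfs-test-pvc      # the STATUS column should reach Bound
    kubectl delete pvc glusterfs-test-pvc   # clean up the test claim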

72.gif

Chapter 4: Install KubeSphere (v3.2.1)

4.1 Download the Core Configuration Files

    wget https://github.com/kubesphere/ks-installer/releases/download/v3.2.1/kubesphere-installer.yaml
    wget https://github.com/kubesphere/ks-installer/releases/download/v3.2.1/cluster-configuration.yaml

73.gif

4.2 Modify cluster-configuration

  • On the k8s-master node, edit cluster-configuration.yaml to specify which features to enable:
    vi cluster-configuration.yaml
    ---
    apiVersion: installer.kubesphere.io/v1alpha1
    kind: ClusterConfiguration
    metadata:
      name: ks-installer
      namespace: kubesphere-system
      labels:
        version: v3.2.1
    spec:
      persistence:
        storageClass: ""        # If there is no default StorageClass in your cluster, you need to specify an existing StorageClass here.
      authentication:
        jwtSecret: ""           # Keep the jwtSecret consistent with the Host Cluster. Retrieve the jwtSecret by executing "kubectl -n kubesphere-system get cm kubesphere-config -o yaml | grep -v "apiVersion" | grep jwtSecret" on the Host Cluster.
      local_registry: ""        # Add your private registry address if it is needed.
      # dev_tag: ""             # Add your kubesphere image tag you want to install, by default it's same as ks-install release version.
      etcd:
        monitoring: true        # Enable or disable etcd monitoring dashboard installation. You have to create a Secret for etcd before you enable it.
        endpointIps: localhost  # etcd cluster EndpointIps. It can be a bunch of IPs here.
        port: 2379              # etcd port.
        tlsEnable: true
      common:
        core:
          console:
            enableMultiLogin: true  # Enable or disable simultaneous logins. It allows different users to log in with the same account at the same time.
            port: 30880
            type: NodePort
        # apiserver:            # Enlarge the apiserver and controller manager's resource requests and limits for the large cluster
        #   resources: {}
        # controllerManager:
        #   resources: {}
        redis:
          enabled: true
          volumeSize: 2Gi       # Redis PVC size.
        openldap:
          enabled: true
          volumeSize: 2Gi       # openldap PVC size.
        minio:
          volumeSize: 20Gi      # Minio PVC size.
        monitoring:
          # type: external      # Whether to specify the external prometheus stack, and need to modify the endpoint at the next line.
          endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090  # Prometheus endpoint to get metrics data.
          GPUMonitoring:        # Enable or disable the GPU-related metrics. If you enable this switch but have no GPU resources, Kubesphere will set it to zero.
            enabled: false
        gpu:                    # Install GPUKinds. The default GPU kind is nvidia.com/gpu. Other GPU kinds can be added here according to your needs.
          kinds:
          - resourceName: "nvidia.com/gpu"
            resourceType: "GPU"
            default: true
        es:                     # Storage backend for logging, events and auditing.
          # master:
          #   volumeSize: 4Gi   # The volume size of Elasticsearch master nodes.
          #   replicas: 1       # The total number of master nodes. Even numbers are not allowed.
          #   resources: {}
          # data:
          #   volumeSize: 20Gi  # The volume size of Elasticsearch data nodes.
          #   replicas: 1       # The total number of data nodes.
          #   resources: {}
          logMaxAge: 7          # Log retention time in built-in Elasticsearch. It is 7 days by default.
          elkPrefix: logstash   # The string making up index names. The index name will be formatted as ks-<elk_prefix>-log.
          basicAuth:
            enabled: false
            username: ""
            password: ""
          externalElasticsearchUrl: ""
          externalElasticsearchPort: ""
      alerting:                 # (CPU: 0.1 Core, Memory: 100 MiB) It enables users to customize alerting policies to send messages to receivers in time with different time intervals and alerting levels to choose from.
        enabled: true           # Enable or disable the KubeSphere Alerting System.
        # thanosruler:
        #   replicas: 1
        #   resources: {}
      auditing:                 # Provide a security-relevant chronological set of records, recording the sequence of activities happening on the platform, initiated by different tenants.
        enabled: true           # Enable or disable the KubeSphere Auditing Log System.
        # operator:
        #   resources: {}
        # webhook:
        #   resources: {}
      devops:                   # (CPU: 0.47 Core, Memory: 8.6 G) Provide an out-of-the-box CI/CD system based on Jenkins, and automated workflow tools including Source-to-Image & Binary-to-Image.
        enabled: true           # Enable or disable the KubeSphere DevOps System.
        # resources: {}
        jenkinsMemoryLim: 2Gi   # Jenkins memory limit.
        jenkinsMemoryReq: 1500Mi  # Jenkins memory request.
        jenkinsVolumeSize: 8Gi  # Jenkins volume size.
        jenkinsJavaOpts_Xms: 512m  # The following three fields are JVM parameters.
        jenkinsJavaOpts_Xmx: 512m
        jenkinsJavaOpts_MaxRAM: 2g
      events:                   # Provide a graphical web console for Kubernetes Events exporting, filtering and alerting in multi-tenant Kubernetes clusters.
        enabled: true           # Enable or disable the KubeSphere Events System.
        # operator:
        #   resources: {}
        # exporter:
        #   resources: {}
        # ruler:
        #   enabled: true
        #   replicas: 2
        #   resources: {}
      logging:                  # (CPU: 57 m, Memory: 2.76 G) Flexible logging functions are provided for log query, collection and management in a unified console. Additional log collectors can be added, such as Elasticsearch, Kafka and Fluentd.
        enabled: true           # Enable or disable the KubeSphere Logging System.
        containerruntime: docker
        logsidecar:
          enabled: true
          replicas: 2
          # resources: {}
      metrics_server:           # (CPU: 56 m, Memory: 44.35 MiB) It enables HPA (Horizontal Pod Autoscaler).
        enabled: false          # Enable or disable metrics-server.
      monitoring:
        storageClass: ""        # If there is an independent StorageClass you need for Prometheus, you can specify it here. The default StorageClass is used by default.
        # kube_rbac_proxy:
        #   resources: {}
        # kube_state_metrics:
        #   resources: {}
        # prometheus:
        #   replicas: 1         # Prometheus replicas are responsible for monitoring different segments of data source and providing high availability.
        #   volumeSize: 20Gi    # Prometheus PVC size.
        #   resources: {}
        #   operator:
        #     resources: {}
        #   adapter:
        #     resources: {}
        # node_exporter:
        #   resources: {}
        # alertmanager:
        #   replicas: 1         # AlertManager Replicas.
        #   resources: {}
        # notification_manager:
        #   resources: {}
        #   operator:
        #     resources: {}
        #   proxy:
        #     resources: {}
        gpu:                    # GPU monitoring-related plug-in installation.
          nvidia_dcgm_exporter: # Ensure that gpu resources on your hosts can be used normally, otherwise this plug-in will not work properly.
            enabled: false      # Check whether the labels on the GPU hosts contain "nvidia.com/gpu.present=true" to ensure that the DCGM pod is scheduled to these nodes.
            # resources: {}
      multicluster:
        clusterRole: none       # host | member | none  # You can install a solo cluster, or specify it as the Host or Member Cluster.
      network:
        networkpolicy:          # Network policies allow network isolation within the same cluster, which means firewalls can be set up between certain instances (Pods).
          # Make sure that the CNI network plugin used by the cluster supports NetworkPolicy. There are a number of CNI network plugins that support NetworkPolicy, including Calico, Cilium, Kube-router, Romana and Weave Net.
          enabled: true         # Enable or disable network policies.
        ippool:                 # Use Pod IP Pools to manage the Pod network address space. Pods to be created can be assigned IP addresses from a Pod IP Pool.
          type: none            # Specify "calico" for this field if Calico is used as your CNI plugin. "none" means that Pod IP Pools are disabled.
        topology:               # Use Service Topology to view Service-to-Service communication based on Weave Scope.
          type: none            # Specify "weave-scope" for this field to enable Service Topology. "none" means that Service Topology is disabled.
      openpitrix:               # An App Store that is accessible to all platform tenants. You can use it to manage apps across their entire lifecycle.
        store:
          enabled: true         # Enable or disable the KubeSphere App Store.
      servicemesh:              # (0.3 Core, 300 MiB) Provide fine-grained traffic management, observability and tracing, and visualized traffic topology.
        enabled: true           # Base component (pilot). Enable or disable KubeSphere Service Mesh (Istio-based).
      kubeedge:                 # Add edge nodes to your cluster and deploy workloads on edge nodes.
        enabled: true           # Enable or disable KubeEdge.
        cloudCore:
          nodeSelector: {"node-role.kubernetes.io/worker": ""}
          tolerations: []
          cloudhubPort: "10000"
          cloudhubQuicPort: "10001"
          cloudhubHttpsPort: "10002"
          cloudstreamPort: "10003"
          tunnelPort: "10004"
          cloudHub:
            advertiseAddress:   # At least a public IP address or an IP address which can be accessed by edge nodes must be provided.
            - ""                # Note that once KubeEdge is enabled, CloudCore will malfunction if the address is not provided.
            nodeLimit: "100"
          service:
            cloudhubNodePort: "30000"
            cloudhubQuicNodePort: "30001"
            cloudhubHttpsNodePort: "30002"
            cloudstreamNodePort: "30003"
            tunnelNodePort: "30004"
        edgeWatcher:
          nodeSelector: {"node-role.kubernetes.io/worker": ""}
          tolerations: []
          edgeWatcherAgent:
            nodeSelector: {"node-role.kubernetes.io/worker": ""}
            tolerations: []

4.3 Run the Installation

  • Run the installation on the k8s-master node:
    kubectl apply -f kubesphere-installer.yaml
    kubectl apply -f cluster-configuration.yaml

74.gif

  • Watch the installation progress on the k8s-master node:
    kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-install -o jsonpath='{.items[0].metadata.name}') -f
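  • Once the installer finishes, the web console is exposed as a NodePort Service (port 30880 was configured in cluster-configuration.yaml). A quick check, assuming the default Service name created by ks-installer:
    kubectl get svc/ks-console -n kubesphere-system   # look for a NodePort mapping to 30880
Then open http://<any-node-ip>:30880 in a browser.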

75.gif

4.4 Fix the Missing etcd Monitoring Certificate

  • On the k8s-master node, create the Secret that the etcd monitoring dashboard needs, which fixes the "certificate not found" problem:
    kubectl -n kubesphere-monitoring-system create secret generic kube-etcd-client-certs --from-file=etcd-client-ca.crt=/etc/kubernetes/pki/etcd/ca.crt --from-file=etcd-client.crt=/etc/kubernetes/pki/apiserver-etcd-client.crt --from-file=etcd-client.key=/etc/kubernetes/pki/apiserver-etcd-client.key