一、docker

1.1 docker概述

1.1.1 docker是什么

Docker 是一个开源的容器引擎,开发者可以打包他们的应用以及依赖包到一个可移植的镜像中,然后发布到任何 Linux 机器上,也可以实现虚拟化。容器是完全使用沙盒机制,相互之间不会有任何接口(类似 iPhone 的 app)。几乎没有性能开销,可以很容易地在机器和数据中心中运行。最重要的是,他们不依赖于任何语言、框架或操作系统。Docker 从 17.03 版本之后分为 CE(Community Edition: 社区版)和 EE(Enterprise Edition: 企业版),我们用社区版就可以

Docker核心技术:
Namespace: 实现Container的进程、网络、消息、文件系统和主机名的隔离
Cgroup: 实现对资源的配额和度量


1.1.2 docker优缺点

优点:
1. 快、小
比虚拟机小,比虚拟机快,管理操作(启动,停止,开始,重启等等) 都是以秒或毫秒为单位。Docker 启动一个容器实例时间很短,一两秒就可以启动一个实例。
2. 敏捷
像虚拟机一样敏捷,而且会更便宜,在 bare metal(裸机)上部署像点个按钮一样简单。
3. 灵活
将应用和系统“容器化”,不添加额外的操作系统。
4. 轻量
你会拥有足够的“操作系统”,仅需添加或减小镜像即可。在一台服务器上可以部署 100~1000 个 Containers 容器。
5. 便宜
开源的,免费的,低成本的。由现代 Linux 内核支持并驱动。轻量的 Container 必定可以在一个物
理机上开启更多“容器”,注定比 VMs 要便宜。
6. 无论是在笔记本电脑、数据中心的虚拟机,还是任何云上,都可以运行同一个不做修改的应用程序,IT 的发布速度更快。
7. 开发人员并不关心具体哪个 Linux 操作系统
使用 Docker,开发人员可以根据所有依赖关系构建相应的软件,针对他们所选择的操作系统。
然后,在部署时一切是完全一样的,因为一切都是基于 Docker Image 的运行。
8.Google,微软,亚马逊,IBM 等都支持 Docker。
9.Docker 支持 Unix/Linux 操作系统,也支持 Windows 或 Mac
缺点:
1. Docker 部署应用的时候,并不包含数据。日志,数据库等通常应放在 Docker 容器外。一个容器的镜像通常都很小,不适合存大量数据,存储可以通过外部挂载的方式使用。比如使用:NFS、ceph、glusterfs、ipsan,MFS 等
一句话:docker 只用于计算,存储交给别人。
2. 所有容器共用宿主机的 Linux kernel,隔离性不如虚拟机,资源难以做到完全隔离,在安全上也可能存在漏洞


1.2 安装docker并配置镜像加速器

1.2.1 安装前初始化(选择执行)

注: 因为个体机器不一样,可以选择执行下面的命令
关闭selinux

  1. setenforce 0
  2. sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

关闭防火墙

  1. systemctl stop firewalld && systemctl disable firewalld

配置yum源为阿里源

  1. mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
  2. mv /etc/yum.repos.d/epel.repo /etc/yum.repos.d/epel.repo.backup
  3. yum -y install wget yum-utils
  4. yum-config-manager --add-repo https://mirrors.aliyun.com/repo/Centos-7.repo
  5. yum-config-manager --add-repo http://mirrors.aliyun.com/repo/epel-7.repo
  6. yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
  7. yum clean all && yum repolist

安装基础软件包和docker

  1. yum -y install wget net-tools nfs-utils lrzsz gcc gcc-c++ make cmake libxml2-devel openssl-devel curl curl-devel unzip sudo ntp libaio-devel wget vim ncurses-devel autoconf automake zlib-devel python-devel openssh-server socat ipvsadm conntrack
  2. yum -y install docker-ce
  3. systemctl enable docker --now
  4. docker version
  5. systemctl status docker

1.2.2 开启包转发功能和修改内核参数

开启内核模块(临时生效,重启失效)

  1. modprobe br_netfilter

开启内核模块(永久生效,选择执行)

  1. tee /etc/rc.sysinit << 'EOF'
  2. #!/bin/bash
  3. for file in /etc/sysconfig/modules/*.modules; do
  4. [ -x $file ] && $file
  5. done
  6. EOF
  7. tee /etc/sysconfig/modules/br_netfilter.modules << 'EOF'
  8. modprobe br_netfilter
  9. EOF
  10. chmod 755 /etc/sysconfig/modules/br_netfilter.modules

开启转发

  1. tee /etc/sysctl.d/docker.conf << 'EOF'
  2. net.bridge.bridge-nf-call-ip6tables = 1
  3. net.bridge.bridge-nf-call-iptables = 1
  4. net.ipv4.ip_forward = 1
  5. EOF
  6. sysctl -p /etc/sysctl.d/docker.conf

重启docker

  1. systemctl restart docker

1.2.3 设置镜像加速并部署应用测试

配置镜像加速

  1. tee /etc/docker/daemon.json << 'EOF'
  2. {
  3. "registry-mirrors":["https://rsbud4vc.mirror.aliyuncs.com","https://registry.dockercn.com","https://docker.mirrors.ustc.edu.cn","https://dockerhub.azk8s.cn","http://hubmirror.c.163.com","http://qtid6917.mirror.aliyuncs.com","https://rncxm540.mirror.aliyuncs.com","https://e9yneuy4.mirror.aliyuncs.com"]
  4. }
  5. EOF
  6. systemctl daemon-reload && systemctl restart docker

拉取镜像测试

  1. docker pull centos
  2. docker images

1.3 通过docker部署应用

下载镜像到本地并查看

  1. docker pull centos
  2. docker images

上传镜像包并导入本地

  1. # docker load -i <包名>
  2. docker load -i /opt/nginx.tar.gz

把本地镜像导出成离线镜像包

  1. # docker save -o <保存路径/镜像包名> <镜像名>
  2. docker save -o centos.tar.gz centos

查看本地所有镜像并删除镜像

  1. # 修改镜像tag
  2. # docker tag <旧镜像名>:<旧tag> <新镜像名>:<tag>
  3. docker tag centos:latest centos:v1
  4. docker images
  5. # docker rmi -f <镜像名字>
  6. docker rmi -f centos:v1

基于镜像启动容器

  1. # docker run -d -p <本机端口>:<容器端口> <镜像名>
  2. # -d 在后台运行,-p 映射端口
  3. docker run -d -p 80:80 nginx
  4. # 查看所有正在运行的容器
  5. docker ps
  6. netstat -anutp | grep 80

访问端口测试
image.png
查看容器所有信息

  1. #docker inspect <容器id|容器的名字>
  2. docker inspect 794e04ecd9c9

停止容器,并查看

  1. # 停止容器
  2. # docker stop <容器id|容器的名字>
  3. docker stop 794e04ecd9c9
  4. # 查看所有,包括停止的容器
  5. docker ps -a
  6. # 查看容器日志
  7. # docker logs <容器id|容器的名字>
  8. docker logs 794e04ecd9c9
  9. # 启动已经停止的容器
  10. # docker start <容器id|容器的名字>

1.4 基于dockerfile构建镜像

Dockerfile语法介绍
FROM:基础镜像,必须是可以下载下来的镜像,以什么镜像为基础

MAINTAINER: 指定镜像的作者信息

RUN: 指定在当前镜像构建过程中运行的命令,包含两种模式
#shell 模式
RUN <命令>
RUN yum install wget -y
#exec 模式
RUN ["executable","param1","param2"]
RUN ["/bin/bash","-c","echo hello"]

EXPOSE:用来声明端口的
帮助镜像使用者理解这个镜像服务的守护端口,以方便配置映射。在运行时使用随机端口映射时,也就是 docker run -P ,会自动随机映射 EXPOSE 的端口。
格式: EXPOSE <端口 1> [<端口 2>…]

CMD:为启动的容器指定默认要运行的程序,程序运行结束,容器也就结束。CMD 指令指定的程
序可被 docker run 命令行参数中指定要运行的程序所覆盖。如果 Dockerfile 中如果存在多个
CMD 指令,仅最后一个生效。
exec 模式: CMD ["executable","param1","param2"]
shell 模式 : CMD command param1 param2

ENTRYPOINT: 类似于 CMD 指令,但其不会被 docker run 的命令行参数指定的指令所覆盖,而且这些命令行参数会被当作参数传递给 ENTRYPOINT 指令指定的程序。如果 Dockerfile 中存在多个 ENTRYPOINT 指令,仅最后一个生效。如果运行 docker run 时使用了 --entrypoint 选项,将覆盖 ENTRYPOINT 指令指定的程序。
格式: ENTRYPOINT ["<executable>","<param1>","<param2>",…]
可以搭配 CMD 命令使用:一般是变参才会使用 CMD,这里的 CMD 等于是在给 ENTRYPOINT 传参
示例: 假设已通过 Dockerfile 构建了 nginx:test 镜像:

  1. FROM nginx
  2. ENTRYPOINT ["nginx", "-c"] # 定参
  3. CMD ["/etc/nginx/nginx.conf"] # 变参

构建的镜像是 nginx:test
1、不传参运行
docker run nginx:test
容器内会默认运行以下命令,启动主进程。
nginx -c /etc/nginx/nginx.conf
2、传参运行
docker run nginx:test -c /etc/nginx/new.conf
容器内会默认运行以下命令,启动主进程(/etc/nginx/new.conf:假设容器内已有此文件)
nginx -c /etc/nginx/new.conf

COPY: 复制指令,从构建上下文目录中复制文件或者目录到容器里指定路径。
COPY [--chown=<user>:<group>] <源路径 1>… <目标路径>
COPY [--chown=<user>:<group>] ["<源路径 1>",… "<目标路径>"]
[--chown=<user>:<group>]:可选参数,用于改变复制到容器内文件的拥有者和属组。
<源路径>:源文件或者源目录,这里可以是通配符表达式,其通配符规则要满足 Go 的
filepath.Match 规则。

ADD:指令和 COPY 的使用格式一致(同样需求下,官方推荐使用 COPY)。功能也类似,不同之
处如下:
ADD 的优点:在执行 <源文件> 为 tar 压缩文件的话,压缩格式为 gzip, bzip2 以及 xz 的情
况下,会自动复制并解压到 <目标路径>。
ADD 的缺点:在不解压的前提下,无法复制 tar 压缩文件。会令镜像构建缓存失效,从而可能会
令镜像构建变得比较缓慢。具体是否使用,可以根据是否需要自动解压来决定

ENV:设置环境变量。定义了环境变量之后,在后续的指令中就可以使用这个环境变量。
格式:
ENV <key> <value>
ENV <key1>=<value1> <key2>=<value2>…
以下示例设置 NODE_VERSION = 7.2.0,在后续的指令中可以通过 $NODE_VERSION 引用: ENV NODE_VERSION 7.2.0

ARG:构建参数,与 ENV 作用一致,不过作用域不一样。ARG 设置的环境变量仅在
Dockerfile 构建过程中有效,也就是说只有 docker build 的过程中有效,构建好的镜像内不存在此环境变量。
构建命令 docker build 中可以用 --build-arg <参数名>=<值> 来覆盖。
格式:
ARG <参数名>[=<默认值>]
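
下面是一个假设性的小示例(APP_VERSION 参数名为演示自拟),说明 ARG 在 Dockerfile 和 docker build 中的配合用法:

  1. # Dockerfile 片段(示例)
  2. FROM centos
  3. ARG APP_VERSION=1.0
  4. RUN echo "build version: $APP_VERSION" > /tmp/version.txt
  5. # 构建时可以覆盖默认值:
  6. # docker build --build-arg APP_VERSION=2.0 -t myapp:v2 .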

VOLUME:定义匿名数据卷。在启动容器时如果忘记挂载数据卷,会自动挂载到匿名卷。
这样可以避免重要的数据因容器重启而丢失,也可以避免容器不断变大。
格式:
VOLUME ["<路径 1>", "<路径 2>"…]
VOLUME <路径>
在启动容器 docker run 的时候,我们可以通过 -v 参数修改挂载点

HEALTHCHECK:用于指定某个程序或者指令来监控docker容器服务的运行状态。
格式:
HEALTHCHECK [选项] CMD <命令>:设置检查容器健康状况的命令
HEALTHCHECK NONE:如果基础镜像有健康检查指令,使用这行可以屏蔽掉其健康检查指令
HEALTHCHECK [选项] CMD <命令>:这里 CMD 后面跟随的命令的用法,可以参考 CMD 指令的用法
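
下面是一个 HEALTHCHECK 的简单示意(自拟示例,假设镜像内已安装 curl,服务监听 80 端口):

  1. HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  2. CMD curl -f http://localhost/ || exit 1
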
编写一个简单的dockerfile文件

  1. mkdir dockerfile && cd dockerfile
  2. tee Dockerfile << 'EOF'
  3. FROM centos
  4. MAINTAINER yin
  5. RUN rm -rf /etc/yum.repos.d/*
  6. COPY Centos-vault-8.5.2111.repo /etc/yum.repos.d/
  7. RUN yum install wget -y
  8. RUN yum install nginx -y
  9. COPY index.html /usr/share/nginx/html/
  10. EXPOSE 80
  11. ENTRYPOINT ["/usr/sbin/nginx","-g","daemon off;"]
  12. EOF

编写首页内容, 并导入repo文件

  1. tee index.html << 'EOF'
  2. <html>
  3. <head>
  4. <title>page added to dockerfile</title>
  5. </head>
  6. <body>
  7. <h1>welcome to docker, I am yin </h1>
  8. </body>
  9. </html>
  10. EOF

基于Dockerfile来构建镜像并启动

  1. # docker build -t <镜像名>:<tag> <Dockerfile路径>
  2. docker build -t yin/nginx:v1 .
  3. docker images
  4. # --name html2 给容器起个名字,-p 80 容器端口,物理机端口会随机映射
  5. docker run -d -p 80 --name html2 yin/nginx:v1
  6. docker ps | grep html

访问测试
image.png


1.5 docker 容器资源配额控制之 cpu

Docker 通过 cgroup 来控制容器使用的资源配额,资源配额包括 CPU、内存、磁盘 IO,以免当多个容器运行时,防止某容器把所有的硬件都占用了。(比如一台被黑的容器,有可能把所有的资源都占用)
cgroup :
cgroup 是 Control Groups 的缩写,是 Linux 内核提供的一种可以限制、记录、隔离进程组所使
用的物理资源(如 cpu、memory、磁盘 IO 等等) ,被 LXC、docker 等很多项目用于实现进程资
源控制。
LXC:
LXC 是 Linux Container 的简写。可以提供轻量级的虚拟化,以便隔离进程和资源。


1.5.1 指定 docker 容器可以使用的 cpu 份额

cpu配额的参数: -c, --cpu-shares int
CPU shares (relative weight) 在创建容器时指定容器所使用的 CPU 份额值。cpu-shares 的值不能保证可以获得 1 个 vcpu 或者多少 GHz 的 CPU 资源,仅仅只是一个弹性的加权值。 默认每个 docker 容器的 cpu 份额值都是 1024。在同一个 CPU 核心上,同时运行多个容器时,容器的 cpu 加权的效果才能体现出来。
例: 两个容器 A、B 的 cpu 份额分别为 1000 和 500,结果会怎么样?
情况 1:A 和 B 正常运行,占用同一个 CPU,在 cpu 进行时间片分配的时候,容器 A 比容器 B 多一倍的机会获得 CPU 的时间片。
情况 2:分配的结果取决于当时其他容器的运行状态。比如容器 A 的进程一直是空闲的,那么容器 B是可以获取比容器 A 更多的 CPU 时间片的; 比如主机上只运行了一个容器,即使它的 cpu 份额只 有 50,它也可以独占整个主机的 cpu 资源。cgroups 只在多个容器同时争抢同一个 cpu 资源时,cpu 配额才会生效。因此,无法单纯根据某个容器的 cpu 份额来确定有多少 cpu 资源分配给它,资源分配结果取决于同时运行的其他容器的 cpu 分配和容器中进程运行情况。

运行一个容器并分配CPU使用份额, 并查看参数

  1. docker run -it -c 512 centos cat /sys/fs/cgroup/cpu/cpu.shares

总结
通过-c 设置的 cpu share 并不是 CPU 资源的绝对数量,而是一个相对的权重值。某个容器最终能分配到的 CPU 资源取决于它的 cpu share 占所有容器 cpu share 总和的比例。通过 cpu share 可以设置容器使用 CPU 的优先级。
比如在 host 中启动了两个容器:
docker run --name "containerA" -c 1024 ubuntu
docker run --name "containerB" -c 512 ubuntu
containerA 的 cpu share 1024,是 containerB 的两倍。当两个容器都需要 CPU 资源时,containerA 可以得到的 CPU 是 containerB 的两倍。需要注意的是,这种按权重分配 CPU 只会发生在 CPU 资源紧张的情况下。如果 containerA 处于空闲状态,为了充分利用 CPU 资源,containerB 也可以分配到全部可用的 CPU。


1.5.2 cpu核心控制

指定cpu核心参数:--cpuset-cpus 可以绑定 CPU
对多核 CPU 的服务器,docker 还可以控制容器运行限定使用哪些 cpu 内核和内存节点,即使用 --cpuset-cpus 和 --cpuset-mems 参数。对具有 NUMA 拓扑(具有多 CPU、多内存节点)的服务器尤其有用,可以对需要高性能计算的容器进行性能最优的配置。如果服务器只有一个内存节点,则 --cpuset-mems 的配置基本上不会有明显效果。
扩展: 服务器架构一般分: SMP、NUMA、MPP 体系结构介绍,从系统架构来看,目前的商用服务器大体可以分为三类:
(1) 即对称多处理器结构(SMP : Symmetric Multi-Processor) 例: x86 服务器,双路服务器。
主板上有两个物理 cpu
(2) 非一致存储访问结构 (NUMA : Non-Uniform Memory Access) 例: IBM 小型机
pSeries 690
(3) 海量并行处理结构 (MPP : Massive ParallelProcessing) 。 例: 大型机 Z14

taskset 命令
taskset 设定 cpu 亲和力,taskset 能够将一个或多个进程绑定到一个或多个处理器上运行
-c, --cpu-list 以列表格式显示和指定 CPU
-p, --pid 在已经存在的 pid 上操作
例: 设置只在 cpu ID 是 1 和 2 的 cpu 上运行 sshd 进程程序

  1. ps -ef | grep sshd
  2. taskset -cp 1,2 1002 # 设置
  3. taskset -cp 1002 # 查询验证

扩展: 为什么把进程绑定到 cpu 上,运行效率就高?
注:当 cpu 数量很多时,确实需要绑定进程到 cpu 上,这样可以减少 cpu 上下文切换的开销,节约时间。
扩展: 什么是 cpu 上下文切换?
我们都知道,Linux 是一个多任务操作系统,它支持远大于 CPU 数量的任务同时运行。当然这些任务实际上并不是真的在同时运行,而是因为系统在很短的时间内,将 CPU 轮流分配给它们,造成多任务同时运行的错觉。而在每个任务运行前,CPU 都需要知道任务从哪里加载、又从哪里开始运行,也就是说,需要系统 事先帮它设置好 CPU 寄存器和程序计数器(Program Counter,PC)。 CPU 寄存器,是 CPU 内置的容量小、但速度极快的内存。而程序计数器,则是用来存储 CPU 正在执行的指令位置、或者即将执行的下一条指令位置。它们都是 CPU 在运行任何任务前,必须的依赖环境,因此也被叫做 CPU 上下文。
而这些保存下来的上下文,会存储在系统内核中,并在任务重新调度执行时再次加载进来。这样就能保证任务原来的状态不受影响,让任务看起来还是连续运行。根据任务的不同,CPU 的上下文切换可以分为不同的场景,也就是进程上下文切换、线程上下文切换、中断上下文切换。

创建一个容器并指定cpu核数

  1. docker run -it --cpuset-cpus 0-2 centos bash
  2. cat /sys/fs/cgroup/cpuset/cpuset.cpus

1.5.3 cpu配额参数混合使用

在上面这些参数中,cpu-shares 控制只发生在容器竞争同一个 cpu 的时间片时有效。如果通过 cpuset-cpus 指定容器 A 使用 cpu 0,容器 B 只是用 cpu1,在主机上只有这两个容器使用对应内核的情况,它们各自占用全部的内核资源,cpu-shares 没有明显效果。如果容器 A 和容器 B 配置上 cpuset-cpus 值并都绑定到同一个 cpu 上,然后同时抢占 cpu 资源,就可 以看出效果了。

stress 命令
概述:linux 系统压力测试软件 stress。stress 可以测试 Linux 系统 cpu/memory/IO/disk 的 负载。
下载页:http://people.seas.harvard.edu/~apw/stress/
或者使用epel下载

  1. yum -y install epel-release
  2. yum -y install stress

参数解释:
-v 显示版本号
-q 不显示运行信息
-n 显示已完成的指令情况
-t --timeout N 指定运行 N 秒后停止
--backoff N 等待 N 微秒后开始运行
-c 产生 n 个进程:每个进程都反复不停地计算随机数的平方根,测试 cpu
-i 产生 n 个进程:每个进程反复调用 sync(),sync() 用于将内存上的内容写到硬盘上,测试磁盘 io
-m --vm n 产生 n 个进程,每个进程不断调用内存分配 malloc() 和内存释放 free() 函数,测试内存
--vm-bytes B 指定 malloc 时分配内存的字节数(默认 256MB)
--vm-hang N 指定分配的内存在 free 前保持的秒数
-d --hdd n 产生 n 个执行 write 和 unlink 函数的进程
--hdd-bytes B 指定写的字节数
--hdd-noclean 不执行 unlink
注:时间单位可以为秒 s,分 m,小时 h,天 d,年 y,文件大小单位可以为 K,M,G
例: 产生2个cpu进程,2个io进程,10秒后停止运行

  1. stress -c 2 -i 2 -t 10
  2. top

运行2个容器测试
例: 创建两个容器实例:docker10 和 docker20。 让 docker10 和 docker20 只运行在 cpu0 和 cpu1 上,最终测试一下 docker10 和 docker20 使用 cpu 的百分比

  1. docker run -itd --name docker10 --cpuset-cpus 0,1 --cpu-shares 512 centos:centos7.9.2009 /bin/bash
  2. docker run -itd --name docker20 --cpuset-cpus 0,1 --cpu-shares 1024 centos:centos7.9.2009 /bin/bash

进入docker10 执行命令,并查看是否在cpu0,1上运行

  1. docker exec -it docker10 /bin/bash
  2. yum install -y epel-release
  3. yum install stress -y
  4. stress -c 2 -v -t 10m

进入docker20 执行命令

  1. docker exec -it docker20 /bin/bash
  2. yum install -y epel-release
  3. yum install stress -y
  4. stress -c 2 -v -t 10m

1.6 docker 容器实例运行结束后自动释放资源

--rm 参数
作用:当容器命令运行结束后,自动删除容器,自动释放资源
应用场景:在某些环境下,可能需要大量新建 docker 实例,仅仅运行几秒钟或几分钟之后就彻底删除,如运行单元测试或测试弹性云计算。
使用展示

  1. docker run -it --rm --name yin centos sleep 5
  2. docker ps -a | grep yin

等5秒之后再查看

  1. docker ps | grep yin

1.7 docker容器资源配额控制之内存

Docker 提供参数 -m, --memory 来限制容器的内存使用量
执行命令测试

  1. docker run -it -m 128m centos
  2. cat /sys/fs/cgroup/memory/memory.limit_in_bytes

和之前CPU命令整合使用举例

  1. docker run -it --cpuset-cpus 0,1 -m 128m centos

1.8 docker 容器资源配额控制之 IO

Docker提供参数:
--device-write-bps value #限制此设备上的写速度(bytes per second),单位可以是 kb、mb 或者 gb。
--device-read-bps value #限制此设备上的读速度(bytes per second),单位可以是 kb、mb 或者 gb。
执行命令测试
注:dd 参数:
direct:读写数据采用直接 IO 方式,不走缓存。直接从内存写硬盘上。
nonblock:读写数据采用非阻塞 IO 方式,优先写 dd 命令的数据

  1. docker run -it -v /var/www/html/:/var/www/html --device /dev/sda:/dev/sda --device-write-bps /dev/sda:1mb centos /bin/bash
  2. time dd if=/dev/sda of=/var/www/html/test.out bs=1M count=50 oflag=direct,nonblock

1.9 docker 容器的数据管理

1.9.1 docker容器的数据卷

什么是数据卷?
数据卷是经过特殊设计的目录,可以绕过联合文件系统(UFS),为一个或者多个容器提供访问。数据卷设计的目的,在于数据的永久性,它完全独立于容器的生存周期,因此,docker 不会在容器删除时删除其挂载的数据卷,也不会存在类似的垃圾收集机制对容器引用的数据卷进行处理,同一个数据卷可以支持多个容器同时访问。
数据卷的特点:
(1)数据卷在容器启动时初始化,如果容器使用的镜像在挂载点包含了数据,这些数据会被拷贝到新初始化的数据卷中
(2)数据卷可以在容器之间共享和重用
(3)可以对数据卷里的内容直接进行修改
(4)数据卷的变化不会影响镜像的更新
(5)卷会一直存在,即使挂载数据卷的容器已经被删除
举例说明:

  1. mkdir /datavolume
  2. docker run --name volume -v /datavolume:/data -itd centos /bin/bash

注:datavolume 为宿主机目录,data 为 docker 启动的 volume 容器的目录 这样在宿主机的/datavolume 目录下创建的数据就会同步到容器的/data 目录下
给容器挂载卷只读权限:

  1. docker run --name volume1 -v /datavolume1:/data:ro -itd centos /bin/bash
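
可以按下面的方式简单验证数据卷的同步和只读效果(示例命令,沿用上面创建的 volume 和 volume1 两个容器):

  1. # 宿主机写入文件,在容器内可以看到
  2. echo "hello volume" > /datavolume/test.txt
  3. docker exec volume cat /data/test.txt
  4. # 只读挂载的容器内写入会报 Read-only file system
  5. docker exec volume1 touch /data/readonly.txt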

1.9.2 使用 dockerfile 构建包含数据卷的镜像

编写dockerfile文件测试

  1. FROM centos
  2. VOLUME ["/datavolume3","/datavolume6"]
  3. CMD /bin/bash

执行dockerfile并启动验证

  1. docker build -t volume .
  2. docker run --name volume-dubble -it volume

查看挂载的位置

  1. docker inspect volume-dubble

1.10 使用 registry 搭建 docker 私有仓库

1.10.1 初始化操作参考1.2.1和1.2.2


1.10.2 安装registry


将registry镜像包转递到服务器上并解压
registry 镜像官方下载地址:https://hub.docker.com/_/registry?tab=tags

  1. docker load -i registry.tar
  2. docker images

运行私有仓库镜像

  1. docker run -d --name registry -p 5000:5000 -v /opt/registry:/var/lib/registry registry:latest
  2. docker ps
  3. netstat -anutp | grep 5000
  4. curl http://127.0.0.1:5000/v2/_catalog

默认情况下,Registry 存放镜像的目录是/var/lib/registry 目录下,这样如果容器被删除,则存放于容器中的镜像也会丢失,所以我们一般情况下会指定本地物理机一个目录如/opt/registry 挂载到容器的/var/lib/registry 下。使用-v 参数,指定本地持久的路径。
在其他机器配置私有仓库地址

  1. # 配置docker配置文件, 指定docker加速节点为私有仓库地址
  2. # "insecure-registries":["192.168.1.64:5000"]
  3. vim /etc/docker/daemon.json

image.png
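
上图中 daemon.json 的内容大致如下(示例,insecure-registries 的地址需换成自己私有仓库的 IP:端口,与后面 push 使用的地址保持一致):

  1. {
  2. "registry-mirrors":["https://registry.docker-cn.com"],
  3. "insecure-registries":["192.168.1.66:5000"]
  4. }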

  1. systemctl daemon-reload && systemctl restart docker

其他机器拉取镜像测试

  1. docker pull busybox
  2. docker images

修改镜像名并推送镜像

  1. docker tag busybox:latest 192.168.1.66:5000/busybox:latest
  2. docker images
  3. docker push 192.168.1.66:5000/busybox

在私有仓库registry上查看

  1. curl 127.0.0.1:5000/v2/_catalog

在其他机器删除刚才上传的镜像,下载测试

  1. docker rmi -f 192.168.1.66:5000/busybox
  2. docker pull 192.168.1.66:5000/busybox
  3. docker run 192.168.1.66:5000/busybox:latest echo "hello"

1.11 使用 Harbor 搭建 Docker 私有仓库

1.11.1 Harbor介绍

Docker 容器应用的开发和运行离不开可靠的镜像管理,虽然 Docker 官方也提供了公共的镜像仓库,但是从安全和效率等方面考虑,部署我们私有环境内的 Registry 也是非常必要的。Harbor 是由VMware 公司开源的企业级的 Docker Registry 管理项目,它包括权限管理(RBAC)、LDAP、日志审核、管理界面、自我注册、镜像复制和中文支持等功能。
官网地址:https://github.com/goharbor/harbor


1.11.2 为Harbor自签发证书

  1. mkdir /opt/ssl -p && cd /opt/ssl/
  2. yum -y install openssl
  3. openssl genrsa -out ca.key 3072
  4. #生成一个 3072 位的 key,也就是私钥

生成一个数字证书 ca.pem,3650 表示证书的有效时间是 10 年,按下面示例中的值填写即可,没有示例值的可以留空:

  1. openssl req -new -x509 -days 3650 -key ca.key -out ca.pem

You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [XX]:CN
State or Province Name (full name) []:guangdong
Locality Name (eg, city) [Default City]:shenzhen
Organization Name (eg, company) [Default Company Ltd]:yin
Organizational Unit Name (eg, section) []:CA
Common Name (eg, your name or your server’s hostname) []:k8s-slave2
Email Address []:yin@163.com

  1. openssl genrsa -out harbor.key 3072
  2. # 生成一个 3072 位的 key,也就是私钥
  3. openssl req -new -key harbor.key -out harbor.csr

You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [XX]:CN
State or Province Name (full name) []:guangdong
Locality Name (eg, city) [Default City]:shenzhen
Organization Name (eg, company) [Default Company Ltd]:yin
Organizational Unit Name (eg, section) []:CA
Common Name (eg, your name or your server’s hostname) []:k8s-slave2
Email Address []:yin@163.com

Please enter the following ‘extra’ attributes
to be sent with your certificate request
A challenge password []:123456
An optional company name []:123456

  1. openssl x509 -req -in harbor.csr -CA ca.pem -CAkey ca.key -CAcreateserial -out harbor.pem -days 3650
  2. # 签发证书
  3. openssl x509 -noout -text -in harbor.pem

1.11.3 安装harbor

安装准备工作

  1. mkdir /opt/install -p && cd /opt/install/
  2. ll /opt/ssl/

将 Harbor 离线安装压缩包上传到服务器,解压到 install 目录下

  1. tar -xf harbor-offline-installer-v2.3.0-rc3.tgz

修改解压后配置文件

  1. cd harbor
  2. cp harbor.yml.tmpl harbor.yml
  3. vim harbor.yml

修改配置文件:
hostname: k8s-slave2
#修改 hostname,跟上面签发的证书域名保持一致
#协议用 https
certificate: /opt/ssl/harbor.pem
private_key: /opt/ssl/harbor.key
邮件和 ldap 不需要配置,在 harbor 的 web 界面可以配置
其他配置采用默认即可
修改之后保存退出
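
改完后 harbor.yml 的关键配置大致如下(仅为示意,主机名和证书路径以自己环境为准):

  1. hostname: k8s-slave2
  2. https:
  3.   port: 443
  4.   certificate: /opt/ssl/harbor.pem
  5.   private_key: /opt/ssl/harbor.key
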
注:harbor 默认的账号密码:admin/Harbor12345
安装docker-compose
注: docker-compose 项目是 Docker 官方的开源项目,负责实现对 Docker 容器集群的快速编排。Docker-Compose 的工程配置文件默认为 docker-compose.yml,Docker-Compose 运行目录下必须有一个 docker-compose.yml 文件。docker-compose 可以管理多个 docker 实例。
下载地址: https://github.com/docker/compose
下载之后传到服务器

  1. mv docker-compose-Linux-x86_64 /usr/local/bin/docker-compose
  2. chmod a+x /usr/local/bin/docker-compose
  3. docker-compose version

导入 Harbor 镜像,然后安装并启动

  1. docker load -i docker-harbor-2-3-0.tar.gz
  2. cd /opt/install/harbor
  3. ./install.sh
  4. cd /opt/install/harbor/
  5. docker-compose start # 启动
  6. docker-compose stop # 停止

注: 容器不要随意单独停止,因为 docker-compose 管理的各容器之间有启动顺序依赖,停止后再启动可能会起不来
访问ip,出现下图则成功:
image.png
账号: admin 密码:Harbor12345
image.png


1.11.4 上传镜像到Harbor仓库

创建项目名字,并测试传递镜像
image.png
创建成功:
image.png
去测试机器修改docker配置文件

  1. vim /etc/docker/daemon.json

image.png
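
上图的 daemon.json 中需要把 Harbor 地址加入 insecure-registries,内容大致如下(示例,地址换成自己 Harbor 服务器的 IP):

  1. {
  2. "insecure-registries":["192.168.1.66"]
  3. }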

  1. systemctl daemon-reload && systemctl restart docker
  2. systemctl status docker

登录到仓库, 并传递镜像

  1. docker login 192.168.1.66
  2. docker tag busybox:latest 192.168.1.66/test/busybox:v1
  3. docker images
  4. docker push 192.168.1.66/test/busybox:v1

下图为传递成功:
image.png


1.11.5 从Harbor仓库下载镜像

删除之前的镜像,然后拉取一次

  1. docker rmi -f 192.168.1.66/test/busybox:v1
  2. docker pull 192.168.1.66/test/busybox:v1
  3. docker images

启动测试

  1. docker run --name test-harbor 192.168.1.66/test/busybox:v1 echo "hello"

1.12 docker 网络模式与静态 IP 配置

1.12.1 docker网络模式介绍

host 模式,使用 --net=host 指定。
container 模式,使用 --net=container:NAME_or_ID 指定。
none 模式,使用 --net=none 指定。
bridge 模式,使用 --net=bridge 指定,默认就是 bridge 模式。
container 模式是指,创建新容器的时候,通过 --net=container 参数,指定其和已经存在的某个容器共享一个 Network Namespace。如下图所示,右方黄色新创建的 container,其网卡共享左边容器。因此就不会拥有自己独立的 IP,而是共享左边容器的 IP 172.17.0.2、端口范围等网络资源,两个容器的进程通过 lo 网卡设备通信。
image.png
默认选择 bridge 的情况下,容器启动后会由 docker 在 docker0 网桥所在网段内自动分配一个地址,这可能不是我们想要的。在 centos7 系统上,docker 环境下可以使用 pipework 脚本对容器分配固定 IP(这个 IP 可以是和物理机同网段 IP)。
注:Pipework 有个缺陷,容器重启后 IP 设置会自动消失,需要重新设置。
Docker 网络 host 模式是指共享宿主机的网络
Docker 网络 none 模式是指创建的容器没有网络地址,只有 lo 网卡


1.12.2 配置桥接网络

桥接本地物理网络的目的,是为了局域网内用户方便访问 docker 实例中的服务,不需要各种端口映射即可访问服务。但是这样做,又违背了 docker 容器安全隔离的原则,工作中需要辩证地选择。
安装桥接管理工具 bridge-utils

  1. yum -y install bridge-utils

备份网卡配置文件并修改配置

  1. cd /etc/sysconfig/network-scripts/
  2. cp ifcfg-ens33 /opt/
  3. tee ifcfg-ens33 << EOF
  4. TYPE=Ethernet
  5. PROXY_METHOD=none
  6. BROWSER_ONLY=no
  7. BOOTPROTO=static
  8. DEFROUTE=yes
  9. IPV4_FAILURE_FATAL=no
  10. IPV6INIT=yes
  11. IPV6_AUTOCONF=yes
  12. IPV6_DEFROUTE=yes
  13. IPV6_FAILURE_FATAL=no
  14. IPV6_ADDR_GEN_MODE=stable-privacy
  15. NAME=ens33
  16. DEVICE=ens33
  17. ONBOOT=yes
  18. BRIDGE="br0"
  19. EOF

生成桥设备br0的配置文件,并重启网卡服务

  1. tee ifcfg-br0 << EOF
  2. DEVICE="br0"
  3. NM_CONTROLLED="yes"
  4. ONBOOT="yes"
  5. TYPE="Bridge"
  6. BOOTPROTO=none
  7. IPADDR=192.168.1.63
  8. NETMASK=255.255.255.0
  9. GATEWAY=192.168.1.1
  10. DNS1=192.168.1.1
  11. EOF
  12. systemctl restart network
  13. ip addr

扩展:配置桥接 br0 的好处?
(1)真实主机中安装的虚拟主机,需要和外界主机进行通讯的时候,数据需要通过真实主机的网卡进行传输,但是虚拟主机内核无法对真实主机的网卡进行控制,一般情况下需要虚拟主机先将数据包发送给真实主机的内核,再由真实主机内核将该数据通过真实物理网卡发送出去,该过程可以用 NAT(网络地址转换)实现,但是数据传输速度较慢。
(2)可以由真实主机的内核虚拟出来一个接口 br0,同时这个也是一个对外的虚拟网卡设备,通过该接口可以将虚拟主机网卡和真实主机网卡直接连接起来,进行正常的数据通讯,提升数据传输效率。该过程就是桥接。


1.12.3 使用pipework配置静态ip

  1. 下载pipework

方法 1:直接下载 pipework zip 包,把 pipework-master.zip 上传到 Linux 中
https://github.com/jpetazzo/pipework
image.png

方法 2:使用 git 获得:
git 下载链接:https://github.com/jpetazzo/pipework.git

  1. git clone https://github.com/jpetazzo/pipework.git

image.png


  2. 设置pipework命令并重启docker

  1. unzip pipework-master.zip
  2. cp ./pipework-master/pipework /usr/bin/
  3. systemctl restart docker

1.12.4 创建4种网络模式的容器

none模式创建一个docker容器

  1. docker pull centos
  2. docker run -itd --name none --net=none --privileged=true centos bash
  3. docker ps

扩展:
--privileged=true #允许开启特权功能
privileged [ˈprɪvəlɪdʒd] 有特权的
在 docker 0.6 版以后,privileged 被引入 docker。使用该参数,container 内的 root 拥有真正的 root 权限。否则,container 内的 root 只是外部物理机的一个普通用户权限。使用 privileged 启动的容器,可以看到很多 host 上的设备,并且可以执行 mount,甚至允许你在 docker 容器中启动 docker 容器。不启用 privileged,容器中 root 用户不能执行 mount。
用 pipework 给容器分配地址
pipework 语法:pipework 网桥名 容器实例 ID 分配给容器的 IP/掩码@网关

  1. pipework br0 42ace1da75c9 192.168.1.166/24@192.168.1.2
  2. docker inspect 42ace1da75c9

进入容器测试网络:

  1. docker exec -it 42ace1da75c9 /bin/bash
  2. ip addr
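
注:前面提到 pipework 配置的 IP 在容器重启后会丢失,如果重启了容器,可以按下面的方式重新设置一次(示例,容器 ID 以实际为准):

  1. docker restart 42ace1da75c9
  2. pipework br0 42ace1da75c9 192.168.1.166/24@192.168.1.2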

创建共享宿主机网络的容器并测试

  1. docker run --name host -itd --net=host --privileged=true centos bash
  2. docker exec -it host /bin/bash
  3. ip addr

创建共享容器网络模式的容器

  1. docker run --name container2 --net=container:none -itd --privileged=true centos bash
  2. docker exec -it container2 /bin/bash
  3. ip addr

创建桥接网络模式的容器

  1. docker run --name bridge -itd --privileged=true centos bash
  2. docker exec -it bridge bash
  3. ip addr

1.13 docker的数据卷容器创建、备份和还原

1.13.1 数据卷容器介绍

什么是数据卷容器?
一个容器挂载了数据卷,其他容器通过挂载这个容器实现数据共享,挂载数据卷的容器,就叫做数据
卷容器,挂载数据卷容器的方法:
docker run --volumes-from [container name]


1.13.2 创建数据卷容器并用容器挂载

注: 需要先创建数据卷镜像,参考1.9.2
创建数据卷容器并创建测试文件

  1. docker run --name data-volume -itd volume bash
  2. docker exec -it data-volume /bin/bash
  3. touch /datavolume6/lucky.txt

创建一个容器挂载数据卷容器挂载的文件

  1. docker run --name data-volume2 --volumes-from data-volume -itd centos /bin/bash
  2. docker exec -it data-volume2 /bin/bash
  3. cd /datavolume6 && ls

注意:生产环境的容器做持久化,还是用 -v 参数挂载宿主机目录,这样数据才不会丢失;多个容器可以挂载同一个物理机目录,做到数据同步
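
例如,下面两个容器挂载同一个宿主机目录 /opt/share(目录和容器名均为示例),即可实现数据共享:

  1. mkdir -p /opt/share
  2. docker run --name web1 -itd -v /opt/share:/data centos /bin/bash
  3. docker run --name web2 -itd -v /opt/share:/data centos /bin/bash
  4. docker exec web1 touch /data/from-web1.txt
  5. docker exec web2 ls /data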


1.13.3 docker数据卷的备份和还原

备份方法
docker run --volumes-from [container name] -v $(pwd):/backup centos tar czvf /backup/backup.tar.gz [container data volume]
--volumes-from :挂载一个要备份的容器的数据卷。
-v :指定备份文件保存到宿主机的路径。
tar czvf :是容器运行时执行的命令,执行的是一个压缩打包 tar 命令,将要备份容器中的数据卷目录压缩到指定的目录下。

  1. docker run --volumes-from data-volume2 -v /root/backup:/backup --name datavolume-copy centos tar zcvf /backup/data-volume2.tar.gz /datavolume6

数据还原
docker run --volumes-from [container name] -v $(pwd):/backup centos tar xzvf /backup/backup.tar.gz [container data volume]

  1. docker run --volumes-from data-volume2 -v /root/backup/:/backup centos tar zxvf /backup/data-volume2.tar.gz -C /

二、Containerd 容器概述

2.1 Containerd 容器介绍

2.1.1 什么是Containerd

官方文档:
https://containerd.io
containerd 是从 docker 中剥离出来的容器运行时,遵循 OCI(开放容器接口)标准。containerd 是真正管控容器的一个进程,执行容器的时候用的是 runc。
docker 对容器的管理和操作基本都是通过 containerd 完成的。Containerd 是一个工业级标准的容器运行时,它强调简单性、健壮性和可移植性。Containerd 可以在宿主机中管理完整的容器生命周期:容器镜像的传输和存储、容器的执行和管理、容器的网络管理等。
什么是容器运行时?
Docker、containerd、podman 都可以称为容器运行时
什么是 OCI?
Linux 基金会于 2015 年 6 月成立 OCI(Open Container Initiative)组织,旨在围绕容器格式和运行时制定一个开放的工业化标准,目前主要有两个标准文档:容器运行时标准 (runtime spec)和容器镜像标准(image spec)。制定容器格式标准的宗旨就是不受上层结构的绑定,如特定的客户端、编排栈等,同时也不受特定的供应商或项目的绑定,即不限于某种特定操作系统、硬件、CPU 架构、公有云等。


2.1.2 Containerd和docker区别

containerd 是从 Docker 中分离出来的一个项目,可以作为一个底层容器运行时,现在它成了 Kubernetes 容器运行时更好的选择。K8s 1.20 之后版本官方建议用 containerd 作为容器引擎,K3S 中默认使用 containerd 作为容器运行时,我们只要安装好 K3S 就可以使用它了。
K8S 为什么要放弃使用 Docker 作为容器运行时,而使用 containerd 呢?
如果你使用 Docker 作为 K8S 容器运行时的话,kubelet 需要先要通过 dockershim 去调用Docker,再通过 Docker 去调用 containerd。如果你使用 containerd 作为 K8S 容器运行时的话,由于 containerd 内置了 CRI 插件,kubelet 可以直接调用 containerd。使用 containerd 不仅性能提高了(调用链变短了),而且资源占用也会变小(Docker 不是一个纯粹的容器运行时,具有大量其他功能)。


2.1.3 Containerd 容器组件

Containerd 是一个容器运行时组件,它原本是 Docker 平台的一部分,虽然小巧,但在整个系统里起到很关键的作用。Containerd 被设计成一种解耦合并且很容易与其它工具集成的组件。
containerd:守护进程,高性能容器运行时。
ctr:containerd 的命令行客户端。
runc:运行容器的命令行工具。
image.png


2.1.4 docker、containerd、docker-shim、runC 关系

containerd 囊括了单机运行一个容器时所需要的一切:
为了能够支持多种 OCI Runtime,containerd 内部使用 containerd-shim,每启动一个容器都会创建一个新的 containerd-shim 进程,指定容器 ID,Bundle 目录,运行时的二进制(比如 runc)。
1、docker:docker 本身而言,包括了 docker client 和 dockerd,是一个客户端工具,用来把用户的请求发送给 docker daemon(dockerd)。dockerd: dockerd 是对容器相关操作的最上层封装,直接面向操作用户。docker daemon,一般也会被称为 docker engine。dockerd 启动时会启动 containerd 子进程。
2、containerd:是一个工业级标准的容器运行时,它强调简单性、健壮性和可移植性,几乎囊括了单机运行一个容器运行时所需要的一切:执行,分发,监控,网络,构建,日志等。主要作用是:
1)、管理容器的生命周期(从创建容器到销毁容器)
2)、拉取/推送容器镜像
3)、存储管理(管理镜像及容器数据的存储)
4)、调用 runC 运行容器(与 runC 等容器运行时交互)
5)、管理容器网络接口及网络,dockerd 实际真实调用的还是 containerd 的 api 接口,containerd 是 dockerd 和 runC 之间的一个中间交流组件。
3、containerd-shim:一个真实运行容器的载体,为了能够支持多种 OCI Runtime,containerd 内部使用 containerd-shim,每启动一个容器都会起一个新的 containerd-shim 的进程。它通过指定三个参数来启动容器:容器 ID、bundle 目录(对应某个容器生成的目录,一般位于 /var/run/docker/containerd/containerID)以及运行时的二进制(比如 runc)。
4、runC:是一个轻量级的工具,用来运行容器的,我们可以不用通过 docker 引擎,直接运行容器。
他们之间的调用关系如下图:
image.png
docker 镜像和 containerd 镜像通用,但组织方式和存放目录不同,导致 docker 与 ctr 命令不通用,各自管理自己的镜像和容器。


2.2 Containerd 的安装和配置

2.2.1 安装

  1. yum -y install containerd jq

jq 是什么?
jq 可以对 json 数据进行分片、过滤、映射和转换,和 sed、awk、grep 等命令一样,都可以让你轻松地把玩文本。它能轻松地把你拥有的数据转换成你期望的格式,而且需要写的程序通常也比你期望的更加简短
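
jq 的一个简单用法示意(示例命令):

  1. echo '{"name":"nginx","tags":["v1","v2"]}' | jq '.name'
  2. echo '{"name":"nginx","tags":["v1","v2"]}' | jq '.tags[0]'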


2.2.2 初始化配置

  1. containerd config default > /etc/containerd/config.toml
  2. systemctl enable containerd --now
  3. vim /etc/containerd/config.toml

将 sandbox_image = "k8s.gcr.io/pause:3.2" 替换成 sandbox_image = "registry.cn-hangzhou.aliyuncs.com/google-containers/pause-amd64:3.2"
配置镜像加速: 先找到关键词 registry.mirrors,在它下面添加:
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
endpoint = ["https://registry.docker-cn.com"]
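
这段配置在 config.toml 中位于 registry.mirrors 小节下,带上层级大致如下(示意):

  1. [plugins."io.containerd.grpc.v1.cri".registry]
  2.   [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
  3.     [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
  4.       endpoint = ["https://registry.docker-cn.com"]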

  1. # 上传文件并解压
  2. docker load -i pause.tar.gz
  3. systemctl daemon-reload && systemctl restart containerd
  4. systemctl status containerd

2.2.3 Containerd 应用-和镜像和容器相关命令练习

查看镜像

  1. ctr image ls

下载镜像,必须是全路径

  1. ctr image pull docker.io/library/busybox:latest

创建容器

  1. ctr run -d docker.io/library/busybox:latest busybox-v1

查看容器在宿主机PID

  1. ctr task ls

进入容器

  1. ctr task exec --exec-id 2917 -t busybox-v1 sh

删除容器

  1. ctr task rm -f busybox-v1

与宿主机共享目录

  1. ctr run --mount type=bind,src=/tmp,dst=/host,options=rbind:rw -d docker.io/library/busybox:latest busybox-v6

导出镜像

  1. ctr images export busybox.tar.gz docker.io/library/busybox:latest

导入镜像

  1. ctr images import busybox.tar.gz

ctr 和 docker 命令比较
image.png


三、 Podman 容器概述

3.1 Podman 容器介绍

3.1.1 什么是podman

Docker 是很流行的容器技术,它在运行的时候有一个守护进程,需要把服务启动起来,才能通过CLI 管理容器,镜像,守护进程负责处理很多的事情,所以就可能有单点故障风险,当 Docker 服务程序
挂了,依赖它启动的容器就都不能使用了。
Podman 原来是 CRI-O 项目的一部分,后来被分离成一个单独的项目叫 libpod。Podman 的使用体验和 Docker 类似,不同的是 Podman 没有 daemon。以前使用 Docker CLI 的时候,Docker CLI 会通过 gRPC API 去跟 Docker Engine 说「我要启动一个容器」,然后 Docker Engine 才会通过 OCI Container runtime(默认是 runc)来启动一个容器。这就意味着容器的进程不可能是 Docker CLI 的子进程,而是 Docker Engine 的子进程。

3.1.2 podman特点

Podman 无需守护进程,可以用来管理容器,镜像,以下是它的一些特点:
无需安装 Docker,安装 Podman 后就可以进行管理
Podman 的命令与 Docker 几乎相同
Docker 下的镜像 Podman 也可以使用
Podman 存储它的镜像和容器与 Docker 的位置不同(即通过 Podman 载入镜像后,用 Docker查看镜像是看不到的)
Podman 控制下的容器可以由 root 用户运行,也可以由非特权用户运行
Podman 可以利用 runC 直接与容器、镜像或者内核等进行交互,而不是通过守护进程;它使用用户命名空间来模拟容器中的 root,无需连接到具有 root 权限的套接字,保证容器的体系安全
image.png
使用 Podman 之后我们不需要管理和 Docker 守护进程一样的守护进程,Podman 也同样支持Docker 命令,他们的镜像也是兼容的
Podman 官网地址:https://podman.io/
Podman 项目地址:https://github.com/containers/libpod


3.2 podman 容器的安装和应用

3.2.1 安装podman

  1. yum -y install podman
  2. podman version
  3. podman info

3.2.2 podman 应用

搜索镜像

  1. podman search busybox

拉取镜像

  1. podman pull busybox

查看镜像列表

  1. podman images

给镜像打标签

  1. podman tag busybox busybox:v1

删除镜像

  1. podman rmi -f busybox:v1

创建容器

  1. podman run busybox /bin/echo "Hello,Welcome to xuegod"

创建容器并进入

  1. podman run -it busybox sh

查看运行容器

  1. podman ps

删除容器

  1. podman rm -f 8dca6db6aa1f

podman部署应用

  1. podman pull docker.io/library/nginx
  2. podman images
  3. podman run --name nginx-v1 --restart=always -itd -p 9090:80 docker.io/library/nginx
  4. curl localhost:9090

四、kubernetes 深度解读-架构-组件-资源对象

4.1 Kubernetes 概述

官网:
kubernetes github 地址: https://github.com/kubernetes/kubernetes
kubernetes 官方站点:
https://kubernetes.io/
https://kubernetes.io/zh/
https://kubernetes.io/docs/
Kubernetes 起源
Kubernetes 单词起源于希腊语,是“舵手”或者“领航员、飞行员”的意思,来源于 Google 的 Borg 项目。
Borg 是谷歌内部的一个容器编排工具,谷歌业务 90%以上都在 Borg 上运行,Borg 在谷歌内部已经使用了大概 15 年。K8S 是在 Borg 的基础上开发出来的轻量级容器编排工具。K8S 的根基非常牢固,是站在巨人的肩膀上发展起来的项目,开源之后,迅速称霸容器编排技术领域。

4.2 Kubernetes 架构

单节点和高可用架构:
k8s 的物理架构是 master/node 模式:
k8s 集群至少需要一个主节点(Master)和多个工作节点(Worker),主节点是集群的控制节点,负责整个集群的管理和控制,主节点主要用于暴露 API,调度部署和节点的管理。工作节点主要是运行容器的。
单 master 节点架构如下:
image.png
多 master 节点架构如下:
image.png

4.3 Kubernetes 组件

kubectl:管理 k8s 的命令行工具,可以操作 k8s 中的资源对象。
etcd:是一个高可用的键值数据库,存储 k8s 的资源状态信息和网络信息,etcd 中的数据变更是通过 api server 进行的。
apiserver:提供 k8s api,是整个系统的对外接口,提供资源操作的唯一入口,供客户端和其它组件调用,提供了 k8s 各类资源对象(pod,deployment,Service 等)的增删改查,是整个系统的数据总线和数据中心,并提供认证、授权、访问控制、API 注册和发现等机制,并将操作对象持久化到 etcd 中。相当于“营业厅”。
scheduler:负责 k8s 集群中 pod 的调度。scheduler 通过与 apiserver 交互,监听到创建 Pod 副本的信息后,会检索所有符合该 Pod 要求的工作节点列表,开始执行 Pod 调度逻辑,调度成功后将 Pod 绑定到目标节点上,相当于“调度室”。
controller-manager:作为集群内部的管理控制中心,负责集群内的 Node、Pod 副本、服务端点(Endpoint)、命名空间(Namespace)、服务账号(ServiceAccount)、资源定额(ResourceQuota)的管理,当某个 Node 意外宕机时,Controller Manager 会及时发现并执行自动化修复流程,确保集群始终处于预期的工作状态。
kubelet:每个 Node 节点上的 kubelet 定期就会调用 API Server REST 接口报告自身状态,API Server 接收这些信息后,将节点状态信息更新到 etcd 中。kubelet 也通过 API Server 监听 Pod 信息,从而对 Node 机器上的 Pod 进行管理:如创建、删除、更新 Pod。
kube-proxy:提供网络代理和负载均衡,是实现 service 的通信与负载均衡机制的重要组件,kube-proxy 负责为 Pod 创建代理服务,从 apiserver 获取所有 service 信息,并根据 service 信息创建代理服务,实现 service 到 Pod 的请求路由和转发,从而实现 K8s 层级的虚拟转发网络,将到 service 的请求转发到后端的 pod 上。
Calico:Calico 是一个纯三层的网络插件,calico 的 bgp 模式类似于 flannel 的 host-gw,calico 在 kubernetes 中可提供网络功能和网络策略。
Coredns:k8s 1.11 之前使用的是 kube-dns,1.11 之后才有 coredns。coredns 是一个 DNS 服务器,能够为 Kubernetes services 提供 DNS 记录。

4.4 Kubernetes 容器编排工具的优势

灵活部署
kubernetes 支持在多种平台部署,可在私有云,公有云,混合云,openstack,VMware vSphere,VMware Workstation,虚拟机,物理机等环境部署。
安全高效,拥有完善的认证授权机制,自带审计功能
可以对多用户做细化的授权管理(如 rbac 授权),达到相互之间的操作完全隔离,互不影响,而且自身带有审计功能,可以对操作过程进行实时的日志记录,出现问题可以方便排查。
负载均衡
支持四层、七层负载均衡,可用于多种场景。
可扩展性强
拥有强大的集群扩展能力,可以根据业务规模自动增加和缩减主机节点的数量,确保服务可以承受大量并发带来的压力,保证业务稳定运行。
根据节点资源的使用情况对 pod 进行合理的调度
可以按照用户需要调度 pod,例如保证 Pod 只在资源足够的节点上运行,会尝试把同一功能的 pod 分散在不同的节点上,还会尝试平衡不同节点的资源使用率等。
拥有完善的灾备预警方案
拥有多种灾备解决方案,支持备份和容灾,出现故障可以达到秒级切换,保证线上业务不受影响。

4.5 Kubernetes 功能详解

多租户网络隔离
Kubernetes 支持多种网络插件,如 flannel,calico,canal 等,每个插件都有独特的性能,可以分别适用于多种场景,我们可以利用 calico 的 network policy(网络策略)解决 k8s 中的网络隔离,对于多租户场景,可以每一个用户一个名称空间(namespace),然后对这个名称空间设置网络隔离。
高可用,高可扩展性
kubernetes 中支持多种高可用解决方案,如 keepalived+nginx,keepalived+haproxy 等,可以使访问流量分发到不同的主机节点,减轻节点压力,如果某个节点出现故障,可以实现秒级切换,达到高可用,保证业务不中断。
用户数据的持久化存储
kubernetes 支持多种持久化存储解决方案,保证数据可以得到很好的备份,降低用户数据丢失的风险,kubernetes 中可使用的存储方案如下:
本地存储:emptyDir,hostPath
网络连接类存储:
SAN(存储局域网络):iSCSI
NAS(网络附加存储):nfs,cifs
分布式存储:glusterfs,ceph(块级别的存储),cephfs(文件系统级别的存储)
云存储:Azure Disk 等
拥有多种更新回滚策略
通过管理 kubernetes 的控制器和 service 等,可以实现灰度发布,蓝绿部署,金丝雀发布等,达到业务的快速部署和回滚等操作,用以满足不同客户的不同升级需求。
弹性伸缩
根据访问的流量压力,可以实现 pod 的水平扩容和缩减,达到秒级扩容,让 pod 始终处于满足业务正常运行时所需的数量即可,避免了资源的浪费。
节省资源,优化硬件资源的使用
根据流量进行业务组件的扩缩容,可节省机器资源。

4.6 容器部署和传统部署对比分析

image.png
1.传统部署时代:
早期,应用程序在物理服务器上运行。无法为物理服务器中的应用程序定义资源边界,这会导致资源分配问题。例如,如果在物理服务器上运行多个应用程序,则可能会出现一个应用程序占用大部分资源的情况,结果可能导致其他应用程序的性能下降。一种解决方案是在不同的物理服务器上运行每个应用程序,但是由于资源利用不足而无法扩展,并且许多物理服务器的维护成本也很高。
2.虚拟化部署时代:
作为解决方案,引入了虚拟化功能,它允许你在单个物理服务器的 CPU 上运行多个虚拟机(VM)。虚拟化功能允许应用程序在 VM 之间隔离,并提供安全级别,因为一个应用程序的信息不能被另一应用程序自由地访问。因为虚拟化可以轻松地添加或更新应用程序、降低硬件成本等,所以虚拟化可以更好地利用物理服务器中的资源,并可以实现更好的可伸缩性。每个 VM 是一台完整的计算机,在虚拟化硬件之上运行所有组件,包括其自己的操作系统。
3.容器部署时代:
容器类似 VM,但是它们具有轻量级的隔离属性,可以在应用程序之间共享操作系统(OS)。因此,容器被认为是轻量级的。容器与 VM 类似,具有自己的文件系统、CPU、内存、进程空间等。由于它们与基础架构分离,因此可以跨云和 OS 分发进行移植。容器因具有许多优势而变得流行起来。下面列出了容器的一些好处:
1)敏捷应用程序的创建和部署:
与使用 VM 镜像相比,提高了容器镜像创建的简便性和效率。
2)持续开发、集成和部署:
通过快速简单的回滚(由于镜像不可变性),提供可靠且频繁的容器镜像构建和部署。
3)关注开发与运维的分离:
在构建/发布时而不是在部署时创建应用程序容器镜像,从而将应用程序与基础架构分离。
4)可观察性:不仅可以显示操作系统级别的信息和指标,还可以显示应用程序的运行状况和其他指标信号。
5)跨开发、测试和生产的环境一致性:在便携式计算机上与在云中相同地运行。
6)云和操作系统分发的可移植性:可在 Ubuntu、RHEL、CoreOS、本地、Google Kubernetes Engine 和其他任何地方运行。
7)以应用程序为中心的管理:提高抽象级别,从在虚拟硬件上运行 OS 到在 OS 上运行应用程序。
8)松散耦合、分布式、弹性、解放的微服务:应用程序被分解成较小的独立部分,并且可以动态部署和管理,而不是在一台大型单机上整体运行。
9)资源隔离:可预测的应用程序性能。
10)资源利用:高效率和高密度。

4.7 Kubernetes 中的资源对象

最小调度单元 Pod
Pod 是 Kubernetes 中的最小调度单元,当指派容器时,容器实际上并不会指派到物理硬件上,容器会被分配到一个 Pod 里,一个 Pod 封装一个容器(也可以封装多个容器),Pod 里的容器共享存储、网络等。也就是说,应该把整个 pod 看作虚拟机,然后每个容器相当于运行在虚拟机的进程。所有容器都被统一安排和调度,并运行在共享的上下文中。对于具体应用而言,Pod 是它们的逻辑主机,Pod 包含业务相关的多个应用容器。
image.png
资源对象标签 label
label 是标签的意思,k8s 中的资源对象大都可以打上标签,如 Node、Pod、Service 等,一个资源可以绑定任意多个 label,k8s 通过 Label 可实现多维度的资源分组管理,后续可通过 Label Selector 查询和筛选拥有某些 Label 的资源对象,例如创建一个 Pod,给定一个 Label 是 app=tomcat,那么 service 可以通过 label selector 选择拥有 app=tomcat 的 pod,和其相关联,也可通过 app=tomcat 删除拥有该标签的 Pod 资源。
image.png
控制器 Deployment
Replicaset:
Kubernetes 中的副本控制器,管理 Pod,使 pod 副本的数量始终维持在预设的个数。
Deployment:
Deployment 是管理 Replicaset 和 Pod 的副本控制器,Deployment 可以管理多个 Replicaset,是比 Replicaset 更高级的控制器,也即是说在创建 Deployment 的时候,会自动创建 Replicaset,由 Replicaset 再创建 Pod,Deployment 能对 Pod 扩容、缩容、滚动更新和回滚、维持 Pod 数量。
image.png
控制器 Statefulset
Deployment 控制器是用来管理无状态应用的:管理的所有 Pod 一模一样,提供同一个服务,也不考虑在哪台 Node 运行,可随意扩容和缩容。这种应用称为“无状态”,例如 Web 服务。Deployment 部署的 pod 数据卷是共享的,创建 3 个 pod 都是用的同一个数据卷,对外提供统一的服务。在实际的场景中,尤其是分布式应用,会部署多个实例,这些实例之间往往有依赖关系,例如主从关系、主备关系,这种应用称为“有状态”,例如 MySQL 主从、Etcd 集群。
StatefulSet 控制器用于部署有状态应用,满足一些有状态场景的需求:
1)StatefulSet 管控的每个 Pod 对象都有固定的主机名和专有存储卷,即便被重构后亦能保持不变
2)Pod 有序的部署、扩容、删除和停止
3)Pod 分配一个稳定的且唯一的网络标识
4)Pod 分配一个独享的存储
控制器 Daemonset
DaemonSet 确保全部(或者一些)Node 上运行一个 Pod 的副本。当有 Node 加入集群时,也会为他们新增一个 Pod。当有 Node 从集群移除时,这些 Pod 也会被回收。删除 DaemonSet 将会删除它创建的所有 Pod。
使用 DaemonSet 的一些典型用法:
1)运行集群存储 daemon,例如在每个 Node 上运行 glusterd、ceph。
2)日志收集,比如 fluentd,logstash 等
3)系统监控,比如 Prometheus Node Exporter,collectd,New Relic agent,Ganglia gmond 等
4)系统程序,比如 kube-proxy, kube-dns, glusterd, ceph 等
控制器 Job&CronJob
Job 负责批量处理任务:Job 创建一个或多个 Pod,并确保指定数量的 Pod 成功终止。Pod 成功完成后,Job 将跟踪成功完成的情况。当达到指定的成功完成次数时,任务(即 Job)就完成了。删除 Job 将清除其创建的 Pod。一个简单的情况是创建一个 Job 对象,以便可靠地运行一个 Pod 来完成。如果第一个 Pod 发生故障或被删除(例如,由于节点硬件故障或节点重启),则 Job 对象将启动一个新的 Pod。
CronJob:一个 CronJob 对象就像 crontab 文件中的一行。它用 Cron 格式进行编写,并周期性地在给定的调度时间执行 Job。在每次调度运行时间内大概会创建一个 Job 对象。我们之所以说“大概”,是因为在特定的环境下可能会创建两个 Job,或者一个 Job 都没创建。
Service
image.png
在 kubernetes 中,Pod 是有生命周期的,Pod 重启 IP 就会发生变化。如果我们的服务都是将 Pod 的 IP 地址写死,Pod 挂掉或者重启后,和刚才重启的 pod 相关联的其他服务将会找不到它所关联的 Pod。为了解决这个问题,在 kubernetes 中定义了 service 资源对象,Service 定义了一个服务访问的入口,客户端通过这个入口即可访问服务背后的应用集群实例,service 是一组 Pod 的逻辑集合,这一组 Pod 能够被 Service 访问到,通常是通过 Label Selector 实现的。
Ingress Controller
Ingress 可以把进入到集群内部的请求转发到集群中的一些服务上,从而可以把服务映射到集群外部。Ingress 能把集群内 Service 配置成外网能够访问的 URL,流量负载均衡,提供基于域名访问的虚拟主机等。
Ingress Controller 可以理解为控制器,它通过不断的跟 Kubernetes API 交互,实时获取后端 Service、Pod 的变化,比如新增、删除等,结合 Ingress 定义的规则生成配置,然后动态更新上边的 Nginx 或者 trafik 负载均衡器,并刷新使配置生效,来达到服务自动发现的作用。

五、使用 kubeadm 搭建生产环境的多 master 节点 k8s 高可用集群

5.1 kubeadm适用场景

Kubeadm:kubeadm 是官方提供的开源安装工具,是一个开源项目,团队投入大量精力在维护。Kubeadm 安装的 k8s,所有组件都是以 pod 运行的,可以做到故障自恢复,不需要人为写脚本维护,目前是比较方便和推荐的安装方式,简单易用。
二进制:二进制安装的 k8s,所有组件都是通过 systemd 维护的,二进制安装对于理解 k8s 架构和原理更好,调试也比较方便,但是不适合经常需要大批量安装 k8s 的需求。

5.2 初始化集群

参考docker初始化方法,文档1.2
添加阿里k8s源(所有机器都需要配置)

  1. tee /etc/yum.repos.d/kubernetes.repo <<-'EOF'
  2. [kubernetes]
  3. name=Kubernetes
  4. baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
  5. enabled=1
  6. gpgcheck=0
  7. EOF

修改docker配置文件,将 cgroup 驱动修改成 systemd 驱动

  1. tee /etc/docker/daemon.json << 'EOF'
  2. {
  3. "registry-mirrors":["https://vh3bm52y.mirror.aliyuncs.com","https://registry.docker-cn.com","https://docker.mirrors.ustc.edu.cn","https://dockerhub.azk8s.cn","http://hub-mirror.c.163.com"],"exec-opts":["native.cgroupdriver=systemd"]
  4. }
  5. EOF
  6. systemctl daemon-reload && systemctl restart docker
  7. systemctl status docker

关闭swap分区

  1. swapoff -a
  2. # 永久关闭,注释即可
  3. vim /etc/fstab

image.png
安装k8s组件(所有机器执行)

  1. yum install -y kubelet-1.23.1 kubeadm-1.23.1 kubectl-1.23.1
  2. systemctl enable --now kubelet

5.3 使用nginx+keepalived实现 apiserver 高可用

yum 安装 nginx 和 keepalived(所有 master 节点)

  1. yum install nginx keepalived nginx-mod-stream -y

修改nginx配置文件

  1. vim /etc/nginx/nginx.conf
  1. user nginx;
  2. worker_processes auto;
  3. error_log /var/log/nginx/error.log;
  4. pid /run/nginx.pid;
  5. include /usr/share/nginx/modules/*.conf;
  6. events {
  7. worker_connections 1024;
  8. }
  9. # 四层负载均衡,为两台 Master apiserver 组件提供负载均衡
  10. stream {
  11. log_format main '$remote_addr $upstream_addr - [$time_local] $status
  12. $upstream_bytes_sent';
  13. access_log /var/log/nginx/k8s-access.log main;
  14. upstream k8s-apiserver {
  15. server 192.168.1.63:6443; # Master1 APISERVER IP:PORT
  16. server 192.168.1.64:6443; # Master2 APISERVER IP:PORT
  17. }
  18. server {
  19. listen 16443; # 由于 nginx 与 master 节点复用,这个监听端口不能是 6443,否则会冲突
  20. proxy_pass k8s-apiserver;
  21. }
  22. }
  23. http {
  24. log_format main '$remote_addr - $remote_user [$time_local] "$request" '
  25. '$status $body_bytes_sent "$http_referer" '
  26. '"$http_user_agent" "$http_x_forwarded_for"';
  27. access_log /var/log/nginx/access.log main;
  28. sendfile on;
  29. tcp_nopush on;
  30. tcp_nodelay on;
  31. keepalive_timeout 65;
  32. types_hash_max_size 2048;
  33. include /etc/nginx/mime.types;
  34. default_type application/octet-stream;
  35. server {
  36. listen 80 default_server;
  37. server_name _;
  38. location / {
  39. }
  40. }
  41. }
  1. nginx -t

修改keepalived配置文件(第一段为主 master 节点配置,第二段为备 master 节点配置)

  1. vim /etc/keepalived/keepalived.conf
  1. global_defs {
  2. notification_email {
  3. acassen@firewall.loc
  4. failover@firewall.loc
  5. sysadmin@firewall.loc
  6. }
  7. notification_email_from Alexandre.Cassen@firewall.loc
  8. smtp_server 127.0.0.1
  9. smtp_connect_timeout 30
  10. router_id NGINX_MASTER
  11. }
  12. vrrp_script check_nginx {
  13. script "/etc/keepalived/check_nginx.sh"
  14. }
  15. vrrp_instance VI_1 {
  16. state MASTER
  17. interface ens33 # 修改为实际网卡名
  18. virtual_router_id 51 # VRRP 路由 ID 实例,每个实例是唯一的
  19. priority 100 # 优先级,备服务器设置 90
  20. advert_int 1 # 指定 VRRP 心跳包通告间隔时间,默认 1 秒
  21. authentication {
  22. auth_type PASS
  23. auth_pass 1111
  24. }
  25. # 虚拟 IP
  26. virtual_ipaddress {
  27. 192.168.1.199/24 # VIP
  28. }
  29. track_script {
  30. check_nginx
  31. }
  32. }
  1. global_defs {
  2. notification_email {
  3. acassen@firewall.loc
  4. failover@firewall.loc
  5. sysadmin@firewall.loc
  6. }
  7. notification_email_from Alexandre.Cassen@firewall.loc
  8. smtp_server 127.0.0.1
  9. smtp_connect_timeout 30
  10. router_id NGINX_BACKUP
  11. }
  12. vrrp_script check_nginx {
  13. script "/etc/keepalived/check_nginx.sh"
  14. }
  15. vrrp_instance VI_1 {
  16. state BACKUP
  17. interface ens33
  18. virtual_router_id 51 # VRRP 路由 ID 实例,每个实例是唯一的
  19. priority 90
  20. advert_int 1
  21. authentication {
  22. auth_type PASS
  23. auth_pass 1111
  24. }
  25. virtual_ipaddress {
  26. 192.168.1.199/24
  27. }
  28. track_script {
  29. check_nginx
  30. }
  31. }

配置nginx探活脚本

  1. vim /etc/keepalived/check_nginx.sh
  1. #!/bin/bash
  2. #1、判断 Nginx 是否存活
  3. counter=`ps -C nginx --no-header | wc -l`
  4. if [ $counter -eq 0 ]; then
  5. #2、如果不存活则尝试启动 Nginx
  6. service nginx start
  7. sleep 2
  8. #3、等待 2 秒后再次获取一次 Nginx 状态
  9. counter=`ps -C nginx --no-header | wc -l`
  10. #4、再次进行判断,如 Nginx 还不存活则停止 Keepalived,让地址进行漂移
  11. if [ $counter -eq 0 ]; then
  12. service keepalived stop
  13. fi
  14. fi
  1. chmod +x /etc/keepalived/check_nginx.sh

启动并测试

  1. systemctl enable nginx keepalived --now
  2. ip addr

注: 可以停止nginx服务用来测试ip是否会漂移


5.4 初始化集群

主节点随便一台编写k8s配置文件

  1. vim /opt/kubeadm-config.yaml
  1. apiVersion: kubeadm.k8s.io/v1beta2
  2. kind: ClusterConfiguration
  3. kubernetesVersion: v1.23.1
  4. controlPlaneEndpoint: 192.168.1.199:16443
  5. imageRepository: registry.aliyuncs.com/google_containers
  6. apiServer:
  7. certSANs:
  8. - 192.168.1.63
  9. - 192.168.1.64
  10. - 192.168.1.65
  11. - 192.168.1.66
  12. - 192.168.1.199
  13. networking:
  14. podSubnet: 10.244.0.0/16
  15. serviceSubnet: 10.10.0.0/16
  16. ---
  17. apiVersion: kubeproxy.config.k8s.io/v1alpha1
  18. kind: KubeProxyConfiguration
  19. mode: ipvs

初始化集群

  1. kubeadm init --config /opt/kubeadm-config.yaml --ignore-preflight-errors=SystemVerification
  2. mkdir -p $HOME/.kube
  3. sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  4. sudo chown $(id -u):$(id -g) $HOME/.kube/config

5.5 其他节点加入集群

复制密钥到其他主节点

  1. # 其他主节点执行
  2. cd /root && mkdir -p /etc/kubernetes/pki/etcd &&mkdir -p ~/.kube/
  3. # 初始化主节点执行
  4. scp /etc/kubernetes/pki/ca.crt k8s-master2:/etc/kubernetes/pki/
  5. scp /etc/kubernetes/pki/ca.key k8s-master2:/etc/kubernetes/pki/
  6. scp /etc/kubernetes/pki/sa.key k8s-master2:/etc/kubernetes/pki/
  7. scp /etc/kubernetes/pki/sa.pub k8s-master2:/etc/kubernetes/pki/
  8. scp /etc/kubernetes/pki/front-proxy-ca.crt k8s-master2:/etc/kubernetes/pki/
  9. scp /etc/kubernetes/pki/front-proxy-ca.key k8s-master2:/etc/kubernetes/pki/
  10. scp /etc/kubernetes/pki/etcd/ca.crt k8s-master2:/etc/kubernetes/pki/etcd/
  11. scp /etc/kubernetes/pki/etcd/ca.key k8s-master2:/etc/kubernetes/pki/etcd/

复制主节点加入命令
image.png

  1. kubeadm join 192.168.1.199:16443 --token kxpyo6.ac4xjsaxarnrs1bz \
  2. --discovery-token-ca-cert-hash sha256:9d662cd9e9b56cdf874293e58d3c698fb4a5f8acbe872db591b6471820231f2b \
  3. --control-plane
  1. mkdir -p $HOME/.kube
  2. sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  3. sudo chown $(id -u):$(id -g) $HOME/.kube/config

子节点加入命令
image.png

  1. kubeadm join 192.168.1.199:16443 --token kxpyo6.ac4xjsaxarnrs1bz \
  2. --discovery-token-ca-cert-hash sha256:9d662cd9e9b56cdf874293e58d3c698fb4a5f8acbe872db591b6471820231f2b

测试节点是否加入

  1. kubectl get nodes

5.6 使用网络组件calico

  1. vim calico.yaml
  1. ---
  2. # Source: calico/templates/calico-config.yaml
  3. # This ConfigMap is used to configure a self-hosted Calico installation.
  4. kind: ConfigMap
  5. apiVersion: v1
  6. metadata:
  7. name: calico-config
  8. namespace: kube-system
  9. data:
  10. # Typha is disabled.
  11. typha_service_name: "none"
  12. # Configure the backend to use.
  13. calico_backend: "bird"
  14. # Configure the MTU to use for workload interfaces and tunnels.
  15. # By default, MTU is auto-detected, and explicitly setting this field should not be required.
  16. # You can override auto-detection by providing a non-zero value.
  17. veth_mtu: "0"
  18. # The CNI network configuration to install on each node. The special
  19. # values in this config will be automatically populated.
  20. cni_network_config: |-
  21. {
  22. "name": "k8s-pod-network",
  23. "cniVersion": "0.3.1",
  24. "plugins": [
  25. {
  26. "type": "calico",
  27. "log_level": "info",
  28. "log_file_path": "/var/log/calico/cni/cni.log",
  29. "datastore_type": "kubernetes",
  30. "nodename": "__KUBERNETES_NODE_NAME__",
  31. "mtu": __CNI_MTU__,
  32. "ipam": {
  33. "type": "calico-ipam"
  34. },
  35. "policy": {
  36. "type": "k8s"
  37. },
  38. "kubernetes": {
  39. "kubeconfig": "__KUBECONFIG_FILEPATH__"
  40. }
  41. },
  42. {
  43. "type": "portmap",
  44. "snat": true,
  45. "capabilities": {"portMappings": true}
  46. },
  47. {
  48. "type": "bandwidth",
  49. "capabilities": {"bandwidth": true}
  50. }
  51. ]
  52. }
  53. ---
  54. # Source: calico/templates/kdd-crds.yaml
  55. apiVersion: apiextensions.k8s.io/v1
  56. kind: CustomResourceDefinition
  57. metadata:
  58. name: bgpconfigurations.crd.projectcalico.org
  59. spec:
  60. group: crd.projectcalico.org
  61. names:
  62. kind: BGPConfiguration
  63. listKind: BGPConfigurationList
  64. plural: bgpconfigurations
  65. singular: bgpconfiguration
  66. scope: Cluster
  67. versions:
  68. - name: v1
  69. schema:
  70. openAPIV3Schema:
  71. description: BGPConfiguration contains the configuration for any BGP routing.
  72. properties:
  73. apiVersion:
  74. description: 'APIVersion defines the versioned schema of this representation
  75. of an object. Servers should convert recognized schemas to the latest
  76. internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
  77. type: string
  78. kind:
  79. description: 'Kind is a string value representing the REST resource this
  80. object represents. Servers may infer this from the endpoint the client
  81. submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
  82. type: string
  83. metadata:
  84. type: object
  85. spec:
  86. description: BGPConfigurationSpec contains the values of the BGP configuration.
  87. properties:
  88. asNumber:
  89. description: 'ASNumber is the default AS number used by a node. [Default:
  90. 64512]'
  91. format: int32
  92. type: integer
  93. communities:
  94. description: Communities is a list of BGP community values and their
  95. arbitrary names for tagging routes.
  96. items:
  97. description: Community contains standard or large community value
  98. and its name.
  99. properties:
  100. name:
  101. description: Name given to community value.
  102. type: string
  103. value:
  104. description: Value must be of format `aa:nn` or `aa:nn:mm`.
  105. For standard community use `aa:nn` format, where `aa` and
  106. `nn` are 16 bit number. For large community use `aa:nn:mm`
  107. format, where `aa`, `nn` and `mm` are 32 bit number. Where,
  108. `aa` is an AS Number, `nn` and `mm` are per-AS identifier.
  109. pattern: ^(\d+):(\d+)$|^(\d+):(\d+):(\d+)$
  110. type: string
  111. type: object
  112. type: array
  113. listenPort:
  114. description: ListenPort is the port where BGP protocol should listen.
  115. Defaults to 179
  116. maximum: 65535
  117. minimum: 1
  118. type: integer
  119. logSeverityScreen:
  120. description: 'LogSeverityScreen is the log severity above which logs
  121. are sent to the stdout. [Default: INFO]'
  122. type: string
  123. nodeToNodeMeshEnabled:
  124. description: 'NodeToNodeMeshEnabled sets whether full node to node
  125. BGP mesh is enabled. [Default: true]'
  126. type: boolean
  127. prefixAdvertisements:
  128. description: PrefixAdvertisements contains per-prefix advertisement
  129. configuration.
  130. items:
  131. description: PrefixAdvertisement configures advertisement properties
  132. for the specified CIDR.
  133. properties:
  134. cidr:
  135. description: CIDR for which properties should be advertised.
  136. type: string
  137. communities:
  138. description: Communities can be list of either community names
  139. already defined in `Specs.Communities` or community value
  140. of format `aa:nn` or `aa:nn:mm`. For standard community use
  141. `aa:nn` format, where `aa` and `nn` are 16 bit number. For
  142. large community use `aa:nn:mm` format, where `aa`, `nn` and
  143. `mm` are 32 bit number. Where,`aa` is an AS Number, `nn` and
  144. `mm` are per-AS identifier.
  145. items:
  146. type: string
  147. type: array
  148. type: object
  149. type: array
  150. serviceClusterIPs:
  151. description: ServiceClusterIPs are the CIDR blocks from which service
  152. cluster IPs are allocated. If specified, Calico will advertise these
  153. blocks, as well as any cluster IPs within them.
  154. items:
  155. description: ServiceClusterIPBlock represents a single allowed ClusterIP
  156. CIDR block.
  157. properties:
  158. cidr:
  159. type: string
  160. type: object
  161. type: array
  162. serviceExternalIPs:
  163. description: ServiceExternalIPs are the CIDR blocks for Kubernetes
  164. Service External IPs. Kubernetes Service ExternalIPs will only be
  165. advertised if they are within one of these blocks.
  166. items:
  167. description: ServiceExternalIPBlock represents a single allowed
  168. External IP CIDR block.
  169. properties:
  170. cidr:
  171. type: string
  172. type: object
  173. type: array
  174. serviceLoadBalancerIPs:
  175. description: ServiceLoadBalancerIPs are the CIDR blocks for Kubernetes
  176. Service LoadBalancer IPs. Kubernetes Service status.LoadBalancer.Ingress
  177. IPs will only be advertised if they are within one of these blocks.
  178. items:
  179. description: ServiceLoadBalancerIPBlock represents a single allowed
  180. LoadBalancer IP CIDR block.
  181. properties:
  182. cidr:
  183. type: string
  184. type: object
  185. type: array
  186. type: object
  187. type: object
  188. served: true
  189. storage: true
  190. status:
  191. acceptedNames:
  192. kind: ""
  193. plural: ""
  194. conditions: []
  195. storedVersions: []
  196. ---
  197. apiVersion: apiextensions.k8s.io/v1
  198. kind: CustomResourceDefinition
  199. metadata:
  200. name: bgppeers.crd.projectcalico.org
  201. spec:
  202. group: crd.projectcalico.org
  203. names:
  204. kind: BGPPeer
  205. listKind: BGPPeerList
  206. plural: bgppeers
  207. singular: bgppeer
  208. scope: Cluster
  209. versions:
  210. - name: v1
  211. schema:
  212. openAPIV3Schema:
  213. properties:
  214. apiVersion:
  215. description: 'APIVersion defines the versioned schema of this representation
  216. of an object. Servers should convert recognized schemas to the latest
  217. internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
  218. type: string
  219. kind:
  220. description: 'Kind is a string value representing the REST resource this
  221. object represents. Servers may infer this from the endpoint the client
  222. submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
  223. type: string
  224. metadata:
  225. type: object
  226. spec:
  227. description: BGPPeerSpec contains the specification for a BGPPeer resource.
  228. properties:
  229. asNumber:
  230. description: The AS Number of the peer.
  231. format: int32
  232. type: integer
  233. keepOriginalNextHop:
  234. description: Option to keep the original nexthop field when routes
  235. are sent to a BGP Peer. Setting "true" configures the selected BGP
  236. Peers node to use the "next hop keep;" instead of "next hop self;"(default)
  237. in the specific branch of the Node on "bird.cfg".
  238. type: boolean
  239. node:
  240. description: The node name identifying the Calico node instance that
  241. is targeted by this peer. If this is not set, and no nodeSelector
  242. is specified, then this BGP peer selects all nodes in the cluster.
  243. type: string
  244. nodeSelector:
  245. description: Selector for the nodes that should have this peering. When
  246. this is set, the Node field must be empty.
  247. type: string
  248. password:
  249. description: Optional BGP password for the peerings generated by this
  250. BGPPeer resource.
  251. properties:
  252. secretKeyRef:
  253. description: Selects a key of a secret in the node pod's namespace.
  254. properties:
  255. key:
  256. description: The key of the secret to select from. Must be
  257. a valid secret key.
  258. type: string
  259. name:
  260. description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
  261. TODO: Add other useful fields. apiVersion, kind, uid?'
  262. type: string
  263. optional:
  264. description: Specify whether the Secret or its key must be
  265. defined
  266. type: boolean
  267. required:
  268. - key
  269. type: object
  270. type: object
  271. peerIP:
  272. description: The IP address of the peer followed by an optional port
  273. number to peer with. If port number is given, format should be `[<IPv6>]:port`
  274. or `<IPv4>:<port>` for IPv4. If optional port number is not set,
  275. and this peer IP and ASNumber belongs to a calico/node with ListenPort
  276. set in BGPConfiguration, then we use that port to peer.
  277. type: string
  278. peerSelector:
  279. description: Selector for the remote nodes to peer with. When this
  280. is set, the PeerIP and ASNumber fields must be empty. For each
  281. peering between the local node and selected remote nodes, we configure
  282. an IPv4 peering if both ends have NodeBGPSpec.IPv4Address specified,
  283. and an IPv6 peering if both ends have NodeBGPSpec.IPv6Address specified. The
  284. remote AS number comes from the remote node's NodeBGPSpec.ASNumber,
  285. or the global default if that is not set.
  286. type: string
  287. sourceAddress:
  288. description: Specifies whether and how to configure a source address
  289. for the peerings generated by this BGPPeer resource. Default value
  290. "UseNodeIP" means to configure the node IP as the source address. "None"
  291. means not to configure a source address.
  292. type: string
  293. type: object
  294. type: object
  295. served: true
  296. storage: true
  297. status:
  298. acceptedNames:
  299. kind: ""
  300. plural: ""
  301. conditions: []
  302. storedVersions: []
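
A hedged usage sketch for the BGPPeer schema defined above; every value (the name, peer IP and AS number) is a placeholder and not taken from this manifest. Because neither node nor nodeSelector is set, the CRD text above says this peering would apply to all nodes in the cluster:

# Illustrative example only; not part of the calico.yaml manifest above.
apiVersion: crd.projectcalico.org/v1
kind: BGPPeer
metadata:
  name: rack1-tor                   # hypothetical peer name
spec:
  peerIP: 192.168.1.1               # address of the external BGP router (placeholder)
  asNumber: 64512                   # AS number of that router (placeholder)
  keepOriginalNextHop: false        # keep the default "next hop self" behaviour
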
  303. ---
  304. apiVersion: apiextensions.k8s.io/v1
  305. kind: CustomResourceDefinition
  306. metadata:
  307. name: blockaffinities.crd.projectcalico.org
  308. spec:
  309. group: crd.projectcalico.org
  310. names:
  311. kind: BlockAffinity
  312. listKind: BlockAffinityList
  313. plural: blockaffinities
  314. singular: blockaffinity
  315. scope: Cluster
  316. versions:
  317. - name: v1
  318. schema:
  319. openAPIV3Schema:
  320. properties:
  321. apiVersion:
  322. description: 'APIVersion defines the versioned schema of this representation
  323. of an object. Servers should convert recognized schemas to the latest
  324. internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
  325. type: string
  326. kind:
  327. description: 'Kind is a string value representing the REST resource this
  328. object represents. Servers may infer this from the endpoint the client
  329. submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
  330. type: string
  331. metadata:
  332. type: object
  333. spec:
  334. description: BlockAffinitySpec contains the specification for a BlockAffinity
  335. resource.
  336. properties:
  337. cidr:
  338. type: string
  339. deleted:
  340. description: Deleted indicates that this block affinity is being deleted.
  341. This field is a string for compatibility with older releases that
  342. mistakenly treat this field as a string.
  343. type: string
  344. node:
  345. type: string
  346. state:
  347. type: string
  348. required:
  349. - cidr
  350. - deleted
  351. - node
  352. - state
  353. type: object
  354. type: object
  355. served: true
  356. storage: true
  357. status:
  358. acceptedNames:
  359. kind: ""
  360. plural: ""
  361. conditions: []
  362. storedVersions: []
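
BlockAffinity objects are created and maintained by Calico IPAM itself rather than by users, so there is nothing to configure here; the sketch below only illustrates what such a record can look like (every value, including the name, is a placeholder). Note that deleted is declared as a string in the schema above, so it carries "false"/"true" rather than a boolean:

# Illustrative example only; BlockAffinity is managed by Calico IPAM, not created by users.
apiVersion: crd.projectcalico.org/v1
kind: BlockAffinity
metadata:
  name: worker1-10-244-1-0-26       # hypothetical name, typically derived from node and block CIDR
spec:
  cidr: 10.244.1.0/26               # the IP block assigned to the node (placeholder)
  node: worker1                     # owning node (placeholder)
  state: confirmed                  # affinity state
  deleted: "false"                  # string value, per the compatibility note in the CRD above
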
  363. ---
  364. apiVersion: apiextensions.k8s.io/v1
  365. kind: CustomResourceDefinition
  366. metadata:
  367. name: clusterinformations.crd.projectcalico.org
  368. spec:
  369. group: crd.projectcalico.org
  370. names:
  371. kind: ClusterInformation
  372. listKind: ClusterInformationList
  373. plural: clusterinformations
  374. singular: clusterinformation
  375. scope: Cluster
  376. versions:
  377. - name: v1
  378. schema:
  379. openAPIV3Schema:
  380. description: ClusterInformation contains the cluster specific information.
  381. properties:
  382. apiVersion:
  383. description: 'APIVersion defines the versioned schema of this representation
  384. of an object. Servers should convert recognized schemas to the latest
  385. internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
  386. type: string
  387. kind:
  388. description: 'Kind is a string value representing the REST resource this
  389. object represents. Servers may infer this from the endpoint the client
  390. submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
  391. type: string
  392. metadata:
  393. type: object
  394. spec:
  395. description: ClusterInformationSpec contains the values of describing
  396. the cluster.
  397. properties:
  398. calicoVersion:
  399. description: CalicoVersion is the version of Calico that the cluster
  400. is running
  401. type: string
  402. clusterGUID:
  403. description: ClusterGUID is the GUID of the cluster
  404. type: string
  405. clusterType:
  406. description: ClusterType describes the type of the cluster
  407. type: string
  408. datastoreReady:
  409. description: DatastoreReady is used during significant datastore migrations
  410. to signal to components such as Felix that it should wait before
  411. accessing the datastore.
  412. type: boolean
  413. variant:
  414. description: Variant declares which variant of Calico should be active.
  415. type: string
  416. type: object
  417. type: object
  418. served: true
  419. storage: true
  420. status:
  421. acceptedNames:
  422. kind: ""
  423. plural: ""
  424. conditions: []
  425. storedVersions: []
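
Like BlockAffinity, the ClusterInformation resource is written by calico/node itself (a single object conventionally named "default"); the sketch below is only to make the fields above easier to picture, and all values are placeholders:

# Illustrative example only; this object is created and updated by calico/node.
apiVersion: crd.projectcalico.org/v1
kind: ClusterInformation
metadata:
  name: default
spec:
  calicoVersion: v3.18.0            # placeholder version string
  clusterGUID: 1234567890abcdef     # placeholder GUID
  clusterType: k8s,bgp,kubeadm      # placeholder cluster type string
  datastoreReady: true              # datastore is safe to access
  variant: Calico                   # which Calico variant is active
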
  426. ---
  427. apiVersion: apiextensions.k8s.io/v1
  428. kind: CustomResourceDefinition
  429. metadata:
  430. name: felixconfigurations.crd.projectcalico.org
  431. spec:
  432. group: crd.projectcalico.org
  433. names:
  434. kind: FelixConfiguration
  435. listKind: FelixConfigurationList
  436. plural: felixconfigurations
  437. singular: felixconfiguration
  438. scope: Cluster
  439. versions:
  440. - name: v1
  441. schema:
  442. openAPIV3Schema:
  443. description: Felix Configuration contains the configuration for Felix.
  444. properties:
  445. apiVersion:
  446. description: 'APIVersion defines the versioned schema of this representation
  447. of an object. Servers should convert recognized schemas to the latest
  448. internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
  449. type: string
  450. kind:
  451. description: 'Kind is a string value representing the REST resource this
  452. object represents. Servers may infer this from the endpoint the client
  453. submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
  454. type: string
  455. metadata:
  456. type: object
  457. spec:
  458. description: FelixConfigurationSpec contains the values of the Felix configuration.
  459. properties:
  460. allowIPIPPacketsFromWorkloads:
  461. description: 'AllowIPIPPacketsFromWorkloads controls whether Felix
  462. will add a rule to drop IPIP encapsulated traffic from workloads
  463. [Default: false]'
  464. type: boolean
  465. allowVXLANPacketsFromWorkloads:
  466. description: 'AllowVXLANPacketsFromWorkloads controls whether Felix
  467. will add a rule to drop VXLAN encapsulated traffic from workloads
  468. [Default: false]'
  469. type: boolean
  470. awsSrcDstCheck:
  471. description: 'Set source-destination-check on AWS EC2 instances. Accepted
  472. value must be one of "DoNothing", "Enable" or "Disable". [Default:
  473. DoNothing]'
  474. enum:
  475. - DoNothing
  476. - Enable
  477. - Disable
  478. type: string
  479. bpfConnectTimeLoadBalancingEnabled:
  480. description: 'BPFConnectTimeLoadBalancingEnabled when in BPF mode,
  481. controls whether Felix installs the connection-time load balancer. The
  482. connect-time load balancer is required for the host to be able to
  483. reach Kubernetes services and it improves the performance of pod-to-service
  484. connections. The only reason to disable it is for debugging purposes. [Default:
  485. true]'
  486. type: boolean
  487. bpfDataIfacePattern:
  488. description: BPFDataIfacePattern is a regular expression that controls
  489. which interfaces Felix should attach BPF programs to in order to
  490. catch traffic to/from the network. This needs to match the interfaces
  491. that Calico workload traffic flows over as well as any interfaces
  492. that handle incoming traffic to nodeports and services from outside
  493. the cluster. It should not match the workload interfaces (usually
  494. named cali...).
  495. type: string
  496. bpfDisableUnprivileged:
  497. description: 'BPFDisableUnprivileged, if enabled, Felix sets the kernel.unprivileged_bpf_disabled
  498. sysctl to disable unprivileged use of BPF. This ensures that unprivileged
  499. users cannot access Calico''s BPF maps and cannot insert their own
  500. BPF programs to interfere with Calico''s. [Default: true]'
  501. type: boolean
  502. bpfEnabled:
  503. description: 'BPFEnabled, if enabled Felix will use the BPF dataplane.
  504. [Default: false]'
  505. type: boolean
  506. bpfExternalServiceMode:
  507. description: 'BPFExternalServiceMode in BPF mode, controls how connections
  508. from outside the cluster to services (node ports and cluster IPs)
  509. are forwarded to remote workloads. If set to "Tunnel" then both
  510. request and response traffic is tunneled to the remote node. If
  511. set to "DSR", the request traffic is tunneled but the response traffic
  512. is sent directly from the remote node. In "DSR" mode, the remote
  513. node appears to use the IP of the ingress node; this requires a
  514. permissive L2 network. [Default: Tunnel]'
  515. type: string
  516. bpfKubeProxyEndpointSlicesEnabled:
  517. description: BPFKubeProxyEndpointSlicesEnabled in BPF mode, controls
  518. whether Felix's embedded kube-proxy accepts EndpointSlices or not.
  519. type: boolean
  520. bpfKubeProxyIptablesCleanupEnabled:
  521. description: 'BPFKubeProxyIptablesCleanupEnabled, if enabled in BPF
  522. mode, Felix will proactively clean up the upstream Kubernetes kube-proxy''s
  523. iptables chains. Should only be enabled if kube-proxy is not running. [Default:
  524. true]'
  525. type: boolean
  526. bpfKubeProxyMinSyncPeriod:
  527. description: 'BPFKubeProxyMinSyncPeriod, in BPF mode, controls the
  528. minimum time between updates to the dataplane for Felix''s embedded
  529. kube-proxy. Lower values give reduced set-up latency. Higher values
  530. reduce Felix CPU usage by batching up more work. [Default: 1s]'
  531. type: string
  532. bpfLogLevel:
  533. description: 'BPFLogLevel controls the log level of the BPF programs
  534. when in BPF dataplane mode. One of "Off", "Info", or "Debug". The
  535. logs are emitted to the BPF trace pipe, accessible with the command
  536. `tc exec bpf debug`. [Default: Off].'
  537. type: string
  538. chainInsertMode:
  539. description: 'ChainInsertMode controls whether Felix hooks the kernel''s
  540. top-level iptables chains by inserting a rule at the top of the
  541. chain or by appending a rule at the bottom. insert is the safe default
  542. since it prevents Calico''s rules from being bypassed. If you switch
  543. to append mode, be sure that the other rules in the chains signal
  544. acceptance by falling through to the Calico rules, otherwise the
  545. Calico policy will be bypassed. [Default: insert]'
  546. type: string
  547. dataplaneDriver:
  548. type: string
  549. debugDisableLogDropping:
  550. type: boolean
  551. debugMemoryProfilePath:
  552. type: string
  553. debugSimulateCalcGraphHangAfter:
  554. type: string
  555. debugSimulateDataplaneHangAfter:
  556. type: string
  557. defaultEndpointToHostAction:
  558. description: 'DefaultEndpointToHostAction controls what happens to
  559. traffic that goes from a workload endpoint to the host itself (after
  560. the traffic hits the endpoint egress policy). By default Calico
  561. blocks traffic from workload endpoints to the host itself with an
  562. iptables "DROP" action. If you want to allow some or all traffic
  563. from endpoint to host, set this parameter to RETURN or ACCEPT. Use
  564. RETURN if you have your own rules in the iptables "INPUT" chain;
  565. Calico will insert its rules at the top of that chain, then "RETURN"
  566. packets to the "INPUT" chain once it has completed processing workload
  567. endpoint egress policy. Use ACCEPT to unconditionally accept packets
  568. from workloads after processing workload endpoint egress policy.
  569. [Default: Drop]'
  570. type: string
  571. deviceRouteProtocol:
  572. description: This defines the route protocol added to programmed device
  573. routes, by default this will be RTPROT_BOOT when left blank.
  574. type: integer
  575. deviceRouteSourceAddress:
  576. description: This is the source address to use on programmed device
  577. routes. By default the source address is left blank, leaving the
  578. kernel to choose the source address used.
  579. type: string
  580. disableConntrackInvalidCheck:
  581. type: boolean
  582. endpointReportingDelay:
  583. type: string
  584. endpointReportingEnabled:
  585. type: boolean
  586. externalNodesList:
  587. description: ExternalNodesCIDRList is a list of CIDR's of external-non-calico-nodes
  588. which may source tunnel traffic and have the tunneled traffic be
  589. accepted at calico nodes.
  590. items:
  591. type: string
  592. type: array
  593. failsafeInboundHostPorts:
  594. description: 'FailsafeInboundHostPorts is a comma-delimited list of
  595. UDP/TCP ports that Felix will allow incoming traffic to host endpoints
  596. on irrespective of the security policy. This is useful to avoid
  597. accidentally cutting off a host with incorrect configuration. Each
  598. port should be specified as tcp:<port-number> or udp:<port-number>.
  599. For back-compatibility, if the protocol is not specified, it defaults
  600. to "tcp". To disable all inbound host ports, use the value none.
  601. The default value allows ssh access and DHCP. [Default: tcp:22,
  602. udp:68, tcp:179, tcp:2379, tcp:2380, tcp:6443, tcp:6666, tcp:6667]'
  603. items:
  604. description: ProtoPort is combination of protocol and port, both
  605. must be specified.
  606. properties:
  607. port:
  608. type: integer
  609. protocol:
  610. type: string
  611. required:
  612. - port
  613. - protocol
  614. type: object
  615. type: array
  616. failsafeOutboundHostPorts:
  617. description: 'FailsafeOutboundHostPorts is a comma-delimited list
  618. of UDP/TCP ports that Felix will allow outgoing traffic from host
  619. endpoints to irrespective of the security policy. This is useful
  620. to avoid accidentally cutting off a host with incorrect configuration.
  621. Each port should be specified as tcp:<port-number> or udp:<port-number>.
  622. For back-compatibility, if the protocol is not specified, it defaults
  623. to "tcp". To disable all outbound host ports, use the value none.
  624. The default value opens etcd''s standard ports to ensure that Felix
  625. does not get cut off from etcd as well as allowing DHCP and DNS.
  626. [Default: tcp:179, tcp:2379, tcp:2380, tcp:6443, tcp:6666, tcp:6667,
  627. udp:53, udp:67]'
  628. items:
  629. description: ProtoPort is combination of protocol and port, both
  630. must be specified.
  631. properties:
  632. port:
  633. type: integer
  634. protocol:
  635. type: string
  636. required:
  637. - port
  638. - protocol
  639. type: object
  640. type: array
  641. featureDetectOverride:
  642. description: FeatureDetectOverride is used to override the feature
  643. detection. Values are specified in a comma separated list with no
  644. spaces, example; "SNATFullyRandom=true,MASQFullyRandom=false,RestoreSupportsLock=".
  645. "true" or "false" will force the feature, empty or omitted values
  646. are auto-detected.
  647. type: string
  648. genericXDPEnabled:
  649. description: 'GenericXDPEnabled enables Generic XDP so network cards
  650. that don''t support XDP offload or driver modes can use XDP. This
  651. is not recommended since it doesn''t provide better performance
  652. than iptables. [Default: false]'
  653. type: boolean
  654. healthEnabled:
  655. type: boolean
  656. healthHost:
  657. type: string
  658. healthPort:
  659. type: integer
  660. interfaceExclude:
  661. description: 'InterfaceExclude is a comma-separated list of interfaces
  662. that Felix should exclude when monitoring for host endpoints. The
  663. default value ensures that Felix ignores Kubernetes'' IPVS dummy
  664. interface, which is used internally by kube-proxy. If you want to
  665. exclude multiple interface names using a single value, the list
  666. supports regular expressions. For regular expressions you must wrap
  667. the value with ''/''. For example having values ''/^kube/,veth1''
  668. will exclude all interfaces that begin with ''kube'' and also the
  669. interface ''veth1''. [Default: kube-ipvs0]'
  670. type: string
  671. interfacePrefix:
  672. description: 'InterfacePrefix is the interface name prefix that identifies
  673. workload endpoints and so distinguishes them from host endpoint
  674. interfaces. Note: in environments other than bare metal, the orchestrators
  675. configure this appropriately. For example our Kubernetes and Docker
  676. integrations set the ''cali'' value, and our OpenStack integration
  677. sets the ''tap'' value. [Default: cali]'
  678. type: string
  679. interfaceRefreshInterval:
  680. description: InterfaceRefreshInterval is the period at which Felix
  681. rescans local interfaces to verify their state. The rescan can be
  682. disabled by setting the interval to 0.
  683. type: string
  684. ipipEnabled:
  685. type: boolean
  686. ipipMTU:
  687. description: 'IPIPMTU is the MTU to set on the tunnel device. See
  688. Configuring MTU [Default: 1440]'
  689. type: integer
  690. ipsetsRefreshInterval:
  691. description: 'IpsetsRefreshInterval is the period at which Felix re-checks
  692. all iptables state to ensure that no other process has accidentally
  693. broken Calico''s rules. Set to 0 to disable iptables refresh. [Default:
  694. 90s]'
  695. type: string
  696. iptablesBackend:
  697. description: IptablesBackend specifies which backend of iptables will
  698. be used. The default is legacy.
  699. type: string
  700. iptablesFilterAllowAction:
  701. type: string
  702. iptablesLockFilePath:
  703. description: 'IptablesLockFilePath is the location of the iptables
  704. lock file. You may need to change this if the lock file is not in
  705. its standard location (for example if you have mapped it into Felix''s
  706. container at a different path). [Default: /run/xtables.lock]'
  707. type: string
  708. iptablesLockProbeInterval:
  709. description: 'IptablesLockProbeInterval is the time that Felix will
  710. wait between attempts to acquire the iptables lock if it is not
  711. available. Lower values make Felix more responsive when the lock
  712. is contended, but use more CPU. [Default: 50ms]'
  713. type: string
  714. iptablesLockTimeout:
  715. description: 'IptablesLockTimeout is the time that Felix will wait
  716. for the iptables lock, or 0, to disable. To use this feature, Felix
  717. must share the iptables lock file with all other processes that
  718. also take the lock. When running Felix inside a container, this
  719. requires the /run directory of the host to be mounted into the calico/node
  720. or calico/felix container. [Default: 0s disabled]'
  721. type: string
  722. iptablesMangleAllowAction:
  723. type: string
  724. iptablesMarkMask:
  725. description: 'IptablesMarkMask is the mask that Felix selects its
  726. IPTables Mark bits from. Should be a 32 bit hexadecimal number with
  727. at least 8 bits set, none of which clash with any other mark bits
  728. in use on the system. [Default: 0xff000000]'
  729. format: int32
  730. type: integer
  731. iptablesNATOutgoingInterfaceFilter:
  732. type: string
  733. iptablesPostWriteCheckInterval:
  734. description: 'IptablesPostWriteCheckInterval is the period after Felix
  735. has done a write to the dataplane that it schedules an extra read
  736. back in order to check the write was not clobbered by another process.
  737. This should only occur if another application on the system doesn''t
  738. respect the iptables lock. [Default: 1s]'
  739. type: string
  740. iptablesRefreshInterval:
  741. description: 'IptablesRefreshInterval is the period at which Felix
  742. re-checks the IP sets in the dataplane to ensure that no other process
  743. has accidentally broken Calico''s rules. Set to 0 to disable IP
  744. sets refresh. Note: the default for this value is lower than the
  745. other refresh intervals as a workaround for a Linux kernel bug that
  746. was fixed in kernel version 4.11. If you are using v4.11 or greater
  747. you may want to set this to a higher value to reduce Felix CPU
  748. usage. [Default: 10s]'
  749. type: string
  750. ipv6Support:
  751. type: boolean
  752. kubeNodePortRanges:
  753. description: 'KubeNodePortRanges holds list of port ranges used for
  754. service node ports. Only used if felix detects kube-proxy running
  755. in ipvs mode. Felix uses these ranges to separate host and workload
  756. traffic. [Default: 30000:32767].'
  757. items:
  758. anyOf:
  759. - type: integer
  760. - type: string
  761. pattern: ^.*
  762. x-kubernetes-int-or-string: true
  763. type: array
  764. logFilePath:
  765. description: 'LogFilePath is the full path to the Felix log. Set to
  766. none to disable file logging. [Default: /var/log/calico/felix.log]'
  767. type: string
  768. logPrefix:
  769. description: 'LogPrefix is the log prefix that Felix uses when rendering
  770. LOG rules. [Default: calico-packet]'
  771. type: string
  772. logSeverityFile:
  773. description: 'LogSeverityFile is the log severity above which logs
  774. are sent to the log file. [Default: Info]'
  775. type: string
  776. logSeverityScreen:
  777. description: 'LogSeverityScreen is the log severity above which logs
  778. are sent to the stdout. [Default: Info]'
  779. type: string
  780. logSeveritySys:
  781. description: 'LogSeveritySys is the log severity above which logs
  782. are sent to the syslog. Set to None for no logging to syslog. [Default:
  783. Info]'
  784. type: string
  785. maxIpsetSize:
  786. type: integer
  787. metadataAddr:
  788. description: 'MetadataAddr is the IP address or domain name of the
  789. server that can answer VM queries for cloud-init metadata. In OpenStack,
  790. this corresponds to the machine running nova-api (or in Ubuntu,
  791. nova-api-metadata). A value of none (case insensitive) means that
  792. Felix should not set up any NAT rule for the metadata path. [Default:
  793. 127.0.0.1]'
  794. type: string
  795. metadataPort:
  796. description: 'MetadataPort is the port of the metadata server. This,
  797. combined with global.MetadataAddr (if not ''None''), is used to
  798. set up a NAT rule, from 169.254.169.254:80 to MetadataAddr:MetadataPort.
  799. In most cases this should not need to be changed [Default: 8775].'
  800. type: integer
  801. mtuIfacePattern:
  802. description: MTUIfacePattern is a regular expression that controls
  803. which interfaces Felix should scan in order to calculate the host's
  804. MTU. This should not match workload interfaces (usually named cali...).
  805. type: string
  806. natOutgoingAddress:
  807. description: NATOutgoingAddress specifies an address to use when performing
  808. source NAT for traffic in a natOutgoing pool that is leaving the
  809. network. By default the address used is an address on the interface
  810. the traffic is leaving on (ie it uses the iptables MASQUERADE target)
  811. type: string
  812. natPortRange:
  813. anyOf:
  814. - type: integer
  815. - type: string
  816. description: NATPortRange specifies the range of ports that is used
  817. for port mapping when doing outgoing NAT. When unset the default
  818. behavior of the network stack is used.
  819. pattern: ^.*
  820. x-kubernetes-int-or-string: true
  821. netlinkTimeout:
  822. type: string
  823. openstackRegion:
  824. description: 'OpenstackRegion is the name of the region that a particular
  825. Felix belongs to. In a multi-region Calico/OpenStack deployment,
  826. this must be configured somehow for each Felix (here in the datamodel,
  827. or in felix.cfg or the environment on each compute node), and must
  828. match the [calico] openstack_region value configured in neutron.conf
  829. on each node. [Default: Empty]'
  830. type: string
  831. policySyncPathPrefix:
  832. description: 'PolicySyncPathPrefix is used by Felix to communicate
  833. policy changes to external services, like Application layer policy.
  834. [Default: Empty]'
  835. type: string
  836. prometheusGoMetricsEnabled:
  837. description: 'PrometheusGoMetricsEnabled disables Go runtime metrics
  838. collection, which the Prometheus client does by default, when set
  839. to false. This reduces the number of metrics reported, reducing
  840. Prometheus load. [Default: true]'
  841. type: boolean
  842. prometheusMetricsEnabled:
  843. description: 'PrometheusMetricsEnabled enables the Prometheus metrics
  844. server in Felix if set to true. [Default: false]'
  845. type: boolean
  846. prometheusMetricsHost:
  847. description: 'PrometheusMetricsHost is the host that the Prometheus
  848. metrics server should bind to. [Default: empty]'
  849. type: string
  850. prometheusMetricsPort:
  851. description: 'PrometheusMetricsPort is the TCP port that the Prometheus
  852. metrics server should bind to. [Default: 9091]'
  853. type: integer
  854. prometheusProcessMetricsEnabled:
  855. description: 'PrometheusProcessMetricsEnabled disables process metrics
  856. collection, which the Prometheus client does by default, when set
  857. to false. This reduces the number of metrics reported, reducing
  858. Prometheus load. [Default: true]'
  859. type: boolean
  860. removeExternalRoutes:
  861. description: Whether or not to remove device routes that have not
  862. been programmed by Felix. Disabling this will allow external applications
  863. to also add device routes. This is enabled by default which means
  864. we will remove externally added routes.
  865. type: boolean
  866. reportingInterval:
  867. description: 'ReportingInterval is the interval at which Felix reports
  868. its status into the datastore or 0 to disable. Must be non-zero
  869. in OpenStack deployments. [Default: 30s]'
  870. type: string
  871. reportingTTL:
  872. description: 'ReportingTTL is the time-to-live setting for process-wide
  873. status reports. [Default: 90s]'
  874. type: string
  875. routeRefreshInterval:
  876. description: 'RouteRefreshInterval is the period at which Felix re-checks
  877. the routes in the dataplane to ensure that no other process has
  878. accidentally broken Calico''s rules. Set to 0 to disable route refresh.
  879. [Default: 90s]'
  880. type: string
  881. routeSource:
  882. description: 'RouteSource configures where Felix gets its routing
  883. information. - WorkloadIPs: use workload endpoints to construct
  884. routes. - CalicoIPAM: the default - use IPAM data to construct routes.'
  885. type: string
  886. routeTableRange:
  887. description: Calico programs additional Linux route tables for various
  888. purposes. RouteTableRange specifies the indices of the route tables
  889. that Calico should use.
  890. properties:
  891. max:
  892. type: integer
  893. min:
  894. type: integer
  895. required:
  896. - max
  897. - min
  898. type: object
  899. serviceLoopPrevention:
  900. description: 'When service IP advertisement is enabled, prevent routing
  901. loops to service IPs that are not in use, by dropping or rejecting
  902. packets that do not get DNAT''d by kube-proxy. Unless set to "Disabled",
  903. in which case such routing loops continue to be allowed. [Default:
  904. Drop]'
  905. type: string
  906. sidecarAccelerationEnabled:
  907. description: 'SidecarAccelerationEnabled enables experimental sidecar
  908. acceleration [Default: false]'
  909. type: boolean
  910. usageReportingEnabled:
  911. description: 'UsageReportingEnabled reports anonymous Calico version
  912. number and cluster size to projectcalico.org. Logs warnings returned
  913. by the usage server. For example, if a significant security vulnerability
  914. has been discovered in the version of Calico being used. [Default:
  915. true]'
  916. type: boolean
  917. usageReportingInitialDelay:
  918. description: 'UsageReportingInitialDelay controls the minimum delay
  919. before Felix makes a report. [Default: 300s]'
  920. type: string
  921. usageReportingInterval:
  922. description: 'UsageReportingInterval controls the interval at which
  923. Felix makes reports. [Default: 86400s]'
  924. type: string
  925. useInternalDataplaneDriver:
  926. type: boolean
  927. vxlanEnabled:
  928. type: boolean
  929. vxlanMTU:
  930. description: 'VXLANMTU is the MTU to set on the tunnel device. See
  931. Configuring MTU [Default: 1440]'
  932. type: integer
  933. vxlanPort:
  934. type: integer
  935. vxlanVNI:
  936. type: integer
  937. wireguardEnabled:
  938. description: 'WireguardEnabled controls whether Wireguard is enabled.
  939. [Default: false]'
  940. type: boolean
  941. wireguardInterfaceName:
  942. description: 'WireguardInterfaceName specifies the name to use for
  943. the Wireguard interface. [Default: wg.calico]'
  944. type: string
  945. wireguardListeningPort:
  946. description: 'WireguardListeningPort controls the listening port used
  947. by Wireguard. [Default: 51820]'
  948. type: integer
  949. wireguardMTU:
  950. description: 'WireguardMTU controls the MTU on the Wireguard interface.
  951. See Configuring MTU [Default: 1420]'
  952. type: integer
  953. wireguardRoutingRulePriority:
  954. description: 'WireguardRoutingRulePriority controls the priority value
  955. to use for the Wireguard routing rule. [Default: 99]'
  956. type: integer
  957. xdpEnabled:
  958. description: 'XDPEnabled enables XDP acceleration for suitable untracked
  959. incoming deny rules. [Default: true]'
  960. type: boolean
  961. xdpRefreshInterval:
  962. description: 'XDPRefreshInterval is the period at which Felix re-checks
  963. all XDP state to ensure that no other process has accidentally broken
  964. Calico''s BPF maps or attached programs. Set to 0 to disable XDP
  965. refresh. [Default: 90s]'
  966. type: string
  967. type: object
  968. type: object
  969. served: true
  970. storage: true
  971. status:
  972. acceptedNames:
  973. kind: ""
  974. plural: ""
  975. conditions: []
  976. storedVersions: []
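
The FelixConfiguration schema above documents dozens of tunables together with their defaults. Below is a hedged sketch of a resource that sets a few of them explicitly — the name "default" is the usual cluster-wide object, and every value is illustrative (it simply restates defaults described in the CRD), not a setting taken from this manifest:

# Illustrative example only; not part of the calico.yaml manifest above.
apiVersion: crd.projectcalico.org/v1
kind: FelixConfiguration
metadata:
  name: default                     # cluster-wide Felix settings live in the "default" object
spec:
  logSeverityScreen: Info           # stdout log level [Default: Info]
  prometheusMetricsEnabled: true    # turn on the Felix metrics server
  prometheusMetricsPort: 9091       # metrics port [Default: 9091]
  ipipEnabled: true                 # keep IPIP tunnelling enabled
  wireguardEnabled: false           # Wireguard stays off [Default: false]
  reportingInterval: 30s            # status reporting period [Default: 30s]
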
  977. ---
  978. apiVersion: apiextensions.k8s.io/v1
  979. kind: CustomResourceDefinition
  980. metadata:
  981. name: globalnetworkpolicies.crd.projectcalico.org
  982. spec:
  983. group: crd.projectcalico.org
  984. names:
  985. kind: GlobalNetworkPolicy
  986. listKind: GlobalNetworkPolicyList
  987. plural: globalnetworkpolicies
  988. singular: globalnetworkpolicy
  989. scope: Cluster
  990. versions:
  991. - name: v1
  992. schema:
  993. openAPIV3Schema:
  994. properties:
  995. apiVersion:
  996. description: 'APIVersion defines the versioned schema of this representation
  997. of an object. Servers should convert recognized schemas to the latest
  998. internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
  999. type: string
  1000. kind:
  1001. description: 'Kind is a string value representing the REST resource this
  1002. object represents. Servers may infer this from the endpoint the client
  1003. submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
  1004. type: string
  1005. metadata:
  1006. type: object
  1007. spec:
  1008. properties:
  1009. applyOnForward:
  1010. description: ApplyOnForward indicates to apply the rules in this policy
  1011. on forward traffic.
  1012. type: boolean
  1013. doNotTrack:
  1014. description: DoNotTrack indicates whether packets matched by the rules
  1015. in this policy should go through the data plane's connection tracking,
  1016. such as Linux conntrack. If True, the rules in this policy are
  1017. applied before any data plane connection tracking, and packets allowed
  1018. by this policy are marked as not to be tracked.
  1019. type: boolean
  1020. egress:
  1021. description: The ordered set of egress rules. Each rule contains
  1022. a set of packet match criteria and a corresponding action to apply.
  1023. items:
  1024. description: "A Rule encapsulates a set of match criteria and an
  1025. action. Both selector-based security Policy and security Profiles
  1026. reference rules - separated out as a list of rules for both ingress
  1027. and egress packet matching. \n Each positive match criteria has
  1028. a negated version, prefixed with \"Not\". All the match criteria
  1029. within a rule must be satisfied for a packet to match. A single
  1030. rule can contain the positive and negative version of a match
  1031. and both must be satisfied for the rule to match."
  1032. properties:
  1033. action:
  1034. type: string
  1035. destination:
  1036. description: Destination contains the match criteria that apply
  1037. to destination entity.
  1038. properties:
  1039. namespaceSelector:
  1040. description: "NamespaceSelector is an optional field that
  1041. contains a selector expression. Only traffic that originates
  1042. from (or terminates at) endpoints within the selected
  1043. namespaces will be matched. When both NamespaceSelector
  1044. and Selector are defined on the same rule, then only workload
  1045. endpoints that are matched by both selectors will be selected
  1046. by the rule. \n For NetworkPolicy, an empty NamespaceSelector
  1047. implies that the Selector is limited to selecting only
  1048. workload endpoints in the same namespace as the NetworkPolicy.
  1049. \n For NetworkPolicy, `global()` NamespaceSelector implies
  1050. that the Selector is limited to selecting only GlobalNetworkSet
  1051. or HostEndpoint. \n For GlobalNetworkPolicy, an empty
  1052. NamespaceSelector implies the Selector applies to workload
  1053. endpoints across all namespaces."
  1054. type: string
  1055. nets:
  1056. description: Nets is an optional field that restricts the
  1057. rule to only apply to traffic that originates from (or
  1058. terminates at) IP addresses in any of the given subnets.
  1059. items:
  1060. type: string
  1061. type: array
  1062. notNets:
  1063. description: NotNets is the negated version of the Nets
  1064. field.
  1065. items:
  1066. type: string
  1067. type: array
  1068. notPorts:
  1069. description: NotPorts is the negated version of the Ports
  1070. field. Since only some protocols have ports, if any ports
  1071. are specified it requires the Protocol match in the Rule
  1072. to be set to "TCP" or "UDP".
  1073. items:
  1074. anyOf:
  1075. - type: integer
  1076. - type: string
  1077. pattern: ^.*
  1078. x-kubernetes-int-or-string: true
  1079. type: array
  1080. notSelector:
  1081. description: NotSelector is the negated version of the Selector
  1082. field. See Selector field for subtleties with negated
  1083. selectors.
  1084. type: string
  1085. ports:
  1086. description: "Ports is an optional field that restricts
  1087. the rule to only apply to traffic that has a source (destination)
  1088. port that matches one of these ranges/values. This value
  1089. is a list of integers or strings that represent ranges
  1090. of ports. \n Since only some protocols have ports, if
  1091. any ports are specified it requires the Protocol match
  1092. in the Rule to be set to \"TCP\" or \"UDP\"."
  1093. items:
  1094. anyOf:
  1095. - type: integer
  1096. - type: string
  1097. pattern: ^.*
  1098. x-kubernetes-int-or-string: true
  1099. type: array
  1100. selector:
  1101. description: "Selector is an optional field that contains
  1102. a selector expression (see Policy for sample syntax).
  1103. \ Only traffic that originates from (terminates at) endpoints
  1104. matching the selector will be matched. \n Note that: in
  1105. addition to the negated version of the Selector (see NotSelector
  1106. below), the selector expression syntax itself supports
  1107. negation. The two types of negation are subtly different.
  1108. One negates the set of matched endpoints, the other negates
  1109. the whole match: \n \tSelector = \"!has(my_label)\" matches
  1110. packets that are from other Calico-controlled \tendpoints
  1111. that do not have the label \"my_label\". \n \tNotSelector
  1112. = \"has(my_label)\" matches packets that are not from
  1113. Calico-controlled \tendpoints that do have the label \"my_label\".
  1114. \n The effect is that the latter will accept packets from
  1115. non-Calico sources whereas the former is limited to packets
  1116. from Calico-controlled endpoints."
  1117. type: string
  1118. serviceAccounts:
  1119. description: ServiceAccounts is an optional field that restricts
  1120. the rule to only apply to traffic that originates from
  1121. (or terminates at) a pod running as a matching service
  1122. account.
  1123. properties:
  1124. names:
  1125. description: Names is an optional field that restricts
  1126. the rule to only apply to traffic that originates
  1127. from (or terminates at) a pod running as a service
  1128. account whose name is in the list.
  1129. items:
  1130. type: string
  1131. type: array
  1132. selector:
  1133. description: Selector is an optional field that restricts
  1134. the rule to only apply to traffic that originates
  1135. from (or terminates at) a pod running as a service
  1136. account that matches the given label selector. If
  1137. both Names and Selector are specified then they are
  1138. AND'ed.
  1139. type: string
  1140. type: object
  1141. type: object
  1142. http:
  1143. description: HTTP contains match criteria that apply to HTTP
  1144. requests.
  1145. properties:
  1146. methods:
  1147. description: Methods is an optional field that restricts
  1148. the rule to apply only to HTTP requests that use one of
  1149. the listed HTTP Methods (e.g. GET, PUT, etc.) Multiple
  1150. methods are OR'd together.
  1151. items:
  1152. type: string
  1153. type: array
  1154. paths:
  1155. description: 'Paths is an optional field that restricts
  1156. the rule to apply to HTTP requests that use one of the
  1157. listed HTTP Paths. Multiple paths are OR''d together.
  1158. e.g: - exact: /foo - prefix: /bar NOTE: Each entry may
  1159. ONLY specify either a `exact` or a `prefix` match. The
  1160. validator will check for it.'
  1161. items:
  1162. description: 'HTTPPath specifies an HTTP path to match.
  1163. It may be either of the form: exact: <path>: which matches
  1164. the path exactly or prefix: <path-prefix>: which matches
  1165. the path prefix'
  1166. properties:
  1167. exact:
  1168. type: string
  1169. prefix:
  1170. type: string
  1171. type: object
  1172. type: array
  1173. type: object
  1174. icmp:
  1175. description: ICMP is an optional field that restricts the rule
  1176. to apply to a specific type and code of ICMP traffic. This
  1177. should only be specified if the Protocol field is set to "ICMP"
  1178. or "ICMPv6".
  1179. properties:
  1180. code:
  1181. description: Match on a specific ICMP code. If specified,
  1182. the Type value must also be specified. This is a technical
  1183. limitation imposed by the kernel's iptables firewall,
  1184. which Calico uses to enforce the rule.
  1185. type: integer
  1186. type:
  1187. description: Match on a specific ICMP type. For example
  1188. a value of 8 refers to ICMP Echo Request (i.e. pings).
  1189. type: integer
  1190. type: object
  1191. ipVersion:
  1192. description: IPVersion is an optional field that restricts the
  1193. rule to only match a specific IP version.
  1194. type: integer
  1195. metadata:
  1196. description: Metadata contains additional information for this
  1197. rule
  1198. properties:
  1199. annotations:
  1200. additionalProperties:
  1201. type: string
  1202. description: Annotations is a set of key value pairs that
  1203. give extra information about the rule
  1204. type: object
  1205. type: object
  1206. notICMP:
  1207. description: NotICMP is the negated version of the ICMP field.
  1208. properties:
  1209. code:
  1210. description: Match on a specific ICMP code. If specified,
  1211. the Type value must also be specified. This is a technical
  1212. limitation imposed by the kernel's iptables firewall,
  1213. which Calico uses to enforce the rule.
  1214. type: integer
  1215. type:
  1216. description: Match on a specific ICMP type. For example
  1217. a value of 8 refers to ICMP Echo Request (i.e. pings).
  1218. type: integer
  1219. type: object
  1220. notProtocol:
  1221. anyOf:
  1222. - type: integer
  1223. - type: string
  1224. description: NotProtocol is the negated version of the Protocol
  1225. field.
  1226. pattern: ^.*
  1227. x-kubernetes-int-or-string: true
  1228. protocol:
  1229. anyOf:
  1230. - type: integer
  1231. - type: string
  1232. description: "Protocol is an optional field that restricts the
  1233. rule to only apply to traffic of a specific IP protocol. Required
  1234. if any of the EntityRules contain Ports (because ports only
  1235. apply to certain protocols). \n Must be one of these string
  1236. values: \"TCP\", \"UDP\", \"ICMP\", \"ICMPv6\", \"SCTP\",
  1237. \"UDPLite\" or an integer in the range 1-255."
  1238. pattern: ^.*
  1239. x-kubernetes-int-or-string: true
  1240. source:
  1241. description: Source contains the match criteria that apply to
  1242. source entity.
  1243. properties:
  1244. namespaceSelector:
  1245. description: "NamespaceSelector is an optional field that
  1246. contains a selector expression. Only traffic that originates
  1247. from (or terminates at) endpoints within the selected
  1248. namespaces will be matched. When both NamespaceSelector
  1249. and Selector are defined on the same rule, then only workload
  1250. endpoints that are matched by both selectors will be selected
  1251. by the rule. \n For NetworkPolicy, an empty NamespaceSelector
  1252. implies that the Selector is limited to selecting only
  1253. workload endpoints in the same namespace as the NetworkPolicy.
  1254. \n For NetworkPolicy, `global()` NamespaceSelector implies
  1255. that the Selector is limited to selecting only GlobalNetworkSet
  1256. or HostEndpoint. \n For GlobalNetworkPolicy, an empty
  1257. NamespaceSelector implies the Selector applies to workload
  1258. endpoints across all namespaces."
  1259. type: string
  1260. nets:
  1261. description: Nets is an optional field that restricts the
  1262. rule to only apply to traffic that originates from (or
  1263. terminates at) IP addresses in any of the given subnets.
  1264. items:
  1265. type: string
  1266. type: array
  1267. notNets:
  1268. description: NotNets is the negated version of the Nets
  1269. field.
  1270. items:
  1271. type: string
  1272. type: array
  1273. notPorts:
  1274. description: NotPorts is the negated version of the Ports
  1275. field. Since only some protocols have ports, if any ports
  1276. are specified it requires the Protocol match in the Rule
  1277. to be set to "TCP" or "UDP".
  1278. items:
  1279. anyOf:
  1280. - type: integer
  1281. - type: string
  1282. pattern: ^.*
  1283. x-kubernetes-int-or-string: true
  1284. type: array
  1285. notSelector:
  1286. description: NotSelector is the negated version of the Selector
  1287. field. See Selector field for subtleties with negated
  1288. selectors.
  1289. type: string
  1290. ports:
  1291. description: "Ports is an optional field that restricts
  1292. the rule to only apply to traffic that has a source (destination)
  1293. port that matches one of these ranges/values. This value
  1294. is a list of integers or strings that represent ranges
  1295. of ports. \n Since only some protocols have ports, if
  1296. any ports are specified it requires the Protocol match
  1297. in the Rule to be set to \"TCP\" or \"UDP\"."
  1298. items:
  1299. anyOf:
  1300. - type: integer
  1301. - type: string
  1302. pattern: ^.*
  1303. x-kubernetes-int-or-string: true
  1304. type: array
  1305. selector:
  1306. description: "Selector is an optional field that contains
  1307. a selector expression (see Policy for sample syntax).
  1308. \ Only traffic that originates from (terminates at) endpoints
  1309. matching the selector will be matched. \n Note that: in
  1310. addition to the negated version of the Selector (see NotSelector
  1311. below), the selector expression syntax itself supports
  1312. negation. The two types of negation are subtly different.
  1313. One negates the set of matched endpoints, the other negates
  1314. the whole match: \n \tSelector = \"!has(my_label)\" matches
  1315. packets that are from other Calico-controlled \tendpoints
  1316. that do not have the label \"my_label\". \n \tNotSelector
  1317. = \"has(my_label)\" matches packets that are not from
  1318. Calico-controlled \tendpoints that do have the label \"my_label\".
  1319. \n The effect is that the latter will accept packets from
  1320. non-Calico sources whereas the former is limited to packets
  1321. from Calico-controlled endpoints."
  1322. type: string
  1323. serviceAccounts:
  1324. description: ServiceAccounts is an optional field that restricts
  1325. the rule to only apply to traffic that originates from
  1326. (or terminates at) a pod running as a matching service
  1327. account.
  1328. properties:
  1329. names:
  1330. description: Names is an optional field that restricts
  1331. the rule to only apply to traffic that originates
  1332. from (or terminates at) a pod running as a service
  1333. account whose name is in the list.
  1334. items:
  1335. type: string
  1336. type: array
  1337. selector:
  1338. description: Selector is an optional field that restricts
  1339. the rule to only apply to traffic that originates
  1340. from (or terminates at) a pod running as a service
  1341. account that matches the given label selector. If
  1342. both Names and Selector are specified then they are
  1343. AND'ed.
  1344. type: string
  1345. type: object
  1346. type: object
  1347. required:
  1348. - action
  1349. type: object
  1350. type: array
  1351. ingress:
  1352. description: The ordered set of ingress rules. Each rule contains
  1353. a set of packet match criteria and a corresponding action to apply.
  1354. items:
  1355. description: "A Rule encapsulates a set of match criteria and an
  1356. action. Both selector-based security Policy and security Profiles
  1357. reference rules - separated out as a list of rules for both ingress
  1358. and egress packet matching. \n Each positive match criteria has
  1359. a negated version, prefixed with \"Not\". All the match criteria
  1360. within a rule must be satisfied for a packet to match. A single
  1361. rule can contain the positive and negative version of a match
  1362. and both must be satisfied for the rule to match."
  1363. properties:
  1364. action:
  1365. type: string
  1366. destination:
  1367. description: Destination contains the match criteria that apply
  1368. to destination entity.
  1369. properties:
  1370. namespaceSelector:
  1371. description: "NamespaceSelector is an optional field that
  1372. contains a selector expression. Only traffic that originates
  1373. from (or terminates at) endpoints within the selected
  1374. namespaces will be matched. When both NamespaceSelector
  1375. and Selector are defined on the same rule, then only workload
  1376. endpoints that are matched by both selectors will be selected
  1377. by the rule. \n For NetworkPolicy, an empty NamespaceSelector
  1378. implies that the Selector is limited to selecting only
  1379. workload endpoints in the same namespace as the NetworkPolicy.
  1380. \n For NetworkPolicy, `global()` NamespaceSelector implies
  1381. that the Selector is limited to selecting only GlobalNetworkSet
  1382. or HostEndpoint. \n For GlobalNetworkPolicy, an empty
  1383. NamespaceSelector implies the Selector applies to workload
  1384. endpoints across all namespaces."
  1385. type: string
  1386. nets:
  1387. description: Nets is an optional field that restricts the
  1388. rule to only apply to traffic that originates from (or
  1389. terminates at) IP addresses in any of the given subnets.
  1390. items:
  1391. type: string
  1392. type: array
  1393. notNets:
  1394. description: NotNets is the negated version of the Nets
  1395. field.
  1396. items:
  1397. type: string
  1398. type: array
  1399. notPorts:
  1400. description: NotPorts is the negated version of the Ports
  1401. field. Since only some protocols have ports, if any ports
  1402. are specified it requires the Protocol match in the Rule
  1403. to be set to "TCP" or "UDP".
  1404. items:
  1405. anyOf:
  1406. - type: integer
  1407. - type: string
  1408. pattern: ^.*
  1409. x-kubernetes-int-or-string: true
  1410. type: array
  1411. notSelector:
  1412. description: NotSelector is the negated version of the Selector
  1413. field. See Selector field for subtleties with negated
  1414. selectors.
  1415. type: string
  1416. ports:
  1417. description: "Ports is an optional field that restricts
  1418. the rule to only apply to traffic that has a source (destination)
  1419. port that matches one of these ranges/values. This value
  1420. is a list of integers or strings that represent ranges
  1421. of ports. \n Since only some protocols have ports, if
  1422. any ports are specified it requires the Protocol match
  1423. in the Rule to be set to \"TCP\" or \"UDP\"."
  1424. items:
  1425. anyOf:
  1426. - type: integer
  1427. - type: string
  1428. pattern: ^.*
  1429. x-kubernetes-int-or-string: true
  1430. type: array
  1431. selector:
  1432. description: "Selector is an optional field that contains
  1433. a selector expression (see Policy for sample syntax).
  1434. \ Only traffic that originates from (terminates at) endpoints
  1435. matching the selector will be matched. \n Note that: in
  1436. addition to the negated version of the Selector (see NotSelector
  1437. below), the selector expression syntax itself supports
  1438. negation. The two types of negation are subtly different.
  1439. One negates the set of matched endpoints, the other negates
  1440. the whole match: \n \tSelector = \"!has(my_label)\" matches
  1441. packets that are from other Calico-controlled \tendpoints
  1442. that do not have the label \"my_label\". \n \tNotSelector
  1443. = \"has(my_label)\" matches packets that are not from
  1444. Calico-controlled \tendpoints that do have the label \"my_label\".
  1445. \n The effect is that the latter will accept packets from
  1446. non-Calico sources whereas the former is limited to packets
  1447. from Calico-controlled endpoints."
  1448. type: string
  1449. serviceAccounts:
  1450. description: ServiceAccounts is an optional field that restricts
  1451. the rule to only apply to traffic that originates from
  1452. (or terminates at) a pod running as a matching service
  1453. account.
  1454. properties:
  1455. names:
  1456. description: Names is an optional field that restricts
  1457. the rule to only apply to traffic that originates
  1458. from (or terminates at) a pod running as a service
  1459. account whose name is in the list.
  1460. items:
  1461. type: string
  1462. type: array
  1463. selector:
  1464. description: Selector is an optional field that restricts
  1465. the rule to only apply to traffic that originates
  1466. from (or terminates at) a pod running as a service
  1467. account that matches the given label selector. If
  1468. both Names and Selector are specified then they are
  1469. AND'ed.
  1470. type: string
  1471. type: object
  1472. type: object
  1473. http:
  1474. description: HTTP contains match criteria that apply to HTTP
  1475. requests.
  1476. properties:
  1477. methods:
  1478. description: Methods is an optional field that restricts
  1479. the rule to apply only to HTTP requests that use one of
  1480. the listed HTTP Methods (e.g. GET, PUT, etc.) Multiple
  1481. methods are OR'd together.
  1482. items:
  1483. type: string
  1484. type: array
  1485. paths:
  1486. description: 'Paths is an optional field that restricts
  1487. the rule to apply to HTTP requests that use one of the
  1488. listed HTTP Paths. Multiple paths are OR''d together.
  1489. e.g: - exact: /foo - prefix: /bar NOTE: Each entry may
  1490. ONLY specify either a `exact` or a `prefix` match. The
  1491. validator will check for it.'
  1492. items:
  1493. description: 'HTTPPath specifies an HTTP path to match.
  1494. It may be either of the form: exact: <path>: which matches
  1495. the path exactly or prefix: <path-prefix>: which matches
  1496. the path prefix'
  1497. properties:
  1498. exact:
  1499. type: string
  1500. prefix:
  1501. type: string
  1502. type: object
  1503. type: array
  1504. type: object
  1505. icmp:
  1506. description: ICMP is an optional field that restricts the rule
  1507. to apply to a specific type and code of ICMP traffic. This
  1508. should only be specified if the Protocol field is set to "ICMP"
  1509. or "ICMPv6".
  1510. properties:
  1511. code:
  1512. description: Match on a specific ICMP code. If specified,
  1513. the Type value must also be specified. This is a technical
  1514. limitation imposed by the kernel's iptables firewall,
  1515. which Calico uses to enforce the rule.
  1516. type: integer
  1517. type:
  1518. description: Match on a specific ICMP type. For example
  1519. a value of 8 refers to ICMP Echo Request (i.e. pings).
  1520. type: integer
  1521. type: object
  1522. ipVersion:
  1523. description: IPVersion is an optional field that restricts the
  1524. rule to only match a specific IP version.
  1525. type: integer
  1526. metadata:
  1527. description: Metadata contains additional information for this
  1528. rule
  1529. properties:
  1530. annotations:
  1531. additionalProperties:
  1532. type: string
  1533. description: Annotations is a set of key value pairs that
  1534. give extra information about the rule
  1535. type: object
  1536. type: object
  1537. notICMP:
  1538. description: NotICMP is the negated version of the ICMP field.
  1539. properties:
  1540. code:
  1541. description: Match on a specific ICMP code. If specified,
  1542. the Type value must also be specified. This is a technical
  1543. limitation imposed by the kernel's iptables firewall,
  1544. which Calico uses to enforce the rule.
  1545. type: integer
  1546. type:
  1547. description: Match on a specific ICMP type. For example
  1548. a value of 8 refers to ICMP Echo Request (i.e. pings).
  1549. type: integer
  1550. type: object
  1551. notProtocol:
  1552. anyOf:
  1553. - type: integer
  1554. - type: string
  1555. description: NotProtocol is the negated version of the Protocol
  1556. field.
  1557. pattern: ^.*
  1558. x-kubernetes-int-or-string: true
  1559. protocol:
  1560. anyOf:
  1561. - type: integer
  1562. - type: string
  1563. description: "Protocol is an optional field that restricts the
  1564. rule to only apply to traffic of a specific IP protocol. Required
  1565. if any of the EntityRules contain Ports (because ports only
  1566. apply to certain protocols). \n Must be one of these string
  1567. values: \"TCP\", \"UDP\", \"ICMP\", \"ICMPv6\", \"SCTP\",
  1568. \"UDPLite\" or an integer in the range 1-255."
  1569. pattern: ^.*
  1570. x-kubernetes-int-or-string: true
  1571. source:
  1572. description: Source contains the match criteria that apply to
  1573. source entity.
  1574. properties:
  1575. namespaceSelector:
  1576. description: "NamespaceSelector is an optional field that
  1577. contains a selector expression. Only traffic that originates
  1578. from (or terminates at) endpoints within the selected
  1579. namespaces will be matched. When both NamespaceSelector
  1580. and Selector are defined on the same rule, then only workload
  1581. endpoints that are matched by both selectors will be selected
  1582. by the rule. \n For NetworkPolicy, an empty NamespaceSelector
  1583. implies that the Selector is limited to selecting only
  1584. workload endpoints in the same namespace as the NetworkPolicy.
  1585. \n For NetworkPolicy, `global()` NamespaceSelector implies
  1586. that the Selector is limited to selecting only GlobalNetworkSet
  1587. or HostEndpoint. \n For GlobalNetworkPolicy, an empty
  1588. NamespaceSelector implies the Selector applies to workload
  1589. endpoints across all namespaces."
  1590. type: string
  1591. nets:
  1592. description: Nets is an optional field that restricts the
  1593. rule to only apply to traffic that originates from (or
  1594. terminates at) IP addresses in any of the given subnets.
  1595. items:
  1596. type: string
  1597. type: array
  1598. notNets:
  1599. description: NotNets is the negated version of the Nets
  1600. field.
  1601. items:
  1602. type: string
  1603. type: array
  1604. notPorts:
  1605. description: NotPorts is the negated version of the Ports
  1606. field. Since only some protocols have ports, if any ports
  1607. are specified it requires the Protocol match in the Rule
  1608. to be set to "TCP" or "UDP".
  1609. items:
  1610. anyOf:
  1611. - type: integer
  1612. - type: string
  1613. pattern: ^.*
  1614. x-kubernetes-int-or-string: true
  1615. type: array
  1616. notSelector:
  1617. description: NotSelector is the negated version of the Selector
  1618. field. See Selector field for subtleties with negated
  1619. selectors.
  1620. type: string
  1621. ports:
  1622. description: "Ports is an optional field that restricts
  1623. the rule to only apply to traffic that has a source (destination)
  1624. port that matches one of these ranges/values. This value
  1625. is a list of integers or strings that represent ranges
  1626. of ports. \n Since only some protocols have ports, if
  1627. any ports are specified it requires the Protocol match
  1628. in the Rule to be set to \"TCP\" or \"UDP\"."
  1629. items:
  1630. anyOf:
  1631. - type: integer
  1632. - type: string
  1633. pattern: ^.*
  1634. x-kubernetes-int-or-string: true
  1635. type: array
  1636. selector:
  1637. description: "Selector is an optional field that contains
  1638. a selector expression (see Policy for sample syntax).
  1639. \ Only traffic that originates from (terminates at) endpoints
  1640. matching the selector will be matched. \n Note that: in
  1641. addition to the negated version of the Selector (see NotSelector
  1642. below), the selector expression syntax itself supports
  1643. negation. The two types of negation are subtly different.
  1644. One negates the set of matched endpoints, the other negates
  1645. the whole match: \n \tSelector = \"!has(my_label)\" matches
  1646. packets that are from other Calico-controlled \tendpoints
  1647. that do not have the label \"my_label\". \n \tNotSelector
  1648. = \"has(my_label)\" matches packets that are not from
  1649. Calico-controlled \tendpoints that do have the label \"my_label\".
  1650. \n The effect is that the latter will accept packets from
  1651. non-Calico sources whereas the former is limited to packets
  1652. from Calico-controlled endpoints."
  1653. type: string
  1654. serviceAccounts:
  1655. description: ServiceAccounts is an optional field that restricts
  1656. the rule to only apply to traffic that originates from
  1657. (or terminates at) a pod running as a matching service
  1658. account.
  1659. properties:
  1660. names:
  1661. description: Names is an optional field that restricts
  1662. the rule to only apply to traffic that originates
  1663. from (or terminates at) a pod running as a service
  1664. account whose name is in the list.
  1665. items:
  1666. type: string
  1667. type: array
  1668. selector:
  1669. description: Selector is an optional field that restricts
  1670. the rule to only apply to traffic that originates
  1671. from (or terminates at) a pod running as a service
  1672. account that matches the given label selector. If
  1673. both Names and Selector are specified then they are
  1674. AND'ed.
  1675. type: string
  1676. type: object
  1677. type: object
  1678. required:
  1679. - action
  1680. type: object
  1681. type: array
  1682. namespaceSelector:
  1683. description: NamespaceSelector is an optional field for an expression
  1684. used to select a pod based on namespaces.
  1685. type: string
  1686. order:
  1687. description: Order is an optional field that specifies the order in
  1688. which the policy is applied. Policies with higher "order" are applied
  1689. after those with lower order. If the order is omitted, it may be
  1690. considered to be "infinite" - i.e. the policy will be applied last. Policies
  1691. with identical order will be applied in alphanumerical order based
  1692. on the Policy "Name".
  1693. type: number
  1694. preDNAT:
  1695. description: PreDNAT indicates to apply the rules in this policy before
  1696. any DNAT.
  1697. type: boolean
  1698. selector:
  2699. description: "The selector is an expression used to pick out
  1700. the endpoints that the policy should be applied to. \n Selector
  1701. expressions follow this syntax: \n \tlabel == \"string_literal\"
  1702. \ -> comparison, e.g. my_label == \"foo bar\" \tlabel != \"string_literal\"
  1703. \ -> not equal; also matches if label is not present \tlabel in
  1704. { \"a\", \"b\", \"c\", ... } -> true if the value of label X is
  1705. one of \"a\", \"b\", \"c\" \tlabel not in { \"a\", \"b\", \"c\",
  1706. ... } -> true if the value of label X is not one of \"a\", \"b\",
  1707. \"c\" \thas(label_name) -> True if that label is present \t! expr
  1708. -> negation of expr \texpr && expr -> Short-circuit and \texpr
  1709. || expr -> Short-circuit or \t( expr ) -> parens for grouping \tall()
  1710. or the empty selector -> matches all endpoints. \n Label names are
  1711. allowed to contain alphanumerics, -, _ and /. String literals are
  1712. more permissive but they do not support escape characters. \n Examples
  1713. (with made-up labels): \n \ttype == \"webserver\" && deployment
  1714. == \"prod\" \ttype in {\"frontend\", \"backend\"} \tdeployment !=
  1715. \"dev\" \t! has(label_name)"
  1716. type: string
  1717. serviceAccountSelector:
  1718. description: ServiceAccountSelector is an optional field for an expression
  1719. used to select a pod based on service accounts.
  1720. type: string
  1721. types:
  1722. description: "Types indicates whether this policy applies to ingress,
  1723. or to egress, or to both. When not explicitly specified (and so
  1724. the value on creation is empty or nil), Calico defaults Types according
  1725. to what Ingress and Egress rules are present in the policy. The
  1726. default is: \n - [ PolicyTypeIngress ], if there are no Egress rules
  1727. (including the case where there are also no Ingress rules) \n
  1728. - [ PolicyTypeEgress ], if there are Egress rules but no Ingress
  1729. rules \n - [ PolicyTypeIngress, PolicyTypeEgress ], if there are
  1730. both Ingress and Egress rules. \n When the policy is read back again,
  1731. Types will always be one of these values, never empty or nil."
  1732. items:
  1733. description: PolicyType enumerates the possible values of the PolicySpec
  1734. Types field.
  1735. type: string
  1736. type: array
  1737. type: object
  1738. type: object
  1739. served: true
  1740. storage: true
  1741. status:
  1742. acceptedNames:
  1743. kind: ""
  1744. plural: ""
  1745. conditions: []
  1746. storedVersions: []
  1747. ---
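# --- Editor's note (not part of the upstream calico.yaml) ---------------
# The CRD below registers GlobalNetworkSet: a cluster-scoped, labelled set of
# CIDRs that policy selectors can refer to. A minimal, hypothetical example
# (name, labels and CIDRs are made up), kept as comments so that applying
# this manifest is unchanged:
#
#   apiVersion: crd.projectcalico.org/v1
#   kind: GlobalNetworkSet
#   metadata:
#     name: office-networks
#     labels:
#       role: office
#   spec:
#     nets:
#       - 10.10.0.0/16
#       - 192.168.100.0/24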
  1748. apiVersion: apiextensions.k8s.io/v1
  1749. kind: CustomResourceDefinition
  1750. metadata:
  1751. name: globalnetworksets.crd.projectcalico.org
  1752. spec:
  1753. group: crd.projectcalico.org
  1754. names:
  1755. kind: GlobalNetworkSet
  1756. listKind: GlobalNetworkSetList
  1757. plural: globalnetworksets
  1758. singular: globalnetworkset
  1759. scope: Cluster
  1760. versions:
  1761. - name: v1
  1762. schema:
  1763. openAPIV3Schema:
  1764. description: GlobalNetworkSet contains a set of arbitrary IP sub-networks/CIDRs
  1765. that share labels to allow rules to refer to them via selectors. The labels
  1766. of GlobalNetworkSet are not namespaced.
  1767. properties:
  1768. apiVersion:
  1769. description: 'APIVersion defines the versioned schema of this representation
  1770. of an object. Servers should convert recognized schemas to the latest
  1771. internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
  1772. type: string
  1773. kind:
  1774. description: 'Kind is a string value representing the REST resource this
  1775. object represents. Servers may infer this from the endpoint the client
  1776. submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
  1777. type: string
  1778. metadata:
  1779. type: object
  1780. spec:
  1781. description: GlobalNetworkSetSpec contains the specification for a NetworkSet
  1782. resource.
  1783. properties:
  1784. nets:
  1785. description: The list of IP networks that belong to this set.
  1786. items:
  1787. type: string
  1788. type: array
  1789. type: object
  1790. type: object
  1791. served: true
  1792. storage: true
  1793. status:
  1794. acceptedNames:
  1795. kind: ""
  1796. plural: ""
  1797. conditions: []
  1798. storedVersions: []
  1799. ---
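# --- Editor's note (not part of the upstream calico.yaml) ---------------
# The CRD below registers HostEndpoint, which lets Calico policy protect a
# host's own network interfaces rather than pod interfaces. A hypothetical
# example (node name, interface and IP are made up), kept as comments so the
# applied manifest is unchanged:
#
#   apiVersion: crd.projectcalico.org/v1
#   kind: HostEndpoint
#   metadata:
#     name: node1-eth0
#     labels:
#       role: k8s-node
#   spec:
#     node: node1
#     interfaceName: eth0
#     expectedIPs:
#       - 192.168.40.180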
  1800. apiVersion: apiextensions.k8s.io/v1
  1801. kind: CustomResourceDefinition
  1802. metadata:
  1803. name: hostendpoints.crd.projectcalico.org
  1804. spec:
  1805. group: crd.projectcalico.org
  1806. names:
  1807. kind: HostEndpoint
  1808. listKind: HostEndpointList
  1809. plural: hostendpoints
  1810. singular: hostendpoint
  1811. scope: Cluster
  1812. versions:
  1813. - name: v1
  1814. schema:
  1815. openAPIV3Schema:
  1816. properties:
  1817. apiVersion:
  1818. description: 'APIVersion defines the versioned schema of this representation
  1819. of an object. Servers should convert recognized schemas to the latest
  1820. internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
  1821. type: string
  1822. kind:
  1823. description: 'Kind is a string value representing the REST resource this
  1824. object represents. Servers may infer this from the endpoint the client
  1825. submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
  1826. type: string
  1827. metadata:
  1828. type: object
  1829. spec:
  1830. description: HostEndpointSpec contains the specification for a HostEndpoint
  1831. resource.
  1832. properties:
  1833. expectedIPs:
  1834. description: "The expected IP addresses (IPv4 and IPv6) of the endpoint.
  1835. If \"InterfaceName\" is not present, Calico will look for an interface
  1836. matching any of the IPs in the list and apply policy to that. Note:
  1837. \tWhen using the selector match criteria in an ingress or egress
  1838. security Policy \tor Profile, Calico converts the selector into
  1839. a set of IP addresses. For host \tendpoints, the ExpectedIPs field
  1840. is used for that purpose. (If only the interface \tname is specified,
  1841. Calico does not learn the IPs of the interface for use in match
  1842. \tcriteria.)"
  1843. items:
  1844. type: string
  1845. type: array
  1846. interfaceName:
  1847. description: "Either \"*\", or the name of a specific Linux interface
  1848. to apply policy to; or empty. \"*\" indicates that this HostEndpoint
  1849. governs all traffic to, from or through the default network namespace
  1850. of the host named by the \"Node\" field; entering and leaving that
  1851. namespace via any interface, including those from/to non-host-networked
  1852. local workloads. \n If InterfaceName is not \"*\", this HostEndpoint
  1853. only governs traffic that enters or leaves the host through the
  1854. specific interface named by InterfaceName, or - when InterfaceName
  1855. is empty - through the specific interface that has one of the IPs
  1856. in ExpectedIPs. Therefore, when InterfaceName is empty, at least
  1857. one expected IP must be specified. Only external interfaces (such
  1858. as \"eth0\") are supported here; it isn't possible for a HostEndpoint
  1859. to protect traffic through a specific local workload interface.
  1860. \n Note: Only some kinds of policy are implemented for \"*\" HostEndpoints;
  1861. initially just pre-DNAT policy. Please check Calico documentation
  1862. for the latest position."
  1863. type: string
  1864. node:
  1865. description: The node name identifying the Calico node instance.
  1866. type: string
  1867. ports:
  1868. description: Ports contains the endpoint's named ports, which may
  1869. be referenced in security policy rules.
  1870. items:
  1871. properties:
  1872. name:
  1873. type: string
  1874. port:
  1875. type: integer
  1876. protocol:
  1877. anyOf:
  1878. - type: integer
  1879. - type: string
  1880. pattern: ^.*
  1881. x-kubernetes-int-or-string: true
  1882. required:
  1883. - name
  1884. - port
  1885. - protocol
  1886. type: object
  1887. type: array
  1888. profiles:
  1889. description: A list of identifiers of security Profile objects that
  1890. apply to this endpoint. Each profile is applied in the order that
  1891. they appear in this list. Profile rules are applied after the selector-based
  1892. security policy.
  1893. items:
  1894. type: string
  1895. type: array
  1896. type: object
  1897. type: object
  1898. served: true
  1899. storage: true
  1900. status:
  1901. acceptedNames:
  1902. kind: ""
  1903. plural: ""
  1904. conditions: []
  1905. storedVersions: []
  1906. ---
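# --- Editor's note (not part of the upstream calico.yaml) ---------------
# The CRD below registers IPAMBlock, an internal bookkeeping object used by
# Calico IPAM to track per-node address blocks. It is created and updated by
# Calico itself and is not normally written by hand.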
  1907. apiVersion: apiextensions.k8s.io/v1
  1908. kind: CustomResourceDefinition
  1909. metadata:
  1910. name: ipamblocks.crd.projectcalico.org
  1911. spec:
  1912. group: crd.projectcalico.org
  1913. names:
  1914. kind: IPAMBlock
  1915. listKind: IPAMBlockList
  1916. plural: ipamblocks
  1917. singular: ipamblock
  1918. scope: Cluster
  1919. versions:
  1920. - name: v1
  1921. schema:
  1922. openAPIV3Schema:
  1923. properties:
  1924. apiVersion:
  1925. description: 'APIVersion defines the versioned schema of this representation
  1926. of an object. Servers should convert recognized schemas to the latest
  1927. internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
  1928. type: string
  1929. kind:
  1930. description: 'Kind is a string value representing the REST resource this
  1931. object represents. Servers may infer this from the endpoint the client
  1932. submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
  1933. type: string
  1934. metadata:
  1935. type: object
  1936. spec:
  1937. description: IPAMBlockSpec contains the specification for an IPAMBlock
  1938. resource.
  1939. properties:
  1940. affinity:
  1941. type: string
  1942. allocations:
  1943. items:
  1944. type: integer
  1945. # TODO: This nullable is manually added in. We should update controller-gen
  1946. # to handle []*int properly itself.
  1947. nullable: true
  1948. type: array
  1949. attributes:
  1950. items:
  1951. properties:
  1952. handle_id:
  1953. type: string
  1954. secondary:
  1955. additionalProperties:
  1956. type: string
  1957. type: object
  1958. type: object
  1959. type: array
  1960. cidr:
  1961. type: string
  1962. deleted:
  1963. type: boolean
  1964. strictAffinity:
  1965. type: boolean
  1966. unallocated:
  1967. items:
  1968. type: integer
  1969. type: array
  1970. required:
  1971. - allocations
  1972. - attributes
  1973. - cidr
  1974. - strictAffinity
  1975. - unallocated
  1976. type: object
  1977. type: object
  1978. served: true
  1979. storage: true
  1980. status:
  1981. acceptedNames:
  1982. kind: ""
  1983. plural: ""
  1984. conditions: []
  1985. storedVersions: []
  1986. ---
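# --- Editor's note (not part of the upstream calico.yaml) ---------------
# The CRD below registers IPAMConfig, the cluster-wide switches for Calico
# IPAM (strictAffinity, autoAllocateBlocks, maxBlocksPerHost). A hypothetical
# example, kept as comments; the resource name "default" is an assumption,
# adjust to your environment:
#
#   apiVersion: crd.projectcalico.org/v1
#   kind: IPAMConfig
#   metadata:
#     name: default
#   spec:
#     strictAffinity: false
#     autoAllocateBlocks: true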
  1987. apiVersion: apiextensions.k8s.io/v1
  1988. kind: CustomResourceDefinition
  1989. metadata:
  1990. name: ipamconfigs.crd.projectcalico.org
  1991. spec:
  1992. group: crd.projectcalico.org
  1993. names:
  1994. kind: IPAMConfig
  1995. listKind: IPAMConfigList
  1996. plural: ipamconfigs
  1997. singular: ipamconfig
  1998. scope: Cluster
  1999. versions:
  2000. - name: v1
  2001. schema:
  2002. openAPIV3Schema:
  2003. properties:
  2004. apiVersion:
  2005. description: 'APIVersion defines the versioned schema of this representation
  2006. of an object. Servers should convert recognized schemas to the latest
  2007. internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
  2008. type: string
  2009. kind:
  2010. description: 'Kind is a string value representing the REST resource this
  2011. object represents. Servers may infer this from the endpoint the client
  2012. submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
  2013. type: string
  2014. metadata:
  2015. type: object
  2016. spec:
  2017. description: IPAMConfigSpec contains the specification for an IPAMConfig
  2018. resource.
  2019. properties:
  2020. autoAllocateBlocks:
  2021. type: boolean
  2022. maxBlocksPerHost:
  2023. description: MaxBlocksPerHost, if non-zero, is the max number of blocks
  2024. that can be affine to each host.
  2025. type: integer
  2026. strictAffinity:
  2027. type: boolean
  2028. required:
  2029. - autoAllocateBlocks
  2030. - strictAffinity
  2031. type: object
  2032. type: object
  2033. served: true
  2034. storage: true
  2035. status:
  2036. acceptedNames:
  2037. kind: ""
  2038. plural: ""
  2039. conditions: []
  2040. storedVersions: []
  2041. ---
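# --- Editor's note (not part of the upstream calico.yaml) ---------------
# The CRD below registers IPAMHandle, another internal Calico IPAM object
# that tracks individual address allocations; it is managed automatically
# and not edited directly.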
  2042. apiVersion: apiextensions.k8s.io/v1
  2043. kind: CustomResourceDefinition
  2044. metadata:
  2045. name: ipamhandles.crd.projectcalico.org
  2046. spec:
  2047. group: crd.projectcalico.org
  2048. names:
  2049. kind: IPAMHandle
  2050. listKind: IPAMHandleList
  2051. plural: ipamhandles
  2052. singular: ipamhandle
  2053. scope: Cluster
  2054. versions:
  2055. - name: v1
  2056. schema:
  2057. openAPIV3Schema:
  2058. properties:
  2059. apiVersion:
  2060. description: 'APIVersion defines the versioned schema of this representation
  2061. of an object. Servers should convert recognized schemas to the latest
  2062. internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
  2063. type: string
  2064. kind:
  2065. description: 'Kind is a string value representing the REST resource this
  2066. object represents. Servers may infer this from the endpoint the client
  2067. submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
  2068. type: string
  2069. metadata:
  2070. type: object
  2071. spec:
  2072. description: IPAMHandleSpec contains the specification for an IPAMHandle
  2073. resource.
  2074. properties:
  2075. block:
  2076. additionalProperties:
  2077. type: integer
  2078. type: object
  2079. deleted:
  2080. type: boolean
  2081. handleID:
  2082. type: string
  2083. required:
  2084. - block
  2085. - handleID
  2086. type: object
  2087. type: object
  2088. served: true
  2089. storage: true
  2090. status:
  2091. acceptedNames:
  2092. kind: ""
  2093. plural: ""
  2094. conditions: []
  2095. storedVersions: []
  2096. ---
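# --- Editor's note (not part of the upstream calico.yaml) ---------------
# The CRD below registers IPPool, the resource that defines the pod address
# ranges Calico assigns from (cidr, blockSize, ipipMode/vxlanMode, natOutgoing).
# A hypothetical example, kept as comments; the CIDR must match your cluster's
# pod network and the name is made up:
#
#   apiVersion: crd.projectcalico.org/v1
#   kind: IPPool
#   metadata:
#     name: default-ipv4-ippool
#   spec:
#     cidr: 10.244.0.0/16
#     blockSize: 26
#     ipipMode: Always
#     vxlanMode: Never
#     natOutgoing: true
#     nodeSelector: all()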
  2097. apiVersion: apiextensions.k8s.io/v1
  2098. kind: CustomResourceDefinition
  2099. metadata:
  2100. name: ippools.crd.projectcalico.org
  2101. spec:
  2102. group: crd.projectcalico.org
  2103. names:
  2104. kind: IPPool
  2105. listKind: IPPoolList
  2106. plural: ippools
  2107. singular: ippool
  2108. scope: Cluster
  2109. versions:
  2110. - name: v1
  2111. schema:
  2112. openAPIV3Schema:
  2113. properties:
  2114. apiVersion:
  2115. description: 'APIVersion defines the versioned schema of this representation
  2116. of an object. Servers should convert recognized schemas to the latest
  2117. internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
  2118. type: string
  2119. kind:
  2120. description: 'Kind is a string value representing the REST resource this
  2121. object represents. Servers may infer this from the endpoint the client
  2122. submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
  2123. type: string
  2124. metadata:
  2125. type: object
  2126. spec:
  2127. description: IPPoolSpec contains the specification for an IPPool resource.
  2128. properties:
  2129. blockSize:
  2130. description: The block size to use for IP address assignments from
  2131. this pool. Defaults to 26 for IPv4 and 112 for IPv6.
  2132. type: integer
  2133. cidr:
  2134. description: The pool CIDR.
  2135. type: string
  2136. disabled:
  2137. description: When disabled is true, Calico IPAM will not assign addresses
  2138. from this pool.
  2139. type: boolean
  2140. ipip:
  2141. description: 'Deprecated: this field is only used for APIv1 backwards
  2142. compatibility. Setting this field is not allowed, this field is
  2143. for internal use only.'
  2144. properties:
  2145. enabled:
  2146. description: When enabled is true, ipip tunneling will be used
  2147. to deliver packets to destinations within this pool.
  2148. type: boolean
  2149. mode:
  2150. description: The IPIP mode. This can be one of "always" or "cross-subnet". A
  2151. mode of "always" will also use IPIP tunneling for routing to
  2152. destination IP addresses within this pool. A mode of "cross-subnet"
  2153. will only use IPIP tunneling when the destination node is on
  2154. a different subnet to the originating node. The default value
  2155. (if not specified) is "always".
  2156. type: string
  2157. type: object
  2158. ipipMode:
  2159. description: Contains configuration for IPIP tunneling for this pool.
  2160. If not specified, then this is defaulted to "Never" (i.e. IPIP tunneling
  2161. is disabled).
  2162. type: string
  2163. nat-outgoing:
  2164. description: 'Deprecated: this field is only used for APIv1 backwards
  2165. compatibility. Setting this field is not allowed, this field is
  2166. for internal use only.'
  2167. type: boolean
  2168. natOutgoing:
  2169. description: When nat-outgoing is true, packets sent from Calico networked
  2170. containers in this pool to destinations outside of this pool will
  2171. be masqueraded.
  2172. type: boolean
  2173. nodeSelector:
  2174. description: Allows IPPool to allocate for a specific node by label
  2175. selector.
  2176. type: string
  2177. vxlanMode:
  2178. description: Contains configuration for VXLAN tunneling for this pool.
  2179. If not specified, then this is defaulted to "Never" (i.e. VXLAN
  2180. tunneling is disabled).
  2181. type: string
  2182. required:
  2183. - cidr
  2184. type: object
  2185. type: object
  2186. served: true
  2187. storage: true
  2188. status:
  2189. acceptedNames:
  2190. kind: ""
  2191. plural: ""
  2192. conditions: []
  2193. storedVersions: []
  2194. ---
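# --- Editor's note (not part of the upstream calico.yaml) ---------------
# The CRD below registers KubeControllersConfiguration, which tunes the
# calico-kube-controllers process (reconciler periods, health checks, metrics
# port, label syncing). A hypothetical example using the defaults documented
# in the schema below, kept as comments:
#
#   apiVersion: crd.projectcalico.org/v1
#   kind: KubeControllersConfiguration
#   metadata:
#     name: default
#   spec:
#     logSeverityScreen: Info
#     healthChecks: Enabled
#     etcdV3CompactionPeriod: 10m
#     controllers:
#       node:
#         reconcilerPeriod: 5m
#         syncLabels: Enabled
#         hostEndpoint:
#           autoCreate: Disabled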
  2195. apiVersion: apiextensions.k8s.io/v1
  2196. kind: CustomResourceDefinition
  2197. metadata:
  2198. name: kubecontrollersconfigurations.crd.projectcalico.org
  2199. spec:
  2200. group: crd.projectcalico.org
  2201. names:
  2202. kind: KubeControllersConfiguration
  2203. listKind: KubeControllersConfigurationList
  2204. plural: kubecontrollersconfigurations
  2205. singular: kubecontrollersconfiguration
  2206. scope: Cluster
  2207. versions:
  2208. - name: v1
  2209. schema:
  2210. openAPIV3Schema:
  2211. properties:
  2212. apiVersion:
  2213. description: 'APIVersion defines the versioned schema of this representation
  2214. of an object. Servers should convert recognized schemas to the latest
  2215. internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
  2216. type: string
  2217. kind:
  2218. description: 'Kind is a string value representing the REST resource this
  2219. object represents. Servers may infer this from the endpoint the client
  2220. submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
  2221. type: string
  2222. metadata:
  2223. type: object
  2224. spec:
  2225. description: KubeControllersConfigurationSpec contains the values of the
  2226. Kubernetes controllers configuration.
  2227. properties:
  2228. controllers:
  2229. description: Controllers enables and configures individual Kubernetes
  2230. controllers
  2231. properties:
  2232. namespace:
  2233. description: Namespace enables and configures the namespace controller.
  2234. Enabled by default, set to nil to disable.
  2235. properties:
  2236. reconcilerPeriod:
  2237. description: 'ReconcilerPeriod is the period to perform reconciliation
  2238. with the Calico datastore. [Default: 5m]'
  2239. type: string
  2240. type: object
  2241. node:
  2242. description: Node enables and configures the node controller.
  2243. Enabled by default, set to nil to disable.
  2244. properties:
  2245. hostEndpoint:
  2246. description: HostEndpoint controls syncing nodes to host endpoints.
  2247. Disabled by default, set to nil to disable.
  2248. properties:
  2249. autoCreate:
  2250. description: 'AutoCreate enables automatic creation of
  2251. host endpoints for every node. [Default: Disabled]'
  2252. type: string
  2253. type: object
  2254. reconcilerPeriod:
  2255. description: 'ReconcilerPeriod is the period to perform reconciliation
  2256. with the Calico datastore. [Default: 5m]'
  2257. type: string
  2258. syncLabels:
  2259. description: 'SyncLabels controls whether to copy Kubernetes
  2260. node labels to Calico nodes. [Default: Enabled]'
  2261. type: string
  2262. type: object
  2263. policy:
  2264. description: Policy enables and configures the policy controller.
  2265. Enabled by default, set to nil to disable.
  2266. properties:
  2267. reconcilerPeriod:
  2268. description: 'ReconcilerPeriod is the period to perform reconciliation
  2269. with the Calico datastore. [Default: 5m]'
  2270. type: string
  2271. type: object
  2272. serviceAccount:
  2273. description: ServiceAccount enables and configures the service
  2274. account controller. Enabled by default, set to nil to disable.
  2275. properties:
  2276. reconcilerPeriod:
  2277. description: 'ReconcilerPeriod is the period to perform reconciliation
  2278. with the Calico datastore. [Default: 5m]'
  2279. type: string
  2280. type: object
  2281. workloadEndpoint:
  2282. description: WorkloadEndpoint enables and configures the workload
  2283. endpoint controller. Enabled by default, set to nil to disable.
  2284. properties:
  2285. reconcilerPeriod:
  2286. description: 'ReconcilerPeriod is the period to perform reconciliation
  2287. with the Calico datastore. [Default: 5m]'
  2288. type: string
  2289. type: object
  2290. type: object
  2291. etcdV3CompactionPeriod:
  2292. description: 'EtcdV3CompactionPeriod is the period between etcdv3
  2293. compaction requests. Set to 0 to disable. [Default: 10m]'
  2294. type: string
  2295. healthChecks:
  2296. description: 'HealthChecks enables or disables support for health
  2297. checks [Default: Enabled]'
  2298. type: string
  2299. logSeverityScreen:
  2300. description: 'LogSeverityScreen is the log severity above which logs
  2301. are sent to the stdout. [Default: Info]'
  2302. type: string
  2303. prometheusMetricsPort:
  2304. description: 'PrometheusMetricsPort is the TCP port that the Prometheus
  2305. metrics server should bind to. Set to 0 to disable. [Default: 9094]'
  2306. type: integer
  2307. required:
  2308. - controllers
  2309. type: object
  2310. status:
  2311. description: KubeControllersConfigurationStatus represents the status
  2312. of the configuration. It's useful for admins to be able to see the actual
  2313. config that was applied, which can be modified by environment variables
  2314. on the kube-controllers process.
  2315. properties:
  2316. environmentVars:
  2317. additionalProperties:
  2318. type: string
  2319. description: EnvironmentVars contains the environment variables on
  2320. the kube-controllers that influenced the RunningConfig.
  2321. type: object
  2322. runningConfig:
  2323. description: RunningConfig contains the effective config that is running
  2324. in the kube-controllers pod, after merging the API resource with
  2325. any environment variables.
  2326. properties:
  2327. controllers:
  2328. description: Controllers enables and configures individual Kubernetes
  2329. controllers
  2330. properties:
  2331. namespace:
  2332. description: Namespace enables and configures the namespace
  2333. controller. Enabled by default, set to nil to disable.
  2334. properties:
  2335. reconcilerPeriod:
  2336. description: 'ReconcilerPeriod is the period to perform
  2337. reconciliation with the Calico datastore. [Default:
  2338. 5m]'
  2339. type: string
  2340. type: object
  2341. node:
  2342. description: Node enables and configures the node controller.
  2343. Enabled by default, set to nil to disable.
  2344. properties:
  2345. hostEndpoint:
  2346. description: HostEndpoint controls syncing nodes to host
  2347. endpoints. Disabled by default, set to nil to disable.
  2348. properties:
  2349. autoCreate:
  2350. description: 'AutoCreate enables automatic creation
  2351. of host endpoints for every node. [Default: Disabled]'
  2352. type: string
  2353. type: object
  2354. reconcilerPeriod:
  2355. description: 'ReconcilerPeriod is the period to perform
  2356. reconciliation with the Calico datastore. [Default:
  2357. 5m]'
  2358. type: string
  2359. syncLabels:
  2360. description: 'SyncLabels controls whether to copy Kubernetes
  2361. node labels to Calico nodes. [Default: Enabled]'
  2362. type: string
  2363. type: object
  2364. policy:
  2365. description: Policy enables and configures the policy controller.
  2366. Enabled by default, set to nil to disable.
  2367. properties:
  2368. reconcilerPeriod:
  2369. description: 'ReconcilerPeriod is the period to perform
  2370. reconciliation with the Calico datastore. [Default:
  2371. 5m]'
  2372. type: string
  2373. type: object
  2374. serviceAccount:
  2375. description: ServiceAccount enables and configures the service
  2376. account controller. Enabled by default, set to nil to disable.
  2377. properties:
  2378. reconcilerPeriod:
  2379. description: 'ReconcilerPeriod is the period to perform
  2380. reconciliation with the Calico datastore. [Default:
  2381. 5m]'
  2382. type: string
  2383. type: object
  2384. workloadEndpoint:
  2385. description: WorkloadEndpoint enables and configures the workload
  2386. endpoint controller. Enabled by default, set to nil to disable.
  2387. properties:
  2388. reconcilerPeriod:
  2389. description: 'ReconcilerPeriod is the period to perform
  2390. reconciliation with the Calico datastore. [Default:
  2391. 5m]'
  2392. type: string
  2393. type: object
  2394. type: object
  2395. etcdV3CompactionPeriod:
  2396. description: 'EtcdV3CompactionPeriod is the period between etcdv3
  2397. compaction requests. Set to 0 to disable. [Default: 10m]'
  2398. type: string
  2399. healthChecks:
  2400. description: 'HealthChecks enables or disables support for health
  2401. checks [Default: Enabled]'
  2402. type: string
  2403. logSeverityScreen:
  2404. description: 'LogSeverityScreen is the log severity above which
  2405. logs are sent to the stdout. [Default: Info]'
  2406. type: string
  2407. prometheusMetricsPort:
  2408. description: 'PrometheusMetricsPort is the TCP port that the Prometheus
  2409. metrics server should bind to. Set to 0 to disable. [Default:
  2410. 9094]'
  2411. type: integer
  2412. required:
  2413. - controllers
  2414. type: object
  2415. type: object
  2416. type: object
  2417. served: true
  2418. storage: true
  2419. status:
  2420. acceptedNames:
  2421. kind: ""
  2422. plural: ""
  2423. conditions: []
  2424. storedVersions: []
  2425. ---
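# --- Editor's note (not part of the upstream calico.yaml) ---------------
# The CRD below registers Calico's namespaced NetworkPolicy, whose rules use
# the selector syntax and Allow/Deny actions described in the schema.
# A hypothetical example (names, labels and port are made up), kept as
# comments so the applied manifest is unchanged:
#
#   apiVersion: crd.projectcalico.org/v1
#   kind: NetworkPolicy
#   metadata:
#     name: allow-frontend-to-backend
#     namespace: default
#   spec:
#     selector: app == "backend"
#     ingress:
#       - action: Allow
#         protocol: TCP
#         source:
#           selector: app == "frontend"
#         destination:
#           ports:
#             - 8080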
  2426. apiVersion: apiextensions.k8s.io/v1
  2427. kind: CustomResourceDefinition
  2428. metadata:
  2429. name: networkpolicies.crd.projectcalico.org
  2430. spec:
  2431. group: crd.projectcalico.org
  2432. names:
  2433. kind: NetworkPolicy
  2434. listKind: NetworkPolicyList
  2435. plural: networkpolicies
  2436. singular: networkpolicy
  2437. scope: Namespaced
  2438. versions:
  2439. - name: v1
  2440. schema:
  2441. openAPIV3Schema:
  2442. properties:
  2443. apiVersion:
  2444. description: 'APIVersion defines the versioned schema of this representation
  2445. of an object. Servers should convert recognized schemas to the latest
  2446. internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
  2447. type: string
  2448. kind:
  2449. description: 'Kind is a string value representing the REST resource this
  2450. object represents. Servers may infer this from the endpoint the client
  2451. submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
  2452. type: string
  2453. metadata:
  2454. type: object
  2455. spec:
  2456. properties:
  2457. egress:
  2458. description: The ordered set of egress rules. Each rule contains
  2459. a set of packet match criteria and a corresponding action to apply.
  2460. items:
  2461. description: "A Rule encapsulates a set of match criteria and an
  2462. action. Both selector-based security Policy and security Profiles
  2463. reference rules - separated out as a list of rules for both ingress
  2464. and egress packet matching. \n Each positive match criteria has
  2465. a negated version, prefixed with \"Not\". All the match criteria
  2466. within a rule must be satisfied for a packet to match. A single
  2467. rule can contain the positive and negative version of a match
  2468. and both must be satisfied for the rule to match."
  2469. properties:
  2470. action:
  2471. type: string
  2472. destination:
  2473. description: Destination contains the match criteria that apply
  2474. to destination entity.
  2475. properties:
  2476. namespaceSelector:
  2477. description: "NamespaceSelector is an optional field that
  2478. contains a selector expression. Only traffic that originates
  2479. from (or terminates at) endpoints within the selected
  2480. namespaces will be matched. When both NamespaceSelector
  2481. and Selector are defined on the same rule, then only workload
  2482. endpoints that are matched by both selectors will be selected
  2483. by the rule. \n For NetworkPolicy, an empty NamespaceSelector
  2484. implies that the Selector is limited to selecting only
  2485. workload endpoints in the same namespace as the NetworkPolicy.
  2486. \n For NetworkPolicy, `global()` NamespaceSelector implies
  2487. that the Selector is limited to selecting only GlobalNetworkSet
  2488. or HostEndpoint. \n For GlobalNetworkPolicy, an empty
  2489. NamespaceSelector implies the Selector applies to workload
  2490. endpoints across all namespaces."
  2491. type: string
  2492. nets:
  2493. description: Nets is an optional field that restricts the
  2494. rule to only apply to traffic that originates from (or
  2495. terminates at) IP addresses in any of the given subnets.
  2496. items:
  2497. type: string
  2498. type: array
  2499. notNets:
  2500. description: NotNets is the negated version of the Nets
  2501. field.
  2502. items:
  2503. type: string
  2504. type: array
  2505. notPorts:
  2506. description: NotPorts is the negated version of the Ports
  2507. field. Since only some protocols have ports, if any ports
  2508. are specified it requires the Protocol match in the Rule
  2509. to be set to "TCP" or "UDP".
  2510. items:
  2511. anyOf:
  2512. - type: integer
  2513. - type: string
  2514. pattern: ^.*
  2515. x-kubernetes-int-or-string: true
  2516. type: array
  2517. notSelector:
  2518. description: NotSelector is the negated version of the Selector
  2519. field. See Selector field for subtleties with negated
  2520. selectors.
  2521. type: string
  2522. ports:
  2523. description: "Ports is an optional field that restricts
  2524. the rule to only apply to traffic that has a source (destination)
  2525. port that matches one of these ranges/values. This value
  2526. is a list of integers or strings that represent ranges
  2527. of ports. \n Since only some protocols have ports, if
  2528. any ports are specified it requires the Protocol match
  2529. in the Rule to be set to \"TCP\" or \"UDP\"."
  2530. items:
  2531. anyOf:
  2532. - type: integer
  2533. - type: string
  2534. pattern: ^.*
  2535. x-kubernetes-int-or-string: true
  2536. type: array
  2537. selector:
  2538. description: "Selector is an optional field that contains
  2539. a selector expression (see Policy for sample syntax).
  2540. \ Only traffic that originates from (terminates at) endpoints
  2541. matching the selector will be matched. \n Note that: in
  2542. addition to the negated version of the Selector (see NotSelector
  2543. below), the selector expression syntax itself supports
  2544. negation. The two types of negation are subtly different.
  2545. One negates the set of matched endpoints, the other negates
  2546. the whole match: \n \tSelector = \"!has(my_label)\" matches
  2547. packets that are from other Calico-controlled \tendpoints
  2548. that do not have the label \"my_label\". \n \tNotSelector
  2549. = \"has(my_label)\" matches packets that are not from
  2550. Calico-controlled \tendpoints that do have the label \"my_label\".
  2551. \n The effect is that the latter will accept packets from
  2552. non-Calico sources whereas the former is limited to packets
  2553. from Calico-controlled endpoints."
  2554. type: string
  2555. serviceAccounts:
  2556. description: ServiceAccounts is an optional field that restricts
  2557. the rule to only apply to traffic that originates from
  2558. (or terminates at) a pod running as a matching service
  2559. account.
  2560. properties:
  2561. names:
  2562. description: Names is an optional field that restricts
  2563. the rule to only apply to traffic that originates
  2564. from (or terminates at) a pod running as a service
  2565. account whose name is in the list.
  2566. items:
  2567. type: string
  2568. type: array
  2569. selector:
  2570. description: Selector is an optional field that restricts
  2571. the rule to only apply to traffic that originates
  2572. from (or terminates at) a pod running as a service
  2573. account that matches the given label selector. If
  2574. both Names and Selector are specified then they are
  2575. AND'ed.
  2576. type: string
  2577. type: object
  2578. type: object
  2579. http:
  2580. description: HTTP contains match criteria that apply to HTTP
  2581. requests.
  2582. properties:
  2583. methods:
  2584. description: Methods is an optional field that restricts
  2585. the rule to apply only to HTTP requests that use one of
  2586. the listed HTTP Methods (e.g. GET, PUT, etc.) Multiple
  2587. methods are OR'd together.
  2588. items:
  2589. type: string
  2590. type: array
  2591. paths:
  2592. description: 'Paths is an optional field that restricts
  2593. the rule to apply to HTTP requests that use one of the
  2594. listed HTTP Paths. Multiple paths are OR''d together.
  2595. e.g: - exact: /foo - prefix: /bar NOTE: Each entry may
  2596. ONLY specify either a `exact` or a `prefix` match. The
  2597. validator will check for it.'
  2598. items:
  2599. description: 'HTTPPath specifies an HTTP path to match.
  2600. It may be either of the form: exact: <path>: which matches
  2601. the path exactly or prefix: <path-prefix>: which matches
  2602. the path prefix'
  2603. properties:
  2604. exact:
  2605. type: string
  2606. prefix:
  2607. type: string
  2608. type: object
  2609. type: array
  2610. type: object
  2611. icmp:
  2612. description: ICMP is an optional field that restricts the rule
  2613. to apply to a specific type and code of ICMP traffic. This
  2614. should only be specified if the Protocol field is set to "ICMP"
  2615. or "ICMPv6".
  2616. properties:
  2617. code:
  2618. description: Match on a specific ICMP code. If specified,
  2619. the Type value must also be specified. This is a technical
  2620. limitation imposed by the kernel's iptables firewall,
  2621. which Calico uses to enforce the rule.
  2622. type: integer
  2623. type:
  2624. description: Match on a specific ICMP type. For example
  2625. a value of 8 refers to ICMP Echo Request (i.e. pings).
  2626. type: integer
  2627. type: object
  2628. ipVersion:
  2629. description: IPVersion is an optional field that restricts the
  2630. rule to only match a specific IP version.
  2631. type: integer
  2632. metadata:
  2633. description: Metadata contains additional information for this
  2634. rule
  2635. properties:
  2636. annotations:
  2637. additionalProperties:
  2638. type: string
  2639. description: Annotations is a set of key value pairs that
  2640. give extra information about the rule
  2641. type: object
  2642. type: object
  2643. notICMP:
  2644. description: NotICMP is the negated version of the ICMP field.
  2645. properties:
  2646. code:
  2647. description: Match on a specific ICMP code. If specified,
  2648. the Type value must also be specified. This is a technical
  2649. limitation imposed by the kernel's iptables firewall,
  2650. which Calico uses to enforce the rule.
  2651. type: integer
  2652. type:
  2653. description: Match on a specific ICMP type. For example
  2654. a value of 8 refers to ICMP Echo Request (i.e. pings).
  2655. type: integer
  2656. type: object
  2657. notProtocol:
  2658. anyOf:
  2659. - type: integer
  2660. - type: string
  2661. description: NotProtocol is the negated version of the Protocol
  2662. field.
  2663. pattern: ^.*
  2664. x-kubernetes-int-or-string: true
  2665. protocol:
  2666. anyOf:
  2667. - type: integer
  2668. - type: string
  2669. description: "Protocol is an optional field that restricts the
  2670. rule to only apply to traffic of a specific IP protocol. Required
  2671. if any of the EntityRules contain Ports (because ports only
  2672. apply to certain protocols). \n Must be one of these string
  2673. values: \"TCP\", \"UDP\", \"ICMP\", \"ICMPv6\", \"SCTP\",
  2674. \"UDPLite\" or an integer in the range 1-255."
  2675. pattern: ^.*
  2676. x-kubernetes-int-or-string: true
  2677. source:
  2678. description: Source contains the match criteria that apply to
  2679. source entity.
  2680. properties:
  2681. namespaceSelector:
  2682. description: "NamespaceSelector is an optional field that
  2683. contains a selector expression. Only traffic that originates
  2684. from (or terminates at) endpoints within the selected
  2685. namespaces will be matched. When both NamespaceSelector
  2686. and Selector are defined on the same rule, then only workload
  2687. endpoints that are matched by both selectors will be selected
  2688. by the rule. \n For NetworkPolicy, an empty NamespaceSelector
  2689. implies that the Selector is limited to selecting only
  2690. workload endpoints in the same namespace as the NetworkPolicy.
  2691. \n For NetworkPolicy, `global()` NamespaceSelector implies
  2692. that the Selector is limited to selecting only GlobalNetworkSet
  2693. or HostEndpoint. \n For GlobalNetworkPolicy, an empty
  2694. NamespaceSelector implies the Selector applies to workload
  2695. endpoints across all namespaces."
  2696. type: string
  2697. nets:
  2698. description: Nets is an optional field that restricts the
  2699. rule to only apply to traffic that originates from (or
  2700. terminates at) IP addresses in any of the given subnets.
  2701. items:
  2702. type: string
  2703. type: array
  2704. notNets:
  2705. description: NotNets is the negated version of the Nets
  2706. field.
  2707. items:
  2708. type: string
  2709. type: array
  2710. notPorts:
  2711. description: NotPorts is the negated version of the Ports
  2712. field. Since only some protocols have ports, if any ports
  2713. are specified it requires the Protocol match in the Rule
  2714. to be set to "TCP" or "UDP".
  2715. items:
  2716. anyOf:
  2717. - type: integer
  2718. - type: string
  2719. pattern: ^.*
  2720. x-kubernetes-int-or-string: true
  2721. type: array
  2722. notSelector:
  2723. description: NotSelector is the negated version of the Selector
  2724. field. See Selector field for subtleties with negated
  2725. selectors.
  2726. type: string
  2727. ports:
  2728. description: "Ports is an optional field that restricts
  2729. the rule to only apply to traffic that has a source (destination)
  2730. port that matches one of these ranges/values. This value
  2731. is a list of integers or strings that represent ranges
  2732. of ports. \n Since only some protocols have ports, if
  2733. any ports are specified it requires the Protocol match
  2734. in the Rule to be set to \"TCP\" or \"UDP\"."
  2735. items:
  2736. anyOf:
  2737. - type: integer
  2738. - type: string
  2739. pattern: ^.*
  2740. x-kubernetes-int-or-string: true
  2741. type: array
  2742. selector:
  2743. description: "Selector is an optional field that contains
  2744. a selector expression (see Policy for sample syntax).
  2745. \ Only traffic that originates from (terminates at) endpoints
  2746. matching the selector will be matched. \n Note that: in
  2747. addition to the negated version of the Selector (see NotSelector
  2748. below), the selector expression syntax itself supports
  2749. negation. The two types of negation are subtly different.
  2750. One negates the set of matched endpoints, the other negates
  2751. the whole match: \n \tSelector = \"!has(my_label)\" matches
  2752. packets that are from other Calico-controlled \tendpoints
  2753. that do not have the label \"my_label\". \n \tNotSelector
  2754. = \"has(my_label)\" matches packets that are not from
  2755. Calico-controlled \tendpoints that do have the label \"my_label\".
  2756. \n The effect is that the latter will accept packets from
  2757. non-Calico sources whereas the former is limited to packets
  2758. from Calico-controlled endpoints."
  2759. type: string
  2760. serviceAccounts:
  2761. description: ServiceAccounts is an optional field that restricts
  2762. the rule to only apply to traffic that originates from
  2763. (or terminates at) a pod running as a matching service
  2764. account.
  2765. properties:
  2766. names:
  2767. description: Names is an optional field that restricts
  2768. the rule to only apply to traffic that originates
  2769. from (or terminates at) a pod running as a service
  2770. account whose name is in the list.
  2771. items:
  2772. type: string
  2773. type: array
  2774. selector:
  2775. description: Selector is an optional field that restricts
  2776. the rule to only apply to traffic that originates
  2777. from (or terminates at) a pod running as a service
  2778. account that matches the given label selector. If
  2779. both Names and Selector are specified then they are
  2780. AND'ed.
  2781. type: string
  2782. type: object
  2783. type: object
  2784. required:
  2785. - action
  2786. type: object
  2787. type: array
  2788. ingress:
  2789. description: The ordered set of ingress rules. Each rule contains
  2790. a set of packet match criteria and a corresponding action to apply.
  2791. items:
  2792. description: "A Rule encapsulates a set of match criteria and an
  2793. action. Both selector-based security Policy and security Profiles
  2794. reference rules - separated out as a list of rules for both ingress
  2795. and egress packet matching. \n Each positive match criteria has
  2796. a negated version, prefixed with \"Not\". All the match criteria
  2797. within a rule must be satisfied for a packet to match. A single
  2798. rule can contain the positive and negative version of a match
  2799. and both must be satisfied for the rule to match."
  2800. properties:
  2801. action:
  2802. type: string
  2803. destination:
  2804. description: Destination contains the match criteria that apply
  2805. to destination entity.
  2806. properties:
  2807. namespaceSelector:
  2808. description: "NamespaceSelector is an optional field that
  2809. contains a selector expression. Only traffic that originates
  2810. from (or terminates at) endpoints within the selected
  2811. namespaces will be matched. When both NamespaceSelector
  2812. and Selector are defined on the same rule, then only workload
  2813. endpoints that are matched by both selectors will be selected
  2814. by the rule. \n For NetworkPolicy, an empty NamespaceSelector
  2815. implies that the Selector is limited to selecting only
  2816. workload endpoints in the same namespace as the NetworkPolicy.
  2817. \n For NetworkPolicy, `global()` NamespaceSelector implies
  2818. that the Selector is limited to selecting only GlobalNetworkSet
  2819. or HostEndpoint. \n For GlobalNetworkPolicy, an empty
  2820. NamespaceSelector implies the Selector applies to workload
  2821. endpoints across all namespaces."
  2822. type: string
  2823. nets:
  2824. description: Nets is an optional field that restricts the
  2825. rule to only apply to traffic that originates from (or
  2826. terminates at) IP addresses in any of the given subnets.
  2827. items:
  2828. type: string
  2829. type: array
  2830. notNets:
  2831. description: NotNets is the negated version of the Nets
  2832. field.
  2833. items:
  2834. type: string
  2835. type: array
  2836. notPorts:
  2837. description: NotPorts is the negated version of the Ports
  2838. field. Since only some protocols have ports, if any ports
  2839. are specified it requires the Protocol match in the Rule
  2840. to be set to "TCP" or "UDP".
  2841. items:
  2842. anyOf:
  2843. - type: integer
  2844. - type: string
  2845. pattern: ^.*
  2846. x-kubernetes-int-or-string: true
  2847. type: array
  2848. notSelector:
  2849. description: NotSelector is the negated version of the Selector
  2850. field. See Selector field for subtleties with negated
  2851. selectors.
  2852. type: string
  2853. ports:
  2854. description: "Ports is an optional field that restricts
  2855. the rule to only apply to traffic that has a source (destination)
  2856. port that matches one of these ranges/values. This value
  2857. is a list of integers or strings that represent ranges
  2858. of ports. \n Since only some protocols have ports, if
  2859. any ports are specified it requires the Protocol match
  2860. in the Rule to be set to \"TCP\" or \"UDP\"."
  2861. items:
  2862. anyOf:
  2863. - type: integer
  2864. - type: string
  2865. pattern: ^.*
  2866. x-kubernetes-int-or-string: true
  2867. type: array
  2868. selector:
  2869. description: "Selector is an optional field that contains
  2870. a selector expression (see Policy for sample syntax).
  2871. \ Only traffic that originates from (terminates at) endpoints
  2872. matching the selector will be matched. \n Note that: in
  2873. addition to the negated version of the Selector (see NotSelector
  2874. below), the selector expression syntax itself supports
  2875. negation. The two types of negation are subtly different.
  2876. One negates the set of matched endpoints, the other negates
  2877. the whole match: \n \tSelector = \"!has(my_label)\" matches
  2878. packets that are from other Calico-controlled \tendpoints
  2879. that do not have the label \"my_label\". \n \tNotSelector
  2880. = \"has(my_label)\" matches packets that are not from
  2881. Calico-controlled \tendpoints that do have the label \"my_label\".
  2882. \n The effect is that the latter will accept packets from
  2883. non-Calico sources whereas the former is limited to packets
  2884. from Calico-controlled endpoints."
  2885. type: string
  2886. serviceAccounts:
  2887. description: ServiceAccounts is an optional field that restricts
  2888. the rule to only apply to traffic that originates from
  2889. (or terminates at) a pod running as a matching service
  2890. account.
  2891. properties:
  2892. names:
  2893. description: Names is an optional field that restricts
  2894. the rule to only apply to traffic that originates
  2895. from (or terminates at) a pod running as a service
  2896. account whose name is in the list.
  2897. items:
  2898. type: string
  2899. type: array
  2900. selector:
  2901. description: Selector is an optional field that restricts
  2902. the rule to only apply to traffic that originates
  2903. from (or terminates at) a pod running as a service
  2904. account that matches the given label selector. If
  2905. both Names and Selector are specified then they are
  2906. AND'ed.
  2907. type: string
  2908. type: object
  2909. type: object
  2910. http:
  2911. description: HTTP contains match criteria that apply to HTTP
  2912. requests.
  2913. properties:
  2914. methods:
  2915. description: Methods is an optional field that restricts
  2916. the rule to apply only to HTTP requests that use one of
  2917. the listed HTTP Methods (e.g. GET, PUT, etc.) Multiple
  2918. methods are OR'd together.
  2919. items:
  2920. type: string
  2921. type: array
  2922. paths:
  2923. description: 'Paths is an optional field that restricts
  2924. the rule to apply to HTTP requests that use one of the
  2925. listed HTTP Paths. Multiple paths are OR''d together.
  2926. e.g: - exact: /foo - prefix: /bar NOTE: Each entry may
  2927. ONLY specify either a `exact` or a `prefix` match. The
  2928. validator will check for it.'
  2929. items:
  2930. description: 'HTTPPath specifies an HTTP path to match.
  2931. It may be either of the form: exact: <path>: which matches
  2932. the path exactly or prefix: <path-prefix>: which matches
  2933. the path prefix'
  2934. properties:
  2935. exact:
  2936. type: string
  2937. prefix:
  2938. type: string
  2939. type: object
  2940. type: array
  2941. type: object
  2942. icmp:
  2943. description: ICMP is an optional field that restricts the rule
  2944. to apply to a specific type and code of ICMP traffic. This
  2945. should only be specified if the Protocol field is set to "ICMP"
  2946. or "ICMPv6".
  2947. properties:
  2948. code:
  2949. description: Match on a specific ICMP code. If specified,
  2950. the Type value must also be specified. This is a technical
  2951. limitation imposed by the kernel's iptables firewall,
  2952. which Calico uses to enforce the rule.
  2953. type: integer
  2954. type:
  2955. description: Match on a specific ICMP type. For example
  2956. a value of 8 refers to ICMP Echo Request (i.e. pings).
  2957. type: integer
  2958. type: object
  2959. ipVersion:
  2960. description: IPVersion is an optional field that restricts the
  2961. rule to only match a specific IP version.
  2962. type: integer
  2963. metadata:
  2964. description: Metadata contains additional information for this
  2965. rule
  2966. properties:
  2967. annotations:
  2968. additionalProperties:
  2969. type: string
  2970. description: Annotations is a set of key value pairs that
  2971. give extra information about the rule
  2972. type: object
  2973. type: object
  2974. notICMP:
  2975. description: NotICMP is the negated version of the ICMP field.
  2976. properties:
  2977. code:
  2978. description: Match on a specific ICMP code. If specified,
  2979. the Type value must also be specified. This is a technical
  2980. limitation imposed by the kernel's iptables firewall,
  2981. which Calico uses to enforce the rule.
  2982. type: integer
  2983. type:
  2984. description: Match on a specific ICMP type. For example
  2985. a value of 8 refers to ICMP Echo Request (i.e. pings).
  2986. type: integer
  2987. type: object
  2988. notProtocol:
  2989. anyOf:
  2990. - type: integer
  2991. - type: string
  2992. description: NotProtocol is the negated version of the Protocol
  2993. field.
  2994. pattern: ^.*
  2995. x-kubernetes-int-or-string: true
  2996. protocol:
  2997. anyOf:
  2998. - type: integer
  2999. - type: string
  3000. description: "Protocol is an optional field that restricts the
  3001. rule to only apply to traffic of a specific IP protocol. Required
  3002. if any of the EntityRules contain Ports (because ports only
  3003. apply to certain protocols). \n Must be one of these string
  3004. values: \"TCP\", \"UDP\", \"ICMP\", \"ICMPv6\", \"SCTP\",
  3005. \"UDPLite\" or an integer in the range 1-255."
  3006. pattern: ^.*
  3007. x-kubernetes-int-or-string: true
  3008. source:
  3009. description: Source contains the match criteria that apply to
  3010. source entity.
  3011. properties:
  3012. namespaceSelector:
  3013. description: "NamespaceSelector is an optional field that
  3014. contains a selector expression. Only traffic that originates
  3015. from (or terminates at) endpoints within the selected
  3016. namespaces will be matched. When both NamespaceSelector
  3017. and Selector are defined on the same rule, then only workload
  3018. endpoints that are matched by both selectors will be selected
  3019. by the rule. \n For NetworkPolicy, an empty NamespaceSelector
  3020. implies that the Selector is limited to selecting only
  3021. workload endpoints in the same namespace as the NetworkPolicy.
  3022. \n For NetworkPolicy, `global()` NamespaceSelector implies
  3023. that the Selector is limited to selecting only GlobalNetworkSet
  3024. or HostEndpoint. \n For GlobalNetworkPolicy, an empty
  3025. NamespaceSelector implies the Selector applies to workload
  3026. endpoints across all namespaces."
  3027. type: string
  3028. nets:
  3029. description: Nets is an optional field that restricts the
  3030. rule to only apply to traffic that originates from (or
  3031. terminates at) IP addresses in any of the given subnets.
  3032. items:
  3033. type: string
  3034. type: array
  3035. notNets:
  3036. description: NotNets is the negated version of the Nets
  3037. field.
  3038. items:
  3039. type: string
  3040. type: array
  3041. notPorts:
  3042. description: NotPorts is the negated version of the Ports
  3043. field. Since only some protocols have ports, if any ports
  3044. are specified it requires the Protocol match in the Rule
  3045. to be set to "TCP" or "UDP".
  3046. items:
  3047. anyOf:
  3048. - type: integer
  3049. - type: string
  3050. pattern: ^.*
  3051. x-kubernetes-int-or-string: true
  3052. type: array
  3053. notSelector:
  3054. description: NotSelector is the negated version of the Selector
  3055. field. See Selector field for subtleties with negated
  3056. selectors.
  3057. type: string
  3058. ports:
  3059. description: "Ports is an optional field that restricts
  3060. the rule to only apply to traffic that has a source (destination)
  3061. port that matches one of these ranges/values. This value
  3062. is a list of integers or strings that represent ranges
  3063. of ports. \n Since only some protocols have ports, if
  3064. any ports are specified it requires the Protocol match
  3065. in the Rule to be set to \"TCP\" or \"UDP\"."
  3066. items:
  3067. anyOf:
  3068. - type: integer
  3069. - type: string
  3070. pattern: ^.*
  3071. x-kubernetes-int-or-string: true
  3072. type: array
  3073. selector:
  3074. description: "Selector is an optional field that contains
  3075. a selector expression (see Policy for sample syntax).
  3076. \ Only traffic that originates from (terminates at) endpoints
  3077. matching the selector will be matched. \n Note that: in
  3078. addition to the negated version of the Selector (see NotSelector
  3079. below), the selector expression syntax itself supports
  3080. negation. The two types of negation are subtly different.
  3081. One negates the set of matched endpoints, the other negates
  3082. the whole match: \n \tSelector = \"!has(my_label)\" matches
  3083. packets that are from other Calico-controlled \tendpoints
  3084. that do not have the label \"my_label\". \n \tNotSelector
  3085. = \"has(my_label)\" matches packets that are not from
  3086. Calico-controlled \tendpoints that do have the label \"my_label\".
  3087. \n The effect is that the latter will accept packets from
  3088. non-Calico sources whereas the former is limited to packets
  3089. from Calico-controlled endpoints."
  3090. type: string
  3091. serviceAccounts:
  3092. description: ServiceAccounts is an optional field that restricts
  3093. the rule to only apply to traffic that originates from
  3094. (or terminates at) a pod running as a matching service
  3095. account.
  3096. properties:
  3097. names:
  3098. description: Names is an optional field that restricts
  3099. the rule to only apply to traffic that originates
  3100. from (or terminates at) a pod running as a service
  3101. account whose name is in the list.
  3102. items:
  3103. type: string
  3104. type: array
  3105. selector:
  3106. description: Selector is an optional field that restricts
  3107. the rule to only apply to traffic that originates
  3108. from (or terminates at) a pod running as a service
  3109. account that matches the given label selector. If
  3110. both Names and Selector are specified then they are
  3111. AND'ed.
  3112. type: string
  3113. type: object
  3114. type: object
  3115. required:
  3116. - action
  3117. type: object
  3118. type: array
  3119. order:
  3120. description: Order is an optional field that specifies the order in
  3121. which the policy is applied. Policies with higher "order" are applied
  3122. after those with lower order. If the order is omitted, it may be
  3123. considered to be "infinite" - i.e. the policy will be applied last. Policies
  3124. with identical order will be applied in alphanumerical order based
  3125. on the Policy "Name".
  3126. type: number
  3127. selector:
  3128. description: "The selector is an expression used to pick pick out
  3129. the endpoints that the policy should be applied to. \n Selector
  3130. expressions follow this syntax: \n \tlabel == \"string_literal\"
  3131. \ -> comparison, e.g. my_label == \"foo bar\" \tlabel != \"string_literal\"
  3132. \ -> not equal; also matches if label is not present \tlabel in
  3133. { \"a\", \"b\", \"c\", ... } -> true if the value of label X is
  3134. one of \"a\", \"b\", \"c\" \tlabel not in { \"a\", \"b\", \"c\",
  3135. ... } -> true if the value of label X is not one of \"a\", \"b\",
  3136. \"c\" \thas(label_name) -> True if that label is present \t! expr
  3137. -> negation of expr \texpr && expr -> Short-circuit and \texpr
  3138. || expr -> Short-circuit or \t( expr ) -> parens for grouping \tall()
  3139. or the empty selector -> matches all endpoints. \n Label names are
  3140. allowed to contain alphanumerics, -, _ and /. String literals are
  3141. more permissive but they do not support escape characters. \n Examples
  3142. (with made-up labels): \n \ttype == \"webserver\" && deployment
  3143. == \"prod\" \ttype in {\"frontend\", \"backend\"} \tdeployment !=
  3144. \"dev\" \t! has(label_name)"
  3145. type: string
  3146. serviceAccountSelector:
  3147. description: ServiceAccountSelector is an optional field for an expression
  3148. used to select a pod based on service accounts.
  3149. type: string
  3150. types:
  3151. description: "Types indicates whether this policy applies to ingress,
  3152. or to egress, or to both. When not explicitly specified (and so
  3153. the value on creation is empty or nil), Calico defaults Types according
  3154. to what Ingress and Egress are present in the policy. The default
  3155. is: \n - [ PolicyTypeIngress ], if there are no Egress rules (including
  3156. the case where there are also no Ingress rules) \n - [ PolicyTypeEgress
  3157. ], if there are Egress rules but no Ingress rules \n - [ PolicyTypeIngress,
  3158. PolicyTypeEgress ], if there are both Ingress and Egress rules.
  3159. \n When the policy is read back again, Types will always be one
  3160. of these values, never empty or nil."
  3161. items:
  3162. description: PolicyType enumerates the possible values of the PolicySpec
  3163. Types field.
  3164. type: string
  3165. type: array
  3166. type: object
  3167. type: object
  3168. served: true
  3169. storage: true
  3170. status:
  3171. acceptedNames:
  3172. kind: ""
  3173. plural: ""
  3174. conditions: []
  3175. storedVersions: []
  3176. ---
  3177. apiVersion: apiextensions.k8s.io/v1
  3178. kind: CustomResourceDefinition
  3179. metadata:
  3180. name: networksets.crd.projectcalico.org
  3181. spec:
  3182. group: crd.projectcalico.org
  3183. names:
  3184. kind: NetworkSet
  3185. listKind: NetworkSetList
  3186. plural: networksets
  3187. singular: networkset
  3188. scope: Namespaced
  3189. versions:
  3190. - name: v1
  3191. schema:
  3192. openAPIV3Schema:
  3193. description: NetworkSet is the Namespaced-equivalent of the GlobalNetworkSet.
  3194. properties:
  3195. apiVersion:
  3196. description: 'APIVersion defines the versioned schema of this representation
  3197. of an object. Servers should convert recognized schemas to the latest
  3198. internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
  3199. type: string
  3200. kind:
  3201. description: 'Kind is a string value representing the REST resource this
  3202. object represents. Servers may infer this from the endpoint the client
  3203. submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
  3204. type: string
  3205. metadata:
  3206. type: object
  3207. spec:
  3208. description: NetworkSetSpec contains the specification for a NetworkSet
  3209. resource.
  3210. properties:
  3211. nets:
  3212. description: The list of IP networks that belong to this set.
  3213. items:
  3214. type: string
  3215. type: array
  3216. type: object
  3217. type: object
  3218. served: true
  3219. storage: true
  3220. status:
  3221. acceptedNames:
  3222. kind: ""
  3223. plural: ""
  3224. conditions: []
  3225. storedVersions: []
  3226. ---
  3227. ---
  3228. # Source: calico/templates/calico-kube-controllers-rbac.yaml
  3229. # Include a clusterrole for the kube-controllers component,
  3230. # and bind it to the calico-kube-controllers serviceaccount.
  3231. kind: ClusterRole
  3232. apiVersion: rbac.authorization.k8s.io/v1
  3233. metadata:
  3234. name: calico-kube-controllers
  3235. rules:
  3236. # Nodes are watched to monitor for deletions.
  3237. - apiGroups: [""]
  3238. resources:
  3239. - nodes
  3240. verbs:
  3241. - watch
  3242. - list
  3243. - get
  3244. # Pods are queried to check for existence.
  3245. - apiGroups: [""]
  3246. resources:
  3247. - pods
  3248. verbs:
  3249. - get
  3250. # IPAM resources are manipulated when nodes are deleted.
  3251. - apiGroups: ["crd.projectcalico.org"]
  3252. resources:
  3253. - ippools
  3254. verbs:
  3255. - list
  3256. - apiGroups: ["crd.projectcalico.org"]
  3257. resources:
  3258. - blockaffinities
  3259. - ipamblocks
  3260. - ipamhandles
  3261. verbs:
  3262. - get
  3263. - list
  3264. - create
  3265. - update
  3266. - delete
  3267. - watch
  3268. # kube-controllers manages hostendpoints.
  3269. - apiGroups: ["crd.projectcalico.org"]
  3270. resources:
  3271. - hostendpoints
  3272. verbs:
  3273. - get
  3274. - list
  3275. - create
  3276. - update
  3277. - delete
  3278. # Needs access to update clusterinformations.
  3279. - apiGroups: ["crd.projectcalico.org"]
  3280. resources:
  3281. - clusterinformations
  3282. verbs:
  3283. - get
  3284. - create
  3285. - update
  3286. # KubeControllersConfiguration is where it gets its config
  3287. - apiGroups: ["crd.projectcalico.org"]
  3288. resources:
  3289. - kubecontrollersconfigurations
  3290. verbs:
  3291. # read its own config
  3292. - get
  3293. # create a default if none exists
  3294. - create
  3295. # update status
  3296. - update
  3297. # watch for changes
  3298. - watch
  3299. ---
  3300. kind: ClusterRoleBinding
  3301. apiVersion: rbac.authorization.k8s.io/v1
  3302. metadata:
  3303. name: calico-kube-controllers
  3304. roleRef:
  3305. apiGroup: rbac.authorization.k8s.io
  3306. kind: ClusterRole
  3307. name: calico-kube-controllers
  3308. subjects:
  3309. - kind: ServiceAccount
  3310. name: calico-kube-controllers
  3311. namespace: kube-system
  3312. ---
  3313. ---
  3314. # Source: calico/templates/calico-node-rbac.yaml
  3315. # Include a clusterrole for the calico-node DaemonSet,
  3316. # and bind it to the calico-node serviceaccount.
  3317. kind: ClusterRole
  3318. apiVersion: rbac.authorization.k8s.io/v1
  3319. metadata:
  3320. name: calico-node
  3321. rules:
  3322. # The CNI plugin needs to get pods, nodes, and namespaces.
  3323. - apiGroups: [""]
  3324. resources:
  3325. - pods
  3326. - nodes
  3327. - namespaces
  3328. verbs:
  3329. - get
  3330. - apiGroups: [""]
  3331. resources:
  3332. - endpoints
  3333. - services
  3334. verbs:
  3335. # Used to discover service IPs for advertisement.
  3336. - watch
  3337. - list
  3338. # Used to discover Typhas.
  3339. - get
  3340. # Pod CIDR auto-detection on kubeadm needs access to config maps.
  3341. - apiGroups: [""]
  3342. resources:
  3343. - configmaps
  3344. verbs:
  3345. - get
  3346. - apiGroups: [""]
  3347. resources:
  3348. - nodes/status
  3349. verbs:
  3350. # Needed for clearing NodeNetworkUnavailable flag.
  3351. - patch
  3352. # Calico stores some configuration information in node annotations.
  3353. - update
  3354. # Watch for changes to Kubernetes NetworkPolicies.
  3355. - apiGroups: ["networking.k8s.io"]
  3356. resources:
  3357. - networkpolicies
  3358. verbs:
  3359. - watch
  3360. - list
  3361. # Used by Calico for policy information.
  3362. - apiGroups: [""]
  3363. resources:
  3364. - pods
  3365. - namespaces
  3366. - serviceaccounts
  3367. verbs:
  3368. - list
  3369. - watch
  3370. # The CNI plugin patches pods/status.
  3371. - apiGroups: [""]
  3372. resources:
  3373. - pods/status
  3374. verbs:
  3375. - patch
  3376. # Calico monitors various CRDs for config.
  3377. - apiGroups: ["crd.projectcalico.org"]
  3378. resources:
  3379. - globalfelixconfigs
  3380. - felixconfigurations
  3381. - bgppeers
  3382. - globalbgpconfigs
  3383. - bgpconfigurations
  3384. - ippools
  3385. - ipamblocks
  3386. - globalnetworkpolicies
  3387. - globalnetworksets
  3388. - networkpolicies
  3389. - networksets
  3390. - clusterinformations
  3391. - hostendpoints
  3392. - blockaffinities
  3393. verbs:
  3394. - get
  3395. - list
  3396. - watch
  3397. # Calico must create and update some CRDs on startup.
  3398. - apiGroups: ["crd.projectcalico.org"]
  3399. resources:
  3400. - ippools
  3401. - felixconfigurations
  3402. - clusterinformations
  3403. verbs:
  3404. - create
  3405. - update
  3406. # Calico stores some configuration information on the node.
  3407. - apiGroups: [""]
  3408. resources:
  3409. - nodes
  3410. verbs:
  3411. - get
  3412. - list
  3413. - watch
  3414. # These permissions are only required for upgrade from v2.6, and can
  3415. # be removed after upgrade or on fresh installations.
  3416. - apiGroups: ["crd.projectcalico.org"]
  3417. resources:
  3418. - bgpconfigurations
  3419. - bgppeers
  3420. verbs:
  3421. - create
  3422. - update
  3423. # These permissions are required for Calico CNI to perform IPAM allocations.
  3424. - apiGroups: ["crd.projectcalico.org"]
  3425. resources:
  3426. - blockaffinities
  3427. - ipamblocks
  3428. - ipamhandles
  3429. verbs:
  3430. - get
  3431. - list
  3432. - create
  3433. - update
  3434. - delete
  3435. - apiGroups: ["crd.projectcalico.org"]
  3436. resources:
  3437. - ipamconfigs
  3438. verbs:
  3439. - get
  3440. # Block affinities must also be watchable by confd for route aggregation.
  3441. - apiGroups: ["crd.projectcalico.org"]
  3442. resources:
  3443. - blockaffinities
  3444. verbs:
  3445. - watch
  3446. # The Calico IPAM migration needs to get daemonsets. These permissions can be
  3447. # removed if not upgrading from an installation using host-local IPAM.
  3448. - apiGroups: ["apps"]
  3449. resources:
  3450. - daemonsets
  3451. verbs:
  3452. - get
  3453. ---
  3454. apiVersion: rbac.authorization.k8s.io/v1
  3455. kind: ClusterRoleBinding
  3456. metadata:
  3457. name: calico-node
  3458. roleRef:
  3459. apiGroup: rbac.authorization.k8s.io
  3460. kind: ClusterRole
  3461. name: calico-node
  3462. subjects:
  3463. - kind: ServiceAccount
  3464. name: calico-node
  3465. namespace: kube-system
  3466. ---
  3467. # Source: calico/templates/calico-node.yaml
  3468. # This manifest installs the calico-node container, as well
  3469. # as the CNI plugins and network config on
  3470. # each master and worker node in a Kubernetes cluster.
  3471. kind: DaemonSet
  3472. apiVersion: apps/v1
  3473. metadata:
  3474. name: calico-node
  3475. namespace: kube-system
  3476. labels:
  3477. k8s-app: calico-node
  3478. spec:
  3479. selector:
  3480. matchLabels:
  3481. k8s-app: calico-node
  3482. updateStrategy:
  3483. type: RollingUpdate
  3484. rollingUpdate:
  3485. maxUnavailable: 1
  3486. template:
  3487. metadata:
  3488. labels:
  3489. k8s-app: calico-node
  3490. spec:
  3491. nodeSelector:
  3492. kubernetes.io/os: linux
  3493. hostNetwork: true
  3494. tolerations:
  3495. # Make sure calico-node gets scheduled on all nodes.
  3496. - effect: NoSchedule
  3497. operator: Exists
  3498. # Mark the pod as a critical add-on for rescheduling.
  3499. - key: CriticalAddonsOnly
  3500. operator: Exists
  3501. - effect: NoExecute
  3502. operator: Exists
  3503. serviceAccountName: calico-node
  3504. # Minimize downtime during a rolling upgrade or deletion; tell Kubernetes to do a "force
  3505. # deletion": https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods.
  3506. terminationGracePeriodSeconds: 0
  3507. priorityClassName: system-node-critical
  3508. initContainers:
  3509. # This container performs upgrade from host-local IPAM to calico-ipam.
  3510. # It can be deleted if this is a fresh installation, or if you have already
  3511. # upgraded to use calico-ipam.
  3512. - name: upgrade-ipam
  3513. image: docker.io/calico/cni:v3.18.0
  3514. command: ["/opt/cni/bin/calico-ipam", "-upgrade"]
  3515. envFrom:
  3516. - configMapRef:
  3517. # Allow KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT to be overridden for eBPF mode.
  3518. name: kubernetes-services-endpoint
  3519. optional: true
  3520. env:
  3521. - name: KUBERNETES_NODE_NAME
  3522. valueFrom:
  3523. fieldRef:
  3524. fieldPath: spec.nodeName
  3525. - name: CALICO_NETWORKING_BACKEND
  3526. valueFrom:
  3527. configMapKeyRef:
  3528. name: calico-config
  3529. key: calico_backend
  3530. volumeMounts:
  3531. - mountPath: /var/lib/cni/networks
  3532. name: host-local-net-dir
  3533. - mountPath: /host/opt/cni/bin
  3534. name: cni-bin-dir
  3535. securityContext:
  3536. privileged: true
  3537. # This container installs the CNI binaries
  3538. # and CNI network config file on each node.
  3539. - name: install-cni
  3540. image: docker.io/calico/cni:v3.18.0
  3541. command: ["/opt/cni/bin/install"]
  3542. envFrom:
  3543. - configMapRef:
  3544. # Allow KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT to be overridden for eBPF mode.
  3545. name: kubernetes-services-endpoint
  3546. optional: true
  3547. env:
  3548. # Name of the CNI config file to create.
  3549. - name: CNI_CONF_NAME
  3550. value: "10-calico.conflist"
  3551. # The CNI network config to install on each node.
  3552. - name: CNI_NETWORK_CONFIG
  3553. valueFrom:
  3554. configMapKeyRef:
  3555. name: calico-config
  3556. key: cni_network_config
  3557. # Set the hostname based on the k8s node name.
  3558. - name: KUBERNETES_NODE_NAME
  3559. valueFrom:
  3560. fieldRef:
  3561. fieldPath: spec.nodeName
  3562. # CNI MTU Config variable
  3563. - name: CNI_MTU
  3564. valueFrom:
  3565. configMapKeyRef:
  3566. name: calico-config
  3567. key: veth_mtu
  3568. # Prevents the container from sleeping forever.
  3569. - name: SLEEP
  3570. value: "false"
  3571. volumeMounts:
  3572. - mountPath: /host/opt/cni/bin
  3573. name: cni-bin-dir
  3574. - mountPath: /host/etc/cni/net.d
  3575. name: cni-net-dir
  3576. securityContext:
  3577. privileged: true
  3578. # Adds a Flex Volume Driver that creates a per-pod Unix Domain Socket to allow Dikastes
  3579. # to communicate with Felix over the Policy Sync API.
  3580. - name: flexvol-driver
  3581. image: docker.io/calico/pod2daemon-flexvol:v3.18.0
  3582. volumeMounts:
  3583. - name: flexvol-driver-host
  3584. mountPath: /host/driver
  3585. securityContext:
  3586. privileged: true
  3587. containers:
  3588. # Runs calico-node container on each Kubernetes node. This
  3589. # container programs network policy and routes on each
  3590. # host.
  3591. - name: calico-node
  3592. image: docker.io/calico/node:v3.18.0
  3593. envFrom:
  3594. - configMapRef:
  3595. # Allow KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT to be overridden for eBPF mode.
  3596. name: kubernetes-services-endpoint
  3597. optional: true
  3598. env:
  3599. # Use Kubernetes API as the backing datastore.
  3600. - name: DATASTORE_TYPE
  3601. value: "kubernetes"
  3602. # Wait for the datastore.
  3603. - name: WAIT_FOR_DATASTORE
  3604. value: "true"
  3605. # Set based on the k8s node name.
  3606. - name: NODENAME
  3607. valueFrom:
  3608. fieldRef:
  3609. fieldPath: spec.nodeName
  3610. # Choose the backend to use.
  3611. - name: CALICO_NETWORKING_BACKEND
  3612. valueFrom:
  3613. configMapKeyRef:
  3614. name: calico-config
  3615. key: calico_backend
  3616. # Cluster type to identify the deployment type
  3617. - name: CLUSTER_TYPE
  3618. value: "k8s,bgp"
  3619. # Auto-detect the BGP IP address.
  3620. - name: IP
  3621. value: "autodetect"
  3622. # Enable IPIP
  3623. - name: CALICO_IPV4POOL_IPIP
  3624. value: "Always"
  3625. # Enable or Disable VXLAN on the default IP pool.
  3626. - name: CALICO_IPV4POOL_VXLAN
  3627. value: "Never"
  3628. # Set MTU for tunnel device used if ipip is enabled
  3629. - name: FELIX_IPINIPMTU
  3630. valueFrom:
  3631. configMapKeyRef:
  3632. name: calico-config
  3633. key: veth_mtu
  3634. # Set MTU for the VXLAN tunnel device.
  3635. - name: FELIX_VXLANMTU
  3636. valueFrom:
  3637. configMapKeyRef:
  3638. name: calico-config
  3639. key: veth_mtu
  3640. # Set MTU for the Wireguard tunnel device.
  3641. - name: FELIX_WIREGUARDMTU
  3642. valueFrom:
  3643. configMapKeyRef:
  3644. name: calico-config
  3645. key: veth_mtu
  3646. # The default IPv4 pool to create on startup if none exists. Pod IPs will be
  3647. # chosen from this range. Changing this value after installation will have
  3648. # no effect. This should fall within `--cluster-cidr`.
  3649. # - name: CALICO_IPV4POOL_CIDR
  3650. # value: "192.168.0.0/16"
  3651. # Disable file logging so `kubectl logs` works.
  3652. - name: CALICO_DISABLE_FILE_LOGGING
  3653. value: "true"
  3654. # Set Felix endpoint to host default action to ACCEPT.
  3655. - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
  3656. value: "ACCEPT"
  3657. # Disable IPv6 on Kubernetes.
  3658. - name: FELIX_IPV6SUPPORT
  3659. value: "false"
  3660. # Set Felix logging to "info"
  3661. - name: FELIX_LOGSEVERITYSCREEN
  3662. value: "info"
  3663. - name: FELIX_HEALTHENABLED
  3664. value: "true"
  3665. securityContext:
  3666. privileged: true
  3667. resources:
  3668. requests:
  3669. cpu: 250m
  3670. livenessProbe:
  3671. exec:
  3672. command:
  3673. - /bin/calico-node
  3674. - -felix-live
  3675. - -bird-live
  3676. periodSeconds: 10
  3677. initialDelaySeconds: 10
  3678. failureThreshold: 6
  3679. readinessProbe:
  3680. exec:
  3681. command:
  3682. - /bin/calico-node
  3683. - -felix-ready
  3684. - -bird-ready
  3685. periodSeconds: 10
  3686. volumeMounts:
  3687. - mountPath: /lib/modules
  3688. name: lib-modules
  3689. readOnly: true
  3690. - mountPath: /run/xtables.lock
  3691. name: xtables-lock
  3692. readOnly: false
  3693. - mountPath: /var/run/calico
  3694. name: var-run-calico
  3695. readOnly: false
  3696. - mountPath: /var/lib/calico
  3697. name: var-lib-calico
  3698. readOnly: false
  3699. - name: policysync
  3700. mountPath: /var/run/nodeagent
  3701. # For eBPF mode, we need to be able to mount the BPF filesystem at /sys/fs/bpf so we mount in the
  3702. # parent directory.
  3703. - name: sysfs
  3704. mountPath: /sys/fs/
  3705. # Bidirectional means that, if we mount the BPF filesystem at /sys/fs/bpf it will propagate to the host.
  3706. # If the host is known to mount that filesystem already then Bidirectional can be omitted.
  3707. mountPropagation: Bidirectional
  3708. - name: cni-log-dir
  3709. mountPath: /var/log/calico/cni
  3710. readOnly: true
  3711. volumes:
  3712. # Used by calico-node.
  3713. - name: lib-modules
  3714. hostPath:
  3715. path: /lib/modules
  3716. - name: var-run-calico
  3717. hostPath:
  3718. path: /var/run/calico
  3719. - name: var-lib-calico
  3720. hostPath:
  3721. path: /var/lib/calico
  3722. - name: xtables-lock
  3723. hostPath:
  3724. path: /run/xtables.lock
  3725. type: FileOrCreate
  3726. - name: sysfs
  3727. hostPath:
  3728. path: /sys/fs/
  3729. type: DirectoryOrCreate
  3730. # Used to install CNI.
  3731. - name: cni-bin-dir
  3732. hostPath:
  3733. path: /opt/cni/bin
  3734. - name: cni-net-dir
  3735. hostPath:
  3736. path: /etc/cni/net.d
  3737. # Used to access CNI logs.
  3738. - name: cni-log-dir
  3739. hostPath:
  3740. path: /var/log/calico/cni
  3741. # Mount in the directory for host-local IPAM allocations. This is
  3742. # used when upgrading from host-local to calico-ipam, and can be removed
  3743. # if not using the upgrade-ipam init container.
  3744. - name: host-local-net-dir
  3745. hostPath:
  3746. path: /var/lib/cni/networks
  3747. # Used to create per-pod Unix Domain Sockets
  3748. - name: policysync
  3749. hostPath:
  3750. type: DirectoryOrCreate
  3751. path: /var/run/nodeagent
  3752. # Used to install Flex Volume Driver
  3753. - name: flexvol-driver-host
  3754. hostPath:
  3755. type: DirectoryOrCreate
  3756. path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
  3757. ---
  3758. apiVersion: v1
  3759. kind: ServiceAccount
  3760. metadata:
  3761. name: calico-node
  3762. namespace: kube-system
  3763. ---
  3764. # Source: calico/templates/calico-kube-controllers.yaml
  3765. # See https://github.com/projectcalico/kube-controllers
  3766. apiVersion: apps/v1
  3767. kind: Deployment
  3768. metadata:
  3769. name: calico-kube-controllers
  3770. namespace: kube-system
  3771. labels:
  3772. k8s-app: calico-kube-controllers
  3773. spec:
  3774. # The controllers can only have a single active instance.
  3775. replicas: 1
  3776. selector:
  3777. matchLabels:
  3778. k8s-app: calico-kube-controllers
  3779. strategy:
  3780. type: Recreate
  3781. template:
  3782. metadata:
  3783. name: calico-kube-controllers
  3784. namespace: kube-system
  3785. labels:
  3786. k8s-app: calico-kube-controllers
  3787. spec:
  3788. nodeSelector:
  3789. kubernetes.io/os: linux
  3790. tolerations:
  3791. # Mark the pod as a critical add-on for rescheduling.
  3792. - key: CriticalAddonsOnly
  3793. operator: Exists
  3794. - key: node-role.kubernetes.io/master
  3795. effect: NoSchedule
  3796. serviceAccountName: calico-kube-controllers
  3797. priorityClassName: system-cluster-critical
  3798. containers:
  3799. - name: calico-kube-controllers
  3800. image: docker.io/calico/kube-controllers:v3.18.0
  3801. env:
  3802. # Choose which controllers to run.
  3803. - name: ENABLED_CONTROLLERS
  3804. value: node
  3805. - name: DATASTORE_TYPE
  3806. value: kubernetes
  3807. readinessProbe:
  3808. exec:
  3809. command:
  3810. - /usr/bin/check-status
  3811. - -r
  3812. ---
  3813. apiVersion: v1
  3814. kind: ServiceAccount
  3815. metadata:
  3816. name: calico-kube-controllers
  3817. namespace: kube-system
  3818. ---
  3819. # This manifest creates a Pod Disruption Budget for Controller to allow K8s Cluster Autoscaler to evict
  3820. apiVersion: policy/v1
  3821. kind: PodDisruptionBudget
  3822. metadata:
  3823. name: calico-kube-controllers
  3824. namespace: kube-system
  3825. labels:
  3826. k8s-app: calico-kube-controllers
  3827. spec:
  3828. maxUnavailable: 1
  3829. selector:
  3830. matchLabels:
  3831. k8s-app: calico-kube-controllers
  3832. ---
  3833. # Source: calico/templates/calico-etcd-secrets.yaml
  3834. ---
  3835. # Source: calico/templates/calico-typha.yaml
  3836. ---
  3837. # Source: calico/templates/configure-canal.yaml

安装组件并验证

kubectl apply -f calico.yaml

kubectl get pods -n kube-system

# kube-system 命名空间下所有 pod 都处于 Running 状态之后再执行
kubectl get nodes

六、使用 kubeadm 搭建生产环境的单 master 节点 k8s 集群

6.1 初始化集群

步骤参考 5.2,与多 master 集群的初始化配置相同。


6.2 使用kubeadm初始化集群

初始化master

kubeadm init --kubernetes-version=1.23.1 \
--apiserver-advertise-address=192.168.1.63 \
--image-repository registry.aliyuncs.com/google_containers \
--pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=SystemVerification

mkdir -p $HOME/.kube

sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

sudo chown $(id -u):$(id -g) $HOME/.kube/config

子节点执行

kubeadm join 192.168.1.63:6443 --token nkylcf.lhirwxot0msp2pt1 \
    --discovery-token-ca-cert-hash sha256:6a7999f245768f13c63979931db9205185d1ee2c1d7339c2472c8ef8b63ca460
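如果初始化时打印的 join 命令没有保存,或者 token 已过期(默认 24 小时),可以在 master 节点重新生成一条(示例命令):

kubeadm token create --print-join-command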

主节点验证

kubectl get nodes

6.3 calico组件安装

6.3.1 calico介绍

Calico 是一种容器之间互通的网络方案,是一个基于 BGP 的纯三层网络方案,与 OpenStack、Kubernetes、AWS、GCE 等云平台都能够良好地集成。在 OpenStack、Docker 等虚拟化平台中,既需要实现跨主机互连,也需要对容器做隔离控制。而在多数的虚拟化平台实现中,通常都使用二层隔离技术来实现容器的网络,这些二层技术有一些弊端,比如需要依赖 VLAN、bridge 和隧道等技术,其中 bridge 带来了复杂性,vlan 隔离和 tunnel 隧道在拆包或加包头时会消耗更多的资源,并对物理环境有所要求。随着网络规模的增大,整体会变得愈加复杂。
Calico 保证所有容器之间的数据流量都是通过 IP 路由的方式完成互联互通的。Calico 节点组网时可以直接利用数据中心的网络结构(L2 或者 L3),不需要额外的 NAT、隧道或者 Overlay Network,没有额外的封包解包,能够节约 CPU 运算,提高网络效率。
Calico 在小规模集群中可以直接互联,在大规模集群中可以通过额外的 BGP route reflector 来完成。Calico 基于 iptables 还提供了丰富的网络策略,实现了 Kubernetes 的 Network Policy 策略,提供容器间网络可达性限制的功能。
Calico 把 Host 当作 Internet 中的路由器,使用 BGP 同步路由,并使用 iptables 来做安全访问策略。
设计思想:Calico 不使用隧道或 NAT 来实现转发,而是巧妙的把所有二三层流量转换成三层流量,并通过 host 上路由配置完成跨 Host 转发。
设计优势:
1.更优的资源利用
二层网络使用 VLAN 隔离技术,最多有 4096 个规格限制,即便可以使用 vxlan 解决,但vxlan 又带来了隧道开销的新问题。而 Calico 不使用 vlan 或 vxlan 技术,使资源利用率更高。
2.可扩展性
Calico 使用与 Internet 类似的方案,Internet 的网络比任何数据中心都大,Calico 同样天然具有可扩展性。
3.简单而更容易 debug
因为没有隧道,意味着 workloads 之间路径更短更简单,配置更少,在 host 上更容易进行debug 调试。
4.更少的依赖
Calico 仅依赖三层路由可达。
5.可适配性
Calico 较少的依赖性使它能适配所有 VM、Container 或者混合环境场景。


6.3.2 calico结构图

image.png
Calico 网络模型主要工作组件:
1.Felix:运行在每一台 Host 的 agent 进程,主要负责网络接口管理和监听、路由、ARP 管理、ACL 管理和同步、状态上报等。保证跨主机容器网络互通。
2.etcd:分布式键值存储,相当于 k8s 集群中的数据库,存储着 Calico 网络模型中 IP 地址等相关信息。
3.BGP Client(BIRD):Calico 为每一台 Host 部署一个 BGP Client,即每台 host 上部署一个BIRD。BIRD 是一个单独的持续发展的项目,实现了众多动态路由协议比如 BGP、OSPF、RIP 等。在Calico 的角色是监听 Host 上由 Felix 注入的路由信息,然后通过 BGP 协议广播告诉剩余 Host 节点,从而实现网络互通。
4.BGP Route Reflector:在大规模网络中,如果仅仅使用 BGP client 形成 mesh 全网互联,就会受到规模限制,因为所有节点之间两两互联,连接数按 N^2 量级增长(N 个节点需要 N*(N-1)/2 条连接)。为了解决这个规模问题,可以采用 BGP 的 Route Reflector 方法,使所有 BGP Client 仅与特定的 RR 节点互联并做路由同步,从而大大减少连接数。


6.3.3 常见的网络插件对比分析

flannel:支持地址分配,不支持网络策略 。
calico:支持地址分配,支持网络策略。

flannel:
vxlan: VXLAN(Virtual eXtensible LAN),即扩展的虚拟局域网
flannel 支持多种后端:

VxLAN:
(1) vxlan 叠加网络模式
(2) Directrouting 直接路由模式
host-gw: Host Gateway
#直接路由模式,不推荐;只能用于二层网络,不支持跨网段,如果有成千上万的 Pod,容易产生广播风暴
UDP: 一般不用这个模式,性能差

flannel 方案: 需要在每个节点上把发向容器的数据包进行封装后,再用隧道将封装后的数据包发 送到运行着目标 Pod 的 node 节点上。目标 node 节点再负责去掉封装,将去除封装的数据包发送到目标 Pod 上。数据通信性能则大受影响
calico 方案:在 k8s 的多个网络解决方案中,我们选择了延迟表现最好的 calico 方案


6.3.4 对比总结

Flannel 网络非常类似于 Docker 网络的 Overlay 驱动,基于二层的层叠网络。
层叠网络的优势:
1.对底层网络依赖较少,不管底层是物理网络还是虚拟网络,对层叠网络的配置管理影响较少;
2.配置简单,逻辑清晰,易于理解和学习,非常适用于开发测试等对网络性能要求不高的场景。
层叠网络的劣势:
1.网络封装是一种传输开销,对网络性能会有影响,不适用于对网络性能要求高的生产场景;
2.由于对底层网络结构缺乏了解,无法做到真正有效的流量工程控制,也会对网络性能产生影响;
3.某些情况下也不能完全做到与下层网络无关,例如隧道封装会对网络的 MTU 限制产生影响。

Calico,就是非层叠网络:
Calico 网络的优势
1.没有隧道封装的网络开销;
2.相比于通过 Overlay 构成的大二层层叠网络,用 IBGP 构成的扁平三层网络扩展模式更符合传统
IP 网络的分布式结构;
3.不会对物理层网络的二层参数如 MTU 引入新的要求。
Calico 网络的劣势:
1.最大的问题是不容易支持多租户,由于没有封装,所有的虚拟机或者容器只能通过真实的 IP 来区
分自己,这就要求所有租户的虚拟机或容器统一分配一个地址空间;而在典型的支持多租户的网络环境
中,每个租户可以分配自己的私有网络地址,租户之间即使地址相同也不会有冲突;
2.不容易与其他基于主机路由的网络应用集成。
注:总结一下,calico 提供了一种让不同物理机上的 docker 容器之间高速通信的组网方案。


6.4 k8s集群安装calico

6.4.1 安装calico

下载calico
注:在线下载配置文件地址是: https://docs.projectcalico.org/manifests/calico.yaml
安装calico

kubectl apply -f calico.yaml

kubectl get pod -n kube-system

 kubectl get node

6.4.2 calico 配置文件详解

Configmap 配置:
image.png
cni_network_config: |- //符合 CNI 规范的网络配置,其中 type=calico 表示,Kubelet 从 CNI_PATH(默认为/opt/cni/bin)找名为 calico 的可执行文件,用于容器的网络设置。
ipam 中的 type=calico-ipam 表示 kubelet 将在/opt/cni/bin 目录下搜索名为 calico-ipam 的可执行文件,用于完成容器 IP 地址的分配。
calico_backend:Calico 的后端,默认为 bird。
注意:BIRD 实际上是 BIRD Internet Routing Daemon 的缩写,是一款可运行在Linux 和其他类 Unix 系统上的路由软件,它实现了多种路由协议,比如 BGP、OSPF、RIP 等。
daemonset 配置
image.png
image.png
image.png
image.png
image.png
image.png
在该 Pod 中包括如下两个容器:
install-cni:在 Node 上安装 CNI 二进制文件到/opt/cni/bin 目录下,并安装相应的网络配置文件到/etc/cni/net.d 目录下,设置为 initContainers 并在运行完成后退出。
calico-node:Calico 服务程序,用于设置 Pod 的网络资源,保证 Pod 的网络与各 Node 互联互通。它还需要以 hostNetwork 模式运行,直接使用宿主机网络。
calico-node 服务的主要参数如下。
CALICO_IPV4POOL_CIDR:Calico IPAM 的 IP 地址池,Pod 的 IP 地址将从该池中进行分配。
CALICO_IPV4POOL_IPIP:是否启用 IPIP 模式。启用 IPIP 模式时,Calico 将在 Node 上创建一 个名为 tunl0 的虚拟隧道。
IP_AUTODETECTION_METHOD:获取 Node IP 地址的方式,默认使用第 1 个网络接口的 IP 地址,对于安装了多块网卡的 Node,可以使用正则表达式选择正确的网卡,例如”interface=eth.*”表示选择名称以 eth 开头的网卡的 IP 地址。
- name: IP_AUTODETECTION_METHOD
  value: "interface=eth0"
FELIX_IPV6SUPPORT:是否启用 IPv6。
FELIX_LOGSEVERITYSCREEN:日志级别。
securityContext.privileged=true:以特权模式运行。
IP Pool 可以使用两种模式:BGP 或 IPIP。使用 IPIP 模式时,设置 CALICO_IPV4POOL_IPIP="Always";不使用 IPIP 模式时,设置 CALICO_IPV4POOL_IPIP="Off",此时将使用 BGP 模式。
calico 的 IPIP 模式和 BGP 模式对比分析
1) IPIP
把一个 IP 数据包又套在一个 IP 包里,即把 IP 层封装到 IP 层的一个 tunnel,它的作用其实基本上就相当于一个基于 IP 层的网桥,一般来说,普通的网桥是基于 mac 层的,根本不需要 IP,而这个 ipip 则是通过两端的路由做一个 tunnel,把两个本来不通的网络通过点对点连接起来;
calico 以 ipip 模式部署完毕后,node 上会有一个 tunl0 的网卡设备,这是 ipip 做隧道封装用的,也是一种 overlay 模式的网络。当我们把节点下线、calico 容器都停止后,这个设备依然还在,执行 rmmod ipip 命令(卸载 ipip 内核模块)可以将它删除。
2) BGP
边界网关协议(Border Gateway Protocol, BGP)是互联网上一个核心的去中心化的自治路由协议。它通过维护 IP 路由表或"前缀"表来实现自治系统(AS)之间的可达性,属于矢量路由协议。BGP 不使用传统的内部网关协议(IGP)的指标,而是基于路径、网络策略或规则集来决定路由,因此它更适合被称为矢量性协议,而不是路由协议。通俗地说,就是将接入到机房的多条线路(如电信、联通、移动等)融合为一体,实现多线单 IP;
BGP 机房的优点:服务器只需要设置一个 IP 地址,最佳访问路由是由网络上的骨干路由器根据路由
跳数与其它技术指标来确定的,不会占用服务器的任何系统;
官方提供的 calico.yaml 模板里,默认打开了 ipip 功能,该功能会在 node 上创建一个设备 tunl0,容器的网络数据会经过该设备被封装一个 ip 头再转发。calico.yaml 中通过修改 calico-node 的环境变量 CALICO_IPV4POOL_IPIP 来实现 ipip 功能的开关:默认是 Always,表示开启;Off 表示关闭 ipip。
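对应到 calico.yaml 中 calico-node 容器的环境变量,就是下面这个片段(节选自前文的清单,仅作示意):

- name: CALICO_IPV4POOL_IPIP
  value: "Always"    # 改为 "Off" 即关闭 IPIP,使用纯 BGP 模式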
image.png
calico-kube-controllers 解析
image.png
image.png
用户在 Kubernetes 集群中设置了 Pod 的 Network Policy 之后,calico-kube-controllers 就会自动通知各 Node 上的 calico-node 服务,在宿主机上设置相应的 iptables 规则,完成 Pod 间网络访问策略的设置。
calico-node 在正常运行之后,会根据 CNI 规范,在/etc/cni/net.d/目录下生成如下文件和目录,并在/opt/cni/bin/目录下安装二进制文件 calico 和 calico-ipam,供 kubelet 调用。


6.5 kubeadm 流程

6.5.1 流程图

image.png


6.5.2 kubeadm安装环境监测

kubeadm 在执行安装之前进行了相当细致的环境检测,下面看一看:
1) 检查执行 init 命令的用户是否为 root,如果不是 root,直接快速失败(fail fast);
2) 检查待安装的 k8s 版本是否被当前版本的 kubeadm 支持(kubeadm 版本 >= 待安装 k8s 版本);
3) 检查防火墙,如果防火墙未关闭,提示开放端口 10250;
4) 检查端口是否已被占用,6443(或你指定的监听端口)、10257、10259;
5) 检查文件是否已经存在,/etc/kubernetes/manifests/*.yaml;
6) 检查是否存在代理,连接本机网络、服务网络、Pod 网络,都会检查,目前不允许代理;
7) 检查容器运行时,使用 CRI 还是 Docker,如果是 Docker,进一步检查 Docker 服务是否已启动,是否设置了开机自启动;
8) 对于 Linux 系统,会额外检查以下内容:
8.1) 检查以下命令是否存在:crictl、ip、iptables、mount、nsenter、ebtables、ethtool、socat、tc、touch;
8.2) 检查 /proc/sys/net/bridge/bridge-nf-call-iptables、/proc/sys/net/ipv4/ip_forward 内容是否为 1;
8.3) 检查 swap 是否是关闭状态;
9) 检查内核是否被支持,Docker 版本及后端存储 GraphDriver 是否被支持; 对于 Linux 系统,还需检查 OS 版本和 cgroup 支持程度(支持哪些资源的隔离);
10) 检查主机名访问可达性;
11) 检查 kubelet 版本,要高于 kubeadm 需要的最低版本,同时不高于待安装的 k8s 版本;
12) 检查 kubelet 服务是否开机自启动;
13) 检查 10250 端口是否被占用;
14) 如果开启 IPVS 功能,检查系统内核是否加载了 ipvs 模块;
15) 对于 etcd,如果使用 Local etcd,则检查 2379 端口是否被占用,/var/lib/etcd/ 是否为空目录;如果使用 External etcd,则检查证书文件是否存在(CA、key、cert),验证etcd 服务版本是否符合要求;
16) 如果使用 IPv6,
检查 /proc/sys/net/bridge/bridge-nf-call-iptables、/proc/sys/net/ipv6/conf/default/forwarding 内容是否为 1;
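其中 8.2)、8.3) 两项可以在执行 kubeadm init 之前自行确认,下面是一个简单的检查脚本(仅作示意,假设在 Linux 节点上以 root 执行):

#!/bin/bash
# 检查桥接流量是否交给 iptables 处理,以及是否开启了 IPv4 转发,期望都输出 1
cat /proc/sys/net/bridge/bridge-nf-call-iptables
cat /proc/sys/net/ipv4/ip_forward
# 检查 swap 是否已关闭:swapon --show 应无输出,Swap 总量应为 0
swapon --show
free -m | awk '/Swap/{print $2}'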


6.5.3 安装前配置

1) 在 kube-system 命名空间创建 ConfigMap kubeadm-config,同时对其配置RBAC 权限;
2) 在 kube-system 命名空间创建 ConfigMap kubelet-config-,同时对其配置 RBAC 权限;
3) 为当前节点(Master)打标记:node-role.kubernetes.io/master=;
4) 为当前节点(Master)补充 Annotation;
5) 如果启用了 DynamicKubeletConfig 特性,设置本节点 kubelet 的配置数据源为ConfigMap 形式;
6) 创建 BootStrap token Secret,并对其配置 RBAC 权限;
7) 在 kube-public 命名空间创建 ConfigMap cluster-info,同时对其配置 RBAC 权限;
8) 与 apiserver 通信,部署 DNS 服务;
9) 与 apiserver 通信,部署 kube-proxy 服务;
10) 如果启用了 self-hosted 特性,将 Control Plane 转为 DaemonSet 形式运行;
11) 打印 join 语句;
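初始化完成后,可以直接查看 kubeadm 写入的这几个 ConfigMap 来验证上述步骤(示例命令,kubelet-config 的名称随 k8s 版本不同可能带版本号):

kubectl get configmap kubeadm-config -n kube-system -o yaml
kubectl get configmap kubelet-config-1.23 -n kube-system -o yaml
kubectl get configmap cluster-info -n kube-public -o yaml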


6.5.4 Kubeadm 生成的 k8s 证书内容说明

证书分组
Kubernetes 把证书放在了两个文件夹中
/etc/kubernetes/pki
/etc/kubernetes/pki/etcd
Kubernetes 集群根证书
Kubernetes 集群根证书 CA(Kubernetes 集群组件的证书签发机构)
/etc/kubernetes/pki/ca.crt
/etc/kubernetes/pki/ca.key
以上这组证书为签发其他 Kubernetes 组件证书使用的根证书, 可以认为是 Kubernetes 集群中证书签发机构之一
由此根证书签发的证书有:
1) kube-apiserver apiserver 证书
/etc/kubernetes/pki/apiserver.crt
/etc/kubernetes/pki/apiserver.key
2) kubelet 客户端证书, 用作 kube-apiserver 主动向 kubelet 发起请求时的客户端认证
/etc/kubernetes/pki/apiserver-kubelet-client.crt
/etc/kubernetes/pki/apiserver-kubelet-client.key
kube-apiserver 代理根证书(客户端证书)
用在 requestheader-client-ca-file 配置选项中, kube-apiserver 使用该证书来验证客户端证书是否为自己所签发
/etc/kubernetes/pki/front-proxy-ca.crt
/etc/kubernetes/pki/front-proxy-ca.key
由此根证书签发的证书只有一组:
代理层(如汇聚层 aggregator)使用此套代理证书来向 kube-apiserver 请求认证;这是代理端使用的客户端证书,用作代理用户与 kube-apiserver 之间的认证
/etc/kubernetes/pki/front-proxy-client.crt
/etc/kubernetes/pki/front-proxy-client.key
etcd 集群根证书
etcd 集群所用到的证书都保存在 /etc/kubernetes/pki/etcd 这个路径下,很明显,这一套证书是专门给 etcd 集群服务使用的,涉及以下证书文件。etcd 集群根证书 CA(etcd 所用到的所有证书的签发机构):
/etc/kubernetes/pki/etcd/ca.crt
/etc/kubernetes/pki/etcd/ca.key
由此根证书签发机构签发的证书有:
1) etcd server 持有的服务端证书
/etc/kubernetes/pki/etcd/server.crt
/etc/kubernetes/pki/etcd/server.key
2) peer 集群中节点互相通信使用的客户端证书
/etc/kubernetes/pki/etcd/peer.crt
/etc/kubernetes/pki/etcd/peer.key
注: Peer:对同一个 etcd 集群中另外一个 Member 的称呼
3) pod 中定义 Liveness 探针使用的客户端证书
kubeadm 部署的 Kubernetes 集群是以 pod 的方式运行 etcd 服务的, 在该 pod 的定义中, 配置了 Liveness 探活探针
/etc/kubernetes/pki/etcd/healthcheck-client.crt
/etc/kubernetes/pki/etcd/healthcheck-client.key
4)配置在 kube-apiserver 中用来与 etcd server 做双向认证的客户端证书
/etc/kubernetes/pki/apiserver-etcd-client.crt
/etc/kubernetes/pki/apiserver-etcd-client.key
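可以用 openssl 验证某张证书确实由对应的根证书签发,例如(示例命令):

openssl verify -CAfile /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/apiserver.crt
openssl verify -CAfile /etc/kubernetes/pki/etcd/ca.crt /etc/kubernetes/pki/etcd/server.crt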


6.6 安装 kubernetes-dashboard-2.0

将需要的镜像上传到每个工作节点

docker load -i dashboard_2_0_0.tar.gz

docker load -i metrics-scrapter-1-0-1.tar.gz

将yaml文件上传到master节点执行,验证
kubernetes-dashboard.yaml

kubectl apply -f kubernetes-dashboard.yaml

kubectl get pods -n kubernetes-dashboard

kubectl get svc -n kubernetes-dashboard

修改service type类型

kubectl edit svc kubernetes-dashboard -n kubernetes-dashboard

把 type: ClusterIP 变成 type: NodePort,保存退出即可

kubectl get svc -n kubernetes-dashboard

访问页面
image.png
注: 输入thisisunsafe 即可
image.png
创建管理员token使用token登录

kubectl create clusterrolebinding dashboard-cluster-admin --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:kubernetes-dashboard

kubectl get secret -n kubernetes-dashboard

kubectl describe secret

kubectl describe secret kubernetes-dashboard-token-tqkdj -n kubernetes-dashboard
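如果不想手工查找 secret 名称,也可以用下面的组合命令一步输出 token 信息(示意写法,假设 dashboard 部署在 kubernetes-dashboard 名称空间):

kubectl -n kubernetes-dashboard describe secret $(kubectl -n kubernetes-dashboard get secret | grep kubernetes-dashboard-token | awk '{print $1}')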

image.png


6.7 通过kubernetes-dashboard 创建容器

创建容器
image.png
image.png
image.png
查看分配端口访问
image.png
访问测试
image.png


6.8 kubeadm 初始化 k8s 证书过期解决方案

查看过期时间

openssl x509 -in /etc/kubernetes/pki/ca.crt -noout -text |grep Not
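在 1.20 及以上版本中,也可以直接用 kubeadm 自带的子命令查看所有证书的到期时间(示例):

kubeadm certs check-expiration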

传递脚本到所有节点执行
update-kubeadm-cert.sh

chmod +x update-kubeadm-cert.sh

./update-kubeadm-cert.sh all

验证执行

kubectl get pods -n kube-system

openssl x509 -in /etc/kubernetes/pki/apiserver-etcd-client.crt -noout -text |grep Not

七、Pod 理论-资源清单编写技巧

7.1 Pod 概述:概念、原理深度解读

7.1.1 Pod 概述

Pod 是 Kubernetes 中的最小调度单元,一个 Pod 封装一个容器(也可以封装多个容器),Pod 里的容器共享存储、网络等。也就是说,可以把整个 pod 看作虚拟机,然后每个容器相当于运行在虚拟机的进程。同一个 pod 里的所有容器都被统一安排和调度。
网络: 每一个 Pod 都会被指派一个唯一的 IP 地址,Pod 中的每一个容器共享网络命名空间,包括 IP 地址和网络端口。同一个 Pod 中的容器可以通过 localhost 互相通信。当 Pod 中的容器需要与 Pod 外的实体进行通信时,则需要通过端口等共享的网络资源进行通信。
存储: 在 Pod 中所有的容器能够访问共享存储卷,允许这些容器共享数据。
总结:
1) pod 是 k8s 的最小调度单元,容器包含在 pod 中,一个 pod 中有一个 pause 容器和若干个业务容器,而容器是单独的一个容器,简而言之,pod 是一组容器的集合。
2) pod 相当于逻辑主机,每个 pod 都有自己的 ip 地址
3) pod 内的容器共享相同的 ip 和端口
4) 默认情况下,每个容器的文件系统与其他容器完全隔离
什么是 pause 容器?
每个 Pod 里运行着一个特殊的被称之为 Pause 的容器,其他容器则为业务容器,这些业务容器共享Pause 容器的网络栈和 Volume 挂载卷,因此他们之间通信和数据交换更为高效,在设计时我们可以充分利用这一特性将一组密切相关的服务进程放入同一个 Pod 中。同一个 Pod 里的容器之间仅需通过localhost 就能互相通信。
什么是容器?
容器的定义是一个包,其中有要执行的程序及其所有依赖关系,如操作系统、代码、运行时、系统库等,所有这些都绑定在一个盒子中。运行容器化的应用程序,尤其是在生产环境中,需要进行编排。这样就可以实现批量管理容器
什么是 pod?
在 Kubernetes 中部署应用时,都是以 pod 进行调度的,它们基本上是单个容器的包装或房子。从某种意义上说,容器的容器。 pod 是一个逻辑包装实体,用于在 K8s 集群上执行容器。可以把每个pod 想象成一个透明的包装,为容器提供一个插槽。pod 是 Kubernetes 最小的可部署单位。pod 是一组一个或多个容器,具有共享的存储/网络资源,以及如何运行容器的规范。因此,最简单地说,pod 是一个容器如何在 Kubernetes 中“用起来”的机制。
什么是集群?
集群是 Kubernetes 基本架构的核心:如果运行 Kubernetes,那么至少运行一个集群。它们有一种共生关系:容器→pod→集群。
——容器在 pod 中逻辑化地运行;
——集群上运行一组相关或无关的 pod。pod 是集群上的一个复制单元;
——一个集群可以包含许多相关或不相关的 pod,它们分组在被称为命名空间的逻辑边界里。
什么是节点?
它存在于 pod 和集群之间。在 Kubernetes 中,节点本质上是托管 pod 的机器,可以是物理机也可以是虚拟机


7.1.2 Pod 如何管理多个容器

什么时候 Pod 里需要运行多个容器?
在一个 Pod 中同时运行多个容器是一种比较高级的用法,只有当你的容器需要紧密配合协作的时候才考虑用这种模式。例如,一个 pod 里有两个容器,这两个容器是共享存储卷的,有一个容器作为 web 服务器运行,另一个容器是为了给 web 服务提供配置文件的,如下图所示:
image.png
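下面给出一个与上图场景对应的示意清单(仅作参考,镜像、卷名和路径均为假设值):

apiVersion: v1
kind: Pod
metadata:
  name: web-with-helper
spec:
  volumes:
  - name: shared-data            # 两个容器共享的存储卷
    emptyDir: {}
  containers:
  - name: web                    # 作为 web 服务器运行的容器
    image: nginx
    volumeMounts:
    - name: shared-data
      mountPath: /usr/share/nginx/html
  - name: content-helper         # 为 web 服务生成页面/配置文件的容器
    image: busybox
    command: ["/bin/sh", "-c", "echo hello > /pod-data/index.html; sleep 3600"]
    volumeMounts:
    - name: shared-data
      mountPath: /pod-data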


7.1.3 自主式 Pod 和控制器管理的 Pod

创建一个pod进行测试

vim pod.yaml

kubectl apply -f pod.yaml

kubectl get pods -l tomcat=tomcat-pod
apiVersion: v1
kind: Pod
metadata:
  name: tomcat-pod
  namespace: default
  labels:
    tomcat: tomcat-pod
spec:
  containers:
  - name: tomcat-pod-java
    ports:
    - containerPort: 8080
    image: tomcat:8.5-jre8-alpine
    imagePullPolicy: IfNotPresent

上面定义的 pod.yaml 中,kind: Pod,这就是我们直接定义了一个 pod 资源,是自主式 pod,如果不小心删除这个pod

kubectl delete pod tomcat-pod

kubectl get pods | grep tomcat

上面可以看到,当我们把 pod 删除之后,pod 不会再创建一个新的 pod,而是彻底从机器上删除了,这种自主式 pod 适合测试环境或者非核心业务场景,如果生产环境用自主式 pod 部署应用,那是存在风险的,所以,我们平时管理 Pod 都是通过控制器去管理的。
控制器管理的 Pod
生产环境中可以用控制器创建和管理多个 pod。控制器在 pod 失败的情况下可以处理副本、更新以及自动修复。控制器可以指定创建的 pod 副本数,使其管理的 pod 始终维持在指定副本数量,例如,如果某个节点发生故障,则控制器会注意到该节点上的 Pod 已停止工作,并创建一个新的 pod 替换有故障的 Pod。调度程序将替换的 Pod 放置到健康的节点上。可以使用 Deployment、Statefulset、Daemonset、Job 等控制器管理 pod。
通过deployment创建一个pod资源

vim deployment.yaml

kubectl apply -f deployment.yaml

kubectl get deploy

kubectl get rs

kubectl get pods
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  selector:
    matchLabels:
      run: my-nginx
  replicas: 2
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx
        ports:
        - containerPort: 80

通过上面可以看到创建 deployment 资源的时候,会生成 rs 和相应的 pod
#假如误删除其中一个 pod

 kubectl delete pods my-nginx-cf54cdbf7-5l2d4

 kubectl get pods

通过上面可以发现,删除一个 pod 之后,控制器会自动生成一个新的 pod my-nginx-5b56ccd65f-wd6wv


7.2 Pod 资源清单的定义方法

7.2.1 pod 资源清单配置基础

在创建 pod 的时候,可以通过写资源清单文件去实现,Pod 资源清单文件中有几个重要的部分:apiVersion、kind、metadata、spec 以及 status,其中 apiVersion 和 kind 是比较固定的,status是运行时的状态,所以最重要的就是 metadata 和 spec 两个部分。
查看 pod 中包含哪些字段,可以使用 kubectl explain 命令:

 kubectl explain pod

apiVersion: v1
#属性名:属性值;
v1 表示我们定义的 pod 属于 k8s 哪个 api 的版本,一般 apiversion 后面的值的格式是group/version,如果 group 省略,表示的是核心组,那么 v1 就是核心组上的资源。
kind: Pod
资源类别,告诉我们要创建的资源是什么?如创建一个 pod 还是 service 还是 deployment,这些不能随意定义,一般都是内建的或者用户自定义的,需要按照固定语法格式自定义。
metadata
元数据,元数据下面可以定义资源的名字和所属的名称空间,以及资源的标签,通过 kubectl explain 可以查看元数据下的所有子字段
spec(非常重要的字段)
spec 中需要嵌套很多二级字段和三级字段,不同的资源类型 spec 需要嵌套的字段各不相同。如果某个字段标注了 -required-,说明它是必选字段,其余的都是可选字段,不写时系统会为其赋予默认值。spec 是用户定义的期望状态,不同资源类型的 spec 内容各不相同。
status:
表示当前状态,有些内嵌的字段是系统自主维护的,只读属性,不能修改
查看 k8s 中有哪些 api 版本

 kubectl api-versions
admissionregistration.k8s.io/v1
apiextensions.k8s.io/v1
apiregistration.k8s.io/v1
apps/v1
authentication.k8s.io/v1
authorization.k8s.io/v1
autoscaling/v1
autoscaling/v2
autoscaling/v2beta1
autoscaling/v2beta2
batch/v1
batch/v1beta1
certificates.k8s.io/v1
coordination.k8s.io/v1
crd.projectcalico.org/v1
discovery.k8s.io/v1
discovery.k8s.io/v1beta1
events.k8s.io/v1
events.k8s.io/v1beta1
flowcontrol.apiserver.k8s.io/v1beta1
flowcontrol.apiserver.k8s.io/v1beta2
networking.k8s.io/v1
node.k8s.io/v1
node.k8s.io/v1beta1
policy/v1
policy/v1beta1
rbac.authorization.k8s.io/v1
scheduling.k8s.io/v1
storage.k8s.io/v1
storage.k8s.io/v1beta1
v1

上面可以显示 api 的群组和版本,v1 是核心群组,同一个组可以有很多版本。为什么要分组进行管理?如果不分组,那么更新一个,所有的都要跟着更新;分组以后,某一个组改变了,只需要改变这个组,其他组不需要改变;分组之后同一个群组的不同版本号也能并存。因为 pod 是最核心的资源,所以属于核心群组 v1;deployment、replicaset 等控制器属于广义的应用程序核心资源,属于 apps/v1 这个群组。
注:k8s 以前的版本中,apps 有三个版本级别:v1 表示测试通过的稳定版本;beta 是公测版本,不稳定,有可能被改变;alpha 是内测版本,不建议使用。


7.2.2 资源清单文件编写技巧

在 k8s 中,我们通过资源清单文件定义资源,需要写很多字段,那这些字段需要我们都记住吗?当然不需要,我们可以通过帮助命令 kubectl explain 查看具体字段如何定义:
#查看 pod 资源包含哪些字段

kubectl explain pods

KIND: Pod
VERSION: v1
DESCRIPTION:
Pod is a collection of containers that can run on a host. This resource is created by clients and scheduled onto hosts.
FIELDS:
apiVersion <string>
kind <string>
metadata <Object> #Object 表示对象,说明里面会嵌套很多子字段
spec <Object>
status <Object> #状态,不需要我们定义
#查看 pod 资源中 metadata 字段如何定义

kubectl explain pods.metadata

KIND: Pod
VERSION: v1
RESOURCE: metadata
DESCRIPTION:
Standard object’s metadata. More info:
https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
ObjectMeta is metadata that all persisted resources must have, which includes all objects users must create.
FIELDS:
annotations <map[string]string>
# 表示 key 和 value 都是 string 类型;map 表示由众多的 key/value 键值对组成的一个映射
clusterName
creationTimestamp
deletionGracePeriodSeconds
deletionTimestamp
finalizers <[]string>
generateName
generation
labels
managedFields <[]Object>
#对象列表,每个对象可以有多个字段,这个对象也可以有多个
name #资源名字
namespace #资源所在的名称空间
ownerReferences <[]Object>
resourceVersion
selfLink
uid
#查看 pod 资源中 spec 字段如何定义

kubectl explain pods.spec

KIND: Pod
VERSION: v1
RESOURCE: spec
DESCRIPTION:
Specification of the desired behavior of the pod. More info:
https://git.k8s.io/community/contributors/devel/sig-architecture/api
conventions.md#spec-and-status
PodSpec is a description of a pod.
FIELDS:
activeDeadlineSeconds <integer> #整数
affinity
automountServiceAccountToken
containers <[]Object> -required-
dnsConfig
dnsPolicy
enableServiceLinks
ephemeralContainers <[]Object>
hostAliases<[]Object>
hostIPC
hostNetwork
hostPID
hostname
imagePullSecrets <[]Object>
initContainers <[]Object>
nodeName
nodeSelector
overhead
preemptionPolicy
priority
priorityClassName
readinessGates <[]Object>
restartPolicy
runtimeClassName
schedulerName
securityContext
serviceAccount
setHostnameAsFQDN
shareProcessNamespace
subdomain
terminationGracePeriodSeconds
tolerations <[]Object>
topologySpreadConstraints <[]Object>
volumes <[]Object>
#查看 pod 资源中 spec.containers 字段如何定义

kubectl explain pods.spec.containers


KIND: Pod
VERSION: v1
RESOURCE: containers <[]Object>
DESCRIPTION:
List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a Pod. Cannot be updated.
A single application container that you want to run within a pod.
FIELDS:
args <[]string>
command <[]string>
env <[]Object>
envFrom <[]Object>
image
imagePullPolicy
lifecycle
livenessProbe
name -required-
ports <[]Object>
readinessProbe
resources
securityContext
startupProbe
stdin
stdinOnce
terminationMessagePath
terminationMessagePolicy
tty
volumeDevices <[]Object>
volumeMounts <[]Object>
workingDir
#查看 pod 资源中 spec.container.ports 字段如何定义

kubectl explain pods.spec.containers.ports

KIND: Pod
VERSION: v1
RESOURCE: ports <[]Object>
DESCRIPTION:
List of ports to expose from the container. Exposing a port here gives the system additional information about the network connections a container uses, but is primarily informational. Not specifying a port here DOES NOT prevent that port from being exposed. Any port which is listening on the default "0.0.0.0" address inside a container will be accessible from the network. Cannot be updated.
ContainerPort represents a network port in a single container.
FIELDS:
containerPort -required-
hostIP
hostPort
name
protocol


7.3 创建一个pod资源

编写一个yaml文件

 vim demo-pod.yaml
apiVersion: v1 #pod 属于 k8s 核心组 v1
kind: Pod #创建的是一个 Pod 资源
metadata: #元数据
  name: demo-pod #pod 名字
  namespace: default #pod 所属的名称空间
  labels:
    app: myapp #pod 具有的标签
    env: dev #pod 具有的标签
spec:
  containers: #定义一个容器,容器是对象列表,下面可以有多个 name
  - name: tomcat-pod-java #容器的名字
    ports:
    - containerPort: 8080 #容器内部的 port
    image: tomcat:8.5-jre8-alpine #容器使用的镜像
    imagePullPolicy: IfNotPresent #镜像拉取策略
  - name: busybox
    image: busybox:latest
    command: #command 是一个列表,定义的时候下面的参数加横线
    - "/bin/sh"
    - "-c"
    - "sleep 3600"

更新文件

kubectl apply -f demo-pod.yaml

kubectl get pods -l app=myapp

#查看 pod 的详细信息
kubectl describe pods demo-pod

#查看 pod 中 busybox 容器的日志
kubectl logs demo-pod -c busybox

#以交互式形式进入到 pod 中的 busybox 容器
kubectl exec -it demo-pod -c busybox -- /bin/sh

删除定义的资源

# 方法一
kubectl delete -f demo-pod.yaml

# 方法二
kubectl delete pods demo-pod

7.4 pod.spec 字段详解

 kubectl explain pods.spec

FIELDS:
containers <[]Object> -required-
List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a Pod. Cannot be updated.
#containers 是一个对象列表,下面的值是由多个 key:value 键值对组成的

kubectl explain pods.spec.containers

KIND: Pod
VERSION: v1
RESOURCE: containers <[]Object>
DESCRIPTION:
List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a Pod. Cannot be updated.
A single application container that you want to run within a pod.
FIELDS:
args <[]string>
command <[]string>
image
imagePullPolicy
Image pull policy. One of Always, Never, IfNotPresent. Defaults to Always if :latest tag is specified, or IfNotPresent otherwise. Cannot be updated.
More info:
https://kubernetes.io/docs/concepts/containers/images#updating-images
name -required- #容器的名字
imagePullPolicy:
#镜像拉取策略,有下面几种方式:
Always,Never,IfNotPresent
Always:无论本地有没有镜像,都要到镜像仓库(如 docker hub)下载
Never:本地有镜像就用本地的,本地没有镜像就等着,不从仓库下载
IfNotPresent:本地有镜像,就使用本地的,本地没有镜像就从仓库下载

kubectl explain pods.spec.containers.ports

KIND: Pod
VERSION: v1
RESOURCE: ports <[]Object>
DESCRIPTION:
FIELDS:
containerPort -required-
hostIP
hostPort
name
protocol
ports 是一个对象列表,定义容器内需要暴露的端口时可以写多个,而且每一个端口还可以用多个属性来定义,比如端口的名称(给端口起个名字,之后可以根据端口名称去引用它)、端口号、协议。这里声明的端口只是给系统提供额外信息,并不真正决定是否暴露;即使不在这里声明,pod 中应用程序监听的端口同样可以被访问,声明出来主要是让使用者知道应用对外提供服务的端口有哪些。
在 ports 字段下可以配置上面几个属性,一般我们只需要指定 containerPort(容器端口)即可;有必要的话也可以指定 name(由字母、数字或下划线组成)、protocol(默认协议是 TCP)、hostIP(绑定节点 ip,一般不需要指定,因为 pod 调度到哪个节点并不确定,无法提前指定节点 ip;如果一定要指定,写 0.0.0.0 即可)。
举例说明:
ports:
- name: http
  containerPort: 80
- name: https
  containerPort: 443
#ports 是一个列表对象,可以写多个列表项,每个列表项以 - 开头,项内的其他属性与 name 对齐缩进即可

 kubectl explain pods.spec.containers.command

KIND: Pod
VERSION: v1
FIELD: command <[]string>
command :表示要运行的程序,但是不提供/bin/sh 命令行,需要我们自己指定,如果只定义了 command,而没有定义 args,那么只运行 command 后面的指令,而不运行 entrypoint,cmd 等指令;如果没有提供 command,而我们 docker 镜像在制作时有 entrypoint 指定,那么它就会运行镜像里的 entrypoint 的命令
command:
- "/bin/sh" #这个就是自己指定的/bin/sh
- "-c"
- "sleep 3600"

kubectl explain pods.spec.containers.args

KIND: Pod
VERSION: v1
FIELD: args <[]string>
args:表示传递参数的,如果没有给 args,而我们镜像中又有 entrypoint 指令,又有 cmd 指令,那么镜像自己 cmd 指令给定的参数将作为参数传递给 entrypoint 后面的代码;如果给定了 args,那么我们镜像中 cmd 字段后面所指定的参数将不再作为参数传递,而传递的参数是 args 中所定义的内容。

https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/
这个链接里列举了 command 和 args 的关系,相关的关系图如下:
image.png
1:如果只定义了镜像里的 entrypoint,镜像里的 cmd,那么我们运行 pod 容器时使用的命令和参数就是镜像里的命令和参数
2:如果我们定义了容器的 command,那么我们 pod 使用的命令就是 command 后面的命令
3:如果我们只定义了 args 参数,那么我们 pod 使用的命令就是镜像的 entrypoint+args 参数
4:如果我们定义了 command 的命令和 args 的参数,那么我们运行 pod 时使用的就是command 命令+args 参数
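下面用一个示意清单演示第 4 种情况:同时给出 command 和 args,容器最终执行的就是 command + args(镜像仅为举例):

apiVersion: v1
kind: Pod
metadata:
  name: cmd-args-demo
spec:
  containers:
  - name: demo
    image: busybox
    command: ["/bin/sh", "-c"]              # 覆盖镜像里的 ENTRYPOINT
    args: ["echo hello; sleep 3600"]        # 覆盖镜像里的 CMD,作为参数传给上面的 command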


7.5 label 标签使用技巧

7.5.1 pod 资源标签使用技巧

在 k8s 之上,每一种资源都可以有一个标签,现实中用到的 pod 数量可能越来越多,我们期望能够分类进行管理,最简单和直接的效果就是把 pod 分成很多不同的小组,无论对于开发还是运维来讲都能显著提高管理效率,更何况我们控制器,我们 service 资源也需要使用标签来识别它们所管控或关联到的资源,当我们给 pod 或者任何资源设定标签时,都可以使用标签查看,删除等对其执行相应的管理操作;
简单来说所谓的标签就是附加在我们所对应的对象之上的键值对,一个资源之上可以存在多个标签,每个标签都是键值对(key: value),而且每个标签都可以被标签选择器进行匹配度检查从而完成资源挑选,通常情况下一个资源对象可使用多个标签,反之,一个标签也可以被添加到多个资源对像上;标签既可以在资源对象创建的时候指定,也可以在资源对象创建之后使用命令来进行管理,这个管理既包括添加,也包括删除,还包括修改;实践中我们通常给资源附加不同维度的标签,来进行不同维度的管理,比方说 labels 下面的标签 app: myapp 用来指明当前的应用程序(可能是 nginx,tomcat,http,mysql,redis 等)是什么,我们也可以分层打标签,如前端 frontend,后端 backend,开发环境 dev
等;也可以对版本打标签。
image.png
标签:
key: value (key 和 value 最多各 63 个字符;key 由字母、数字、连字符(-)、下划线(_)和点(.)组成,必须以字母或数字开头和结尾,不能为空,还可以带一个可选的前缀(如 app.kubernetes.io/);value 可以为空,非空时同样必须以字母或数字开头和结尾,中间可使用字母、数字、连字符、下划线和点)

# 查看所有 pod 资源对象的标签
kubectl get pods --show-labels

# 查看所有资源对象下拥有 run 这个标签的标签值
kubectl get pods -L run

# 查看拥有 run 这个标签的 pod 资源对象
kubectl get pods -l run

# 查看拥有 run 这个标签的资源对象,并且把标签显示出来
kubectl get pods -l run --show-labels

想修改资源的标签,比方说想给 my-nginx-5b56ccd65f-thnqt 加上个 release 标签给资源对象打标签要使用 label 命令,指定给某个类型下的某个资源打标签,key/value 可以是多个,因此在 my-nginx-5b56ccd65f-thnqt 这个资源下再打个标签 release,用如下命令

 kubectl label pods my-nginx-5b56ccd65f-thnqt release=canary

 kubectl get pods -l release --show-labels

 # 修改标签,把 release=canary 改成 release=stable
 kubectl label pods my-nginx-5b56ccd65f-thnqt release=stable --overwrite

# 查看既有 release 标签,又有 run 标签的 pod 资源有哪些
kubectl get pods -l release,run --show-labels
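标签选择器除了等值关系,还支持集合关系的表达式,例如(示例命令):

# 查看 release 标签值为 canary 或 stable 的 pod
kubectl get pods -l 'release in (canary,stable)' --show-labels

# 查看没有 release 标签的 pod
kubectl get pods -l '!release' --show-labels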

7.5.2 给 node 节点打标签

# 查看 nodes 节点的标签
kubectl get nodes --show-labels

# 给 node 节点打标签
 kubectl label nodes xuegod64 xuegod=IT

# 查看 node 节点标签
kubectl get nodes --show-labels

# 删除指定标签
 kubectl label nodes xuegod64 xuegod-

7.6 名称空间 namespace

namespace 是 k8s 集群级别的资源,可以给不同的用户、租户、环境或项目创建对应的名称空间,例如,可以为 development、qa、和 production 应用环境分别创建各自的名称空间。
k8s 的绝大多数资源都属于名称空间级别,同一名称空间内的同一类资源名必须是唯一的,但跨名称空间时并无此限制。k8s 还有一些资源属于集群级别的,如 node、namespace、PersistentVolume 等资源,它们不属于任何名称空间,因此资源对象的名称必须全局唯一。


7.6.1 查看名称空间及其资源对象

k8s 集群默认提供了几个名称空间用于特定目的,例如,kube-system 主要用于运行系统级资源,而 default 则为那些未指定名称空间的资源操作提供一个默认值。
使用 kubectl get namespace 或者 kubectl get ns 可以查看 namespace 资源,使用 kubectl describe namespace $NAME 可以查看特定的名称空间的详细信息。
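示例命令:

kubectl get namespace
kubectl get ns
kubectl describe namespace default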


7.6.2 管理 namespace 资源

namespace 是 k8s api 的标准资源类型之一,namespace 资源属性较少,通常只需要指定名称即可创建,如 "kubectl create namespace qa"。namespace 资源的名称只能由小写字母、数字和连字符(-)组成,必须以字母或数字开头和结尾(不支持下划线),最长 63 个字符。删除 namespace 资源会级联删除其包含的所有其他资源对象:
命令格式与功能:
kubectl delete TYPE RESOURCE -n NS    #删除指定名称空间内的指定资源
kubectl delete TYPE --all -n NS       #删除指定名称空间内指定类型的所有资源
kubectl delete all -n NS              #删除指定名称空间内的所有资源
kubectl delete all --all              #删除所有名称空间中的所有资源
注意:namespace 对象仅用于资源对象名称的隔离,它自身并不能隔绝跨名称空间的 pod 间通信,那是网络策略资源的功能


八、Pod 高级实战-亲和性反亲和性-污点和容忍度

8.1 node 节点选择器

我们在创建 pod 资源的时候,pod 会根据 scheduler 进行调度,那么默认会调度到随机的一个工作节点,如果我们想要 pod 调度到指定节点或者调度到一些具有相同特点的 node 节点,怎么办呢?
可以使用 pod 中的 nodeName 或者 nodeSelector 字段指定要调度到的 node 节点


8.1.1 nodeName

指定 pod 节点运行在哪个具体 node 上
编写一个yaml指定node节点

vim pod-node.yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
  namespace: default
  labels:
    app: myapp
    env: dev
spec:
  containers:
  - name: tomcat-pod-java
    ports:
    - containerPort: 8080
    image: tomcat
    imagePullPolicy: IfNotPresent
  - name: busybox
    image: busybox
    command:
    - "/bin/sh"
    - "-c"
    - "sleep 3600"
  nodeName: k8s-slave1

加载yaml文件,并验证

kubectl apply -f pod-node.yaml

kubectl get pod -o wide

8.1.2 nodeSelector

指定 pod 调度到具有哪些标签的 node 节点上
给节点打上标签

kubectl get node --show-labels

kubectl label node k8s-slave1 name=k8s-slave1

kubectl get node k8s-slave1 --show-labels

编写yaml文件

vim pod-1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod-1
  namespace: default
  labels:
    app: myapp
    env: dev
spec:
  nodeSelector:
    name: k8s-slave1
  containers:
  - name: tomcat-pod-java
    ports:
    - containerPort: 8080
    image: tomcat
    imagePullPolicy: IfNotPresent

执行yaml文件并验证

kubectl apply -f pod-1.yaml

kubectl get pod -o wide

8.2 node节点亲和性

node 节点亲和性调度:nodeAffinity

8.2.1 字段解读


 kubectl explain pods.spec.affinity
KIND:     Pod
VERSION:  v1

RESOURCE: affinity <Object>

DESCRIPTION:
     If specified, the pod's scheduling constraints

     Affinity is a group of affinity scheduling rules.

FIELDS:
   nodeAffinity    <Object>
     Describes node affinity scheduling rules for the pod.

   podAffinity    <Object>
     Describes pod affinity scheduling rules (e.g. co-locate this pod in the
     same node, zone, etc. as some other pod(s)).

   podAntiAffinity    <Object>
     Describes pod anti-affinity scheduling rules (e.g. avoid putting this pod
     in the same node, zone, etc. as some other pod(s)).
kubectl explain pods.spec.affinity.nodeAffinity
KIND: Pod
VERSION: v1
RESOURCE: nodeAffinity <Object>
DESCRIPTION:
   Describes node affinity scheduling rules for the pod.
   Node affinity is a group of node affinity scheduling rules.
FIELDS:
   preferredDuringSchedulingIgnoredDuringExecution <[]Object>
   requiredDuringSchedulingIgnoredDuringExecution  <Object>

preferred 表示尽量满足这个位置定义的亲和性,不是必须条件,属于软亲和性
required 表示必须有节点满足这个位置定义的亲和性,这是硬性条件,属于硬亲和性

kubectl explain pod.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution
KIND:     Pod
VERSION:  v1

RESOURCE: requiredDuringSchedulingIgnoredDuringExecution <Object>

DESCRIPTION:
     If the affinity requirements specified by this field are not met at
     scheduling time, the pod will not be scheduled onto the node. If the
     affinity requirements specified by this field cease to be met at some point
     during pod execution (e.g. due to an update), the system may or may not try
     to eventually evict the pod from its node.

     A node selector represents the union of the results of one or more label
     queries over a set of nodes; that is, it represents the OR of the selectors
     represented by the node selector terms.

FIELDS:
   nodeSelectorTerms    <[]Object> -required-
     Required. A list of node selector terms. The terms are ORed.
kubectl explain pod.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms
KIND:     Pod
VERSION:  v1

RESOURCE: nodeSelectorTerms <[]Object>

DESCRIPTION:
     Required. A list of node selector terms. The terms are ORed.

     A null or empty node selector term matches no objects. The requirements of
     them are ANDed. The TopologySelectorTerm type implements a subset of the
     NodeSelectorTerm.

FIELDS:
   matchExpressions    <[]Object>
     A list of node selector requirements by node's labels.

   matchFields    <[]Object>
     A list of node selector requirements by node's fields.

matchExpressions:匹配表达式的
matchFields: 匹配字段的

kubectl explain pod.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms.matchFields
KIND:     Pod
VERSION:  v1

RESOURCE: matchFields <[]Object>

DESCRIPTION:
     A list of node selector requirements by node's fields.

     A node selector requirement is a selector that contains values, a key, and
     an operator that relates the key and values.

FIELDS:
   key    <string> -required-
     The label key that the selector applies to.

   operator    <string> -required-
     Represents a key's relationship to a set of values. Valid operators are In,
     NotIn, Exists, DoesNotExist. Gt, and Lt.



   values    <[]string>
     An array of string values. If the operator is In or NotIn, the values array
     must be non-empty. If the operator is Exists or DoesNotExist, the values
     array must be empty. If the operator is Gt or Lt, the values array must
     have a single element, which will be interpreted as an integer. This array
     is replaced during a strategic merge patch.
kubectl explain pod.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms.matchExpressions
KIND:     Pod
VERSION:  v1

RESOURCE: matchExpressions <[]Object>

DESCRIPTION:
     A list of node selector requirements by node's labels.

     A node selector requirement is a selector that contains values, a key, and
     an operator that relates the key and values.

FIELDS:
   key    <string> -required-
     The label key that the selector applies to.

   operator    <string> -required-
     Represents a key's relationship to a set of values. Valid operators are In,
     NotIn, Exists, DoesNotExist. Gt, and Lt.



   values    <[]string>
     An array of string values. If the operator is In or NotIn, the values array
     must be non-empty. If the operator is Exists or DoesNotExist, the values
     array must be empty. If the operator is Gt or Lt, the values array must
     have a single element, which will be interpreted as an integer. This array
     is replaced during a strategic merge patch.

key:检查的 label
operator:做等值选择还是不等值选择
values:给定值


8.2.2 node节点硬亲和性

编写yaml文件

vim  pod-nodeaffinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name:  pod-node-affinity-demo
  namespace: default
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: tomcat
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - foo
            - bar

执行yaml文件,并验证

kubectl apply -f pod-nodeaffinity-demo.yaml

kubectl get pods -o wide | grep pod-node

status 的状态是 pending,说明没有完成调度,因为没有任何一个节点拥有 zone 标签且值为 foo 或者 bar;而且使用的是硬亲和性,必须满足条件才能完成调度。给节点打上对应标签后,pod 就能调度上去了:

kubectl label nodes k8s-slave2 zone=foo

kubectl get pods -o wide

8.2.3 node节点软亲和性

编写yaml文件

vim pod-nodeaffinity-demo-2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity-demo-2
  namespace:  default
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: tomcat
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: zone1
            operator: In
            values:
            - foo1
            - bar1
        weight: 60

注意:yaml 文件解释说明:
PreferredDuringSchedulingIgnoredDuringExecution:强调优先满足指定规则,调度器会尝试调度 Pod 到 Node 上,但并不强求,相当于软限制。多个优先级规则还可以设置权重(weight)值,以定义执行的先后顺序。
执行yaml文件并测试

kubectl apply -f pod-nodeaffinity-demo-2.yaml

kubectl get pods -o wide |grep demo-2

上面说明在软亲和性下这个 pod 依然可以被调度运行,尽管没有任何节点定义 zone1 这个标签。


8.3 Pod节点亲和性

8.3.1 字段解读

表现形式
pod 自身的亲和性调度有两种表示形式
podAffinity:pod 和 pod 更倾向“腻”在一起,把相近的 pod 调度到相近的位置,如同一区域、同一机架,这样 pod 和 pod 之间通信更方便。比方说有两个机房,部署的集群有 1000 台主机,我们希望把 nginx 和 tomcat 都部署到同一个地方的 node 节点上,可以提高通信效率;
podAntiAffinity:pod 和 pod 更倾向不“腻”在一起,如果部署两套程序,那么这两套程序更倾向于反亲和性,这样相互之间不会有影响。
第一个 pod 随机选择一个节点,作为评判后续的 pod 能否调度到这个 pod 所在节点上的依据,这就称为 pod 亲和性。我们在定义 pod 亲和性时需要有一个前提:怎么判定哪些节点是同一个位置,哪些节点不是同一个位置?这个位置的标准由 topologyKey 指定,比如以节点名称为标准,节点名称相同的表示是同一个位置,节点名称不相同的表示不是同一个位置。
字段解读

kubectl explain pod.spec.affinity.podAffinity
KIND:     Pod
VERSION:  v1

RESOURCE: podAffinity <Object>

DESCRIPTION:
     Describes pod affinity scheduling rules (e.g. co-locate this pod in the
     same node, zone, etc. as some other pod(s)).

     Pod affinity is a group of inter pod affinity scheduling rules.

FIELDS:
   preferredDuringSchedulingIgnoredDuringExecution    <[]Object>
     The scheduler will prefer to schedule pods to nodes that satisfy the
     affinity expressions specified by this field, but it may choose a node that
     violates one or more of the expressions. The node that is most preferred is
     the one with the greatest sum of weights, i.e. for each node that meets all
     of the scheduling requirements (resource request, requiredDuringScheduling
     affinity expressions, etc.), compute a sum by iterating through the
     elements of this field and adding "weight" to the sum if the node has pods
     which matches the corresponding podAffinityTerm; the node(s) with the
     highest sum are the most preferred.

   requiredDuringSchedulingIgnoredDuringExecution    <[]Object>
     If the affinity requirements specified by this field are not met at
     scheduling time, the pod will not be scheduled onto the node. If the
     affinity requirements specified by this field cease to be met at some point
     during pod execution (e.g. due to a pod label update), the system may or
     may not try to eventually evict the pod from its node. When there are
     multiple elements, the lists of nodes corresponding to each podAffinityTerm
     are intersected, i.e. all terms must be satisfied.

requiredDuringSchedulingIgnoredDuringExecution: 硬亲和性
preferredDuringSchedulingIgnoredDuringExecution:软亲和性

kubectl explain pod.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution
KIND:     Pod
VERSION:  v1

RESOURCE: requiredDuringSchedulingIgnoredDuringExecution <[]Object>

DESCRIPTION:
     If the affinity requirements specified by this field are not met at
     scheduling time, the pod will not be scheduled onto the node. If the
     affinity requirements specified by this field cease to be met at some point
     during pod execution (e.g. due to a pod label update), the system may or
     may not try to eventually evict the pod from its node. When there are
     multiple elements, the lists of nodes corresponding to each podAffinityTerm
     are intersected, i.e. all terms must be satisfied.

     Defines a set of pods (namely those matching the labelSelector relative to
     the given namespace(s)) that this pod should be co-located (affinity) or
     not co-located (anti-affinity) with, where co-located is defined as running
     on a node whose value of the label with key <topologyKey> matches that of
     any node on which a pod of the set of pods is running

FIELDS:
   labelSelector    <Object>
     A label query over a set of resources, in this case pods.

   namespaceSelector    <Object>
     A label query over the set of namespaces that the term applies to. The term
     is applied to the union of the namespaces selected by this field and the
     ones listed in the namespaces field. null selector and null or empty
     namespaces list means "this pod's namespace". An empty selector ({})
     matches all namespaces. This field is beta-level and is only honored when
     PodAffinityNamespaceSelector feature is enabled.

   namespaces    <[]string>
     namespaces specifies a static list of namespace names that the term applies
     to. The term is applied to the union of the namespaces listed in this field
     and the ones selected by namespaceSelector. null or empty namespaces list
     and null namespaceSelector means "this pod's namespace"

   topologyKey    <string> -required-
     This pod should be co-located (affinity) or not co-located (anti-affinity)
     with the pods matching the labelSelector in the specified namespaces, where
     co-located is defined as running on a node whose value of the label with
     key topologyKey matches that of any node on which any of the selected pods
     is running. Empty topologyKey is not allowed.

topologyKey:
位置拓扑的键,这个是必须字段
怎么判断是不是同一个位置:
rack=rack1
row=row1
使用 rack 这个键时,rack 标签值相同的节点是同一个位置;使用 row 这个键时,row 标签值相同的节点是同一个位置
labelSelector:
我们要判断 pod 跟哪个 pod 亲和,需要靠 labelSelector,通过 labelSelector 选择一组能作为亲和对象的 pod 资源
namespace:指定 labelSelector 所匹配的 pod 所在的名称空间
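
举个例子,下面的命令把两个节点打上相同的 rack 标签(节点名沿用本文环境,标签值为假设),当 topologyKey 写成 rack 时,这两个节点就会被视为同一个位置:

kubectl label nodes k8s-slave1 rack=rack1
kubectl label nodes k8s-slave2 rack=rack1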

kubectl explain pod.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution.labelSelector
KIND:     Pod
VERSION:  v1

RESOURCE: labelSelector <Object>

DESCRIPTION:
     A label query over a set of resources, in this case pods.

     A label selector is a label query over a set of resources. The result of
     matchLabels and matchExpressions are ANDed. An empty label selector matches
     all objects. A null label selector matches no objects.

FIELDS:
   matchExpressions    <[]Object>
     matchExpressions is a list of label selector requirements. The requirements
     are ANDed.

   matchLabels    <map[string]string>
     matchLabels is a map of {key,value} pairs. A single {key,value} in the
     matchLabels map is equivalent to an element of matchExpressions, whose key
     field is "key", the operator is "In", and the values array contains only
     "value". The requirements are ANDed.

8.3.2 pod 节点亲和性

编写yaml
定义两个 pod,第一个 pod 做为基准,第二个 pod 跟着它走

vim pod-required-affinity-demo-1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: nginx
 vim pod-required-affinity-demo-2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: backend
    tier: db
spec:
  containers:
  - name: busybox
    image:  busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["myapp"]}
        topologyKey: kubernetes.io/hostname

执行编写的yaml并测试

kubectl apply -f pod-required-affinity-demo-1.yaml

kubectl apply -f pod-required-affinity-demo-2.yaml
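
查看两个 pod 的调度结果(补充的验证命令):

kubectl get pods -o wide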

上面说明第一个 pod 调度到哪,第二个 pod 也调度到哪,这就是 pod 节点亲和性


8.3.3 pod 节点反亲和性

定义两个 pod,第一个 pod 做为基准,第二个 pod 跟它调度节点相反
编写yaml

vim pod-required-anti-affinity-demo-1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first-1
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
vim  pod-required-anti-affinity-demo-2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-second-1
  labels:
   app: backend
   tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["myapp"]}
        topologyKey: kubernetes.io/hostname

执行yaml并测试

kubectl apply -f pod-required-anti-affinity-demo-1.yaml

kubectl apply -f pod-required-anti-affinity-demo-2.yaml

kubectl get pods -l app -owide

显示两个 pod 不在一个 node 节点上,这就是 pod 节点反亲和性


8.3.4 更改topologykey测试

更改节点标签

 kubectl label nodes k8s-slave1 zone=foo

 kubectl label nodes k8s-slave2 zone=foo --overwrite

编写yaml文件

vim pod-required-anti-affinity-demo-3.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first-3
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
vim pod-required-anti-affinity-demo-4.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-second-3
  labels:
    app: backend
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["myapp"]}
        topologyKey: zone

执行yaml文件并测试

kubectl apply -f pod-required-anti-affinity-demo-3.yaml

kubectl apply -f pod-required-anti-affinity-demo-4.yaml

kubectl get pods -l app -o wide

第二个 pod 现在是 pending,因为两个节点的 zone 标签值相同,被视为同一个位置,已经没有“不是同一个位置”的节点了,而我们要求的是反亲和性,所以 pod 就会处于 pending 状态;如果把反亲和性这里的 required 改成 preferred,那么 pod 也能运行起来。
podAffinity:pod 亲和性,pod 倾向于和哪个 pod 在一起
nodeAffinity:node 亲和性,pod 倾向于调度到哪个 node


8.4 Pod 高级用法:污点和容忍度

8.4.1 字段解读

污点给了节点选择的主动权:我们给节点打一个污点,不能容忍这个污点的 pod 就调度不上来。污点就是定义在节点上的键值属性数据,可以决定拒绝哪些 pod;
taints 是键值数据,用在节点上,定义污点;
tolerations 是键值数据,用在 pod 上,定义容忍度,表示能容忍哪些污点;
节点亲和性是 pod 属性,pod 亲和性也是 pod 属性;但是污点是节点的属性,污点定义在 node 上

kubectl explain node.spec.taints
KIND:     Node
VERSION:  v1

RESOURCE: taints <[]Object>

DESCRIPTION:
     If specified, the node's taints.

     The node this Taint is attached to has the "effect" on any pod that does
     not tolerate the Taint.

FIELDS:
   effect    <string> -required-
     Required. The effect of the taint on pods that do not tolerate the taint.
     Valid effects are NoSchedule, PreferNoSchedule and NoExecute.



   key    <string> -required-
     Required. The taint key to be applied to a node.

   timeAdded    <string>
     TimeAdded represents the time at which the taint was added. It is only
     written for NoExecute taints.

   value    <string>
     The taint value corresponding to the taint key.

taints 的 effect 用来定义对 pod 对象的排斥等级(效果):
NoSchedule:
仅影响调度过程,对现存的 pod 对象不产生影响。也就是说,pod 只有能容忍这个节点的污点才能调度到当前节点;如果这个节点后来又加了新的污点,使得已经调度上来的 pod 不能容忍了,这些现存的 pod 也不会被驱逐。
NoExecute:
既影响调度过程,又影响现存的 pod 对象,如果现存的 pod 不能容忍节点后来加的污点,这个 pod 就会被驱逐。
PreferNoSchedule:
最好不要把不能容忍的 pod 调度过来,但实在没有合适的节点时也可以调度,是 NoSchedule 的柔性版本。
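
补充几个管理污点的常用命令(节点名沿用本文环境,污点 key 以 node-type 为例):

kubectl describe node k8s-slave1 | grep -i taint        # 查看节点上的污点
kubectl taint node k8s-slave1 node-type-                # 删除 key 为 node-type 的所有污点
kubectl taint node k8s-slave1 node-type:NoSchedule-     # 只删除 effect 为 NoSchedule 的那条污点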

在 pod 对象定义容忍度的时候支持两种操作:
1.等值密钥:key 和 value 上完全匹配
2.存在性判断:key 和 effect 必须同时匹配,value 可以是空
在 pod 上定义的容忍度可能不止一个,在节点上定义的污点也可能有多个,需要逐个检查容忍度和污点能否匹配,每一个污点都能被容忍,才能完成调度;如果有污点不能被容忍,pod 能否调度、是否被驱逐就取决于该污点的 effect。


8.4.2 管理节点污点

给随便一个节点设置污点

kubectl taint node k8s-slave1 node-type=production:NoSchedule

编写yaml文件

vim pod-taint.yaml
apiVersion: v1
kind: Pod
metadata:
  name: taint-pod
  namespace: default
  labels:
    tomcat: tomcat-pod
spec:
  containers:
  - name: taint-pod
    ports:
    - containerPort: 8080
    image: tomcat
    imagePullPolicy: IfNotPresent
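
上面的 yaml 编写好后,先执行清单创建这个 pod(补充原文省略的一步):

kubectl apply -f pod-taint.yaml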

查看pod调度情况

kubectl get pods -o wide

给另外一个节点也打上 NoExecute 类型的污点,重新创建 pod 后会发现两个节点的污点都不能被容忍,pod 无法完成调度(如果是已经运行在该节点上的 pod,则会被驱逐)

kubectl delete -f pod-taint.yaml

kubectl taint node xuegod64 node-type=dev:NoExecute

kubectl apply -f pod-taint.yaml

kubectl get pods -o wide

8.4.3 编写pod容忍

vim pod-demo-1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-deploy
  namespace: default
  labels:
    app: myapp
    release: canary
spec:
  containers:
  - name: myapp
    image: httpd
    ports:
    - name: http
      containerPort: 80
  tolerations:
  - key: "node-type"
    operator: "Equal"
    value: "production"
    effect: "NoExecute"
    tolerationSeconds: 3600

注意: 字段解释说明
一般情况下,如果一个 effect 为 NoExecute 的 taint 应用于 node,运行在 node 上的所有不能容忍这条 taint 的 pod 都会被排挤出 node,能容忍这种 taint 的 pod 则不会被排挤。然而,如果 effect 为 NoExecute 的 toleration 指定给 pod,同时添加可选的 tolerationSeconds 字段,则表示 pod 被排挤出 node 之前,以 taint 的添加时间为起点,允许此 pod 在此 node 上的生存时间。
执行yaml文件测试

kubectl apply -f pod-demo-1.yaml

kubectl get pods

还是显示 pending,因为我们使用的是 Equal(等值匹配),所以 key、value 和 effect 必须和 node 节点定义的污点完全匹配才可以。把上面配置中的 effect: "NoExecute" 改成 effect: "NoSchedule",并把 tolerationSeconds: 3600 这一行去掉
修改配置文件后观察

kubectl delete -f pod-demo-1.yaml

kubectl apply -f pod-demo-1.yaml

kubectl get pods -o wide

再次修改yaml文件
再次修改 修改如下部分:
tolerations:
- key: "node-type"
  operator: "Exists"
  value: ""
  effect: "NoSchedule"
只要对应的键是存在的,exists,其值被自动定义成通配符

kubectl delete -f pod-demo-1.yaml

kubectl apply -f pod-demo-1.yaml

kubectl get pods -o wide

再次修改
tolerations:
- key: "node-type"
  operator: "Exists"
  value: ""
  effect: ""
有一个 node-type 的键,不管值是什么,不管是什么效果,都能容忍

kubectl delete -f pod-demo-1.yaml

kubectl apply -f pod-demo-1.yaml

kubectl get pod -o wide

九、Pod 高级实战-pod 生命周期和健康检测

9.1 Pod 高级用法:Pod 状态和重启策略

9.1.1 常见的 pod 状态

Pod 的 status 定义在 PodStatus 对象中,其中有一个 phase 字段。它简单描述了 Pod 在其生命周期的阶段。熟悉 Pod 的各种状态对我们理解如何设置 Pod 的调度策略、重启策略是很有必要的。下面是 phase 可能的值,也就是 pod 常见的状态:
挂起(Pending):我们在请求创建 pod 时,条件不满足,调度没有完成,没有任何一个节点能满足调度条件,已经创建了 pod 但是没有适合它运行的节点叫做挂起,调度没有完成,处于 pending 的状态会持续一段时间:包括调度 Pod 的时间和通过网络下载镜像的时间。
运行中(Running):Pod 已经绑定到了一个节点上,Pod 中所有的容器都已被创建。至少有一个容器正在运行,或者正处于启动或重启状态。
成功(Succeeded):Pod 中的所有容器都被成功终止,并且不会再重启。
失败(Failed):Pod 中的所有容器都已终止了,并且至少有一个容器是因为失败终止。也就是说,容器以非 0 状态退出或者被系统终止。
未知(Unknown):未知状态,所谓 pod 是什么状态是 apiserver 和运行在 pod 节点的 kubelet进行通信获取状态信息的,如果节点之上的 kubelet 本身出故障,那么 apiserver 就连不上 kubelet,得不到信息了,就会看 Unknown
扩展:还有其他状态,如下:
Evicted 状态:出现这种情况,多见于系统内存或硬盘资源不足,可用 df -h 查看 docker 存储所在目录的资源使用情况,如果百分比大于 85%,就要及时清理下资源,尤其是一些大文件、docker 镜像。
CrashLoopBackOff:容器曾经启动了,但可能又异常退出了
Error 状态:Pod 启动过程中发生了错误
Complete 状态:完成状态


9.1.2 pod 重启策略

Pod 的重启策略(RestartPolicy)应用于 Pod 内的所有容器,并且仅在 Pod 所处的 Node 上由kubelet 进行判断和重启操作。当某个容器异常退出或者健康检查失败时,kubelet 将根据RestartPolicy 的设置来进行相应的操作。
Pod 的重启策略包括 Always、OnFailure 和 Never,默认值为 Always。
Always:当容器失败时,由 kubelet 自动重启该容器。
OnFailure:当容器终止运行且退出码不为 0 时,由 kubelet 自动重启该容器。
Never:不论容器运行状态如何,kubelet 都不会重启该容器。


9.2 Pod 高级用法:Pod 生命周期

9.2.1 init容器

Pod 里面可以有一个或者多个容器,部署应用的容器可以称为主容器。在创建 Pod 的时候,Pod 中可以有一个或多个先于主容器启动的 Init 容器,这些容器就称为初始化容器。初始化容器一旦执行完,从启动开始到初始化代码执行完就退出了,不会一直存在,所以初始化是在主容器启动之前完成的。初始化容器可以有多个,多个初始化容器要串行执行:先执行初始化容器 1,再执行初始化容器 2,等初始化容器全部执行完就退出,然后再启动主容器;主容器一退出,pod 就结束了,主容器退出的时间点就是 pod 的结束点,它俩的时间轴是一致的。
Init 容器就是做初始化工作的容器。可以有一个或多个,如果多个按照定义的顺序依次执行,只有所有的初始化容器执行完后,主容器才启动。由于一个 Pod 里的存储卷是共享的,所以 Init Container 里产生的数据可以被主容器使用到,Init Container 可以在多种 K8S 资源里被使用到,如 Deployment、 DaemonSet, StatefulSet、Job 等,但都是在 Pod 启动时,在主容器启动前执行,做初始化工作。
Init 容器与普通的容器区别是:
1、Init 容器不支持 Readiness,因为它们必须在 Pod 就绪之前运行完成
2、每个 Init 容器必须运行成功,下一个才能够运行
3、如果 Pod 的 Init 容器失败,Kubernetes 会不断地重启该 Pod,直到 Init 容器成功为止,然而,如果 Pod 对应的 restartPolicy 值为 Never,它不会重新启动。
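
下面是一个带初始化容器的 pod 清单示例(名称和命令均为演示用的假设),init 容器执行完毕后主容器才会启动:

vim pod-init-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-init-demo
spec:
  initContainers:              # 初始化容器,先于主容器串行执行
  - name: init-prepare
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["sh", "-c", "echo preparing...; sleep 5"]
  containers:                  # 主容器
  - name: main
    image: nginx
    imagePullPolicy: IfNotPresent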


9.2.2 主容器

初始化容器启动之后,开始启动主容器,在主容器启动之后有一个 post start hook(容器启动后钩子)和 pre stop hook(容器结束前钩子),无论启动后还是结束前所做的事我们可以把它放两个钩子,这个钩子就表示用户可以用它来钩住一些命令,来执行它,做开场前的预设,结束前的清理,如 awk 有begin,end,和这个效果类似;
postStart:该钩子在容器被创建后立刻触发,通知容器它已经被创建。如果该钩子对应的 hook handler 执行失败,则该容器会被杀死,并根据该容器的重启策略决定是否要重启该容器,这个钩子不需要传递任何参数。
preStop:该钩子在容器被删除前触发,其所对应的 hook handler 必须在删除该容器的请求发送给 Docker daemon 之前完成。该钩子对应的 hook handler 执行完成后,不论执行结果如何,系统都会向容器发送 SIGTERM 信号来终止并删除该容器,这个钩子不需要传递任何参数。
在 k8s 中支持两类对 pod 的检测:
第一类叫做 livenessprobe(pod 存活性探测):存活探针主要作用是,用指定的方式检测 pod 中的容器应用是否正常运行,如果检测失败,则认为容器不健康,那么 Kubelet 将根据 Pod 中设置的 restartPolicy 来判断 Pod 是否要进行重启操作,如果容器配置中没有配置 livenessProbe,Kubelet 将认为存活探针探测一直为成功状态。
第二类叫做 readinessprobe(pod 就绪性探测):用于判断容器中应用是否启动完成,当探测成功后才使 Pod 对外提供网络访问,设置容器 Ready 状态为 true;如果探测失败,则设置容器的 Ready 状态为 false。


9.2.3 创建 pod阶段

当用户创建 pod 时,这个请求先提交给 apiserver,apiserver 把创建请求的状态保存在 etcd 中;接下来 apiserver 会请求 scheduler 完成调度,如果调度成功,会把调度结果(如调度到哪个节点上、运行在哪个节点上)更新到 etcd 中的 pod 资源状态里;一旦保存到 etcd 并完成更新,比如调度到了 k8s-slave1 上,那么 k8s-slave1 节点上的 kubelet 通过 apiserver 的状态变化得知有任务分配给了自己,此时 kubelet 会拿到用户创建时提交的清单,在当前节点上启动这个 pod;创建成功或者失败都会产生一个当前状态,kubelet 把这个状态上报给 apiserver,apiserver 再把它保存到 etcd 中。在整个过程中,apiserver 和 etcd 一直在交互,scheduler 也参与其中,负责把 pod 调度到合适的 node 节点上,这就是 pod 的创建过程。
pod 在整个生命周期中有非常多的用户行为:
1、初始化容器完成初始化
2、主容器启动后可以做启动后钩子
3、主容器结束前可以做结束前钩子
4、在主容器运行中可以做一些健康检测,如 liveness probe,readness probe


9.3 Pod 高级用法:Pod 容器探测深度讲解

9.3.1 容器钩子:postStart 和 preStop

postStart:容器创建成功后,运行前的任务,用于资源部署、环境准备等。
preStop:在容器被终止前的任务,用于优雅关闭应用程序、通知其他系统等。
演示 postStart 和 preStop 用法:

......
containers:
- image: sample:v2
  name: war
  lifecycle:
    postStart:
      exec:
        command:
        - "cp"
        - "/sample.war"
        - "/app"
    preStop:
      httpGet:
        host: monitor.com
        path: /waring
        port: 8080
        scheme: HTTP
......

以上示例中,定义了一个 Pod,包含一个 JAVA 的 web 应用容器,其中设置了 PostStart 和 PreStop 回调函数。即在容器创建成功后,复制/sample.war 到/app 文件夹中。而在容器终止之前,发送 HTTP 请求到 http://monitor.com:8080/waring,即向监控系统发送警告
优雅的删除资源对象
当用户请求删除含有 pod 的资源对象时(如 RC、deployment 等),K8S 为了让应用程序优雅关闭(即让应用程序完成正在处理的请求后,再关闭软件),K8S 提供两种信息通知:
1) 默认:K8S 通知 node 执行 docker stop 命令,docker 会先向容器中 PID 为 1 的进程发送系统信号 SIGTERM,然后等待容器中的应用程序终止执行,如果等待时间达到设定的超时时间,或者默认超时时间(30s),会继续发送 SIGKILL 的系统信号强行 kill 掉进程。
2) 使用 pod 生命周期(利用 PreStop 回调函数),它执行在发送终止信号之前。默认情况下,所有删除操作的优雅退出时间都在 30 秒以内。kubectl delete 命令支持 --grace-period=0 的选项,以允许用户修改默认值,0 表示删除立即执行,并且立即从 API 中删除 pod。在节点上,被设置了立即结束的 pod,仍然会给一个很短的优雅退出时间段,之后才会被强制杀死。如下:

 spec:
   containers:
   - name: nginx-demo
     image: centos:nginx
     lifecycle:
       preStop:
         exec:
           # nginx -s quit gracefully terminate while SIGTERM triggers a quick exit
           command: ["/usr/local/nginx/sbin/nginx","-s","quit"]
     ports:
     - name: http
       containerPort: 80
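
删除这类带 preStop 钩子的 pod 时,如果应用需要更长的退出时间,可以显式指定更长的宽限期(pod 名称 nginx-demo 为假设):

kubectl delete pod nginx-demo --grace-period=60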

9.3.2 探测:livenessProbe 和 readinessProbe

概念
livenessProbe:存活性探测
许多应用程序经过长时间运行,最终过渡到无法运行的状态,除了重启,无法恢复。通常情况下,K8S 会发现应用程序已经终止,然后重启应用程序 pod。有时应用程序可能因为某些原因(后端服务故障等)导致暂时无法对外提供服务,但应用软件没有终止,导致 K8S 无法隔离有故障的 pod,调用者可能会访问到有故障的 pod,导致业务不稳定。K8S 提供livenessProbe 来检测应用程序是否正常运行,并且对相应状况进行相应的补救措施。
readinessProbe:就绪性探测
在没有配置 readinessProbe 的资源对象中,pod 中的容器启动完成后,就认为 pod 中的应用程序可以对外提供服务,该 pod 就会加入相对应的 service,对外提供服务。但有时一些应用程序启动后,需要较长时间的加载才能对外服务,如果这时对外提供服务,执行结果必然无法达到预期效果,影响用户体验。比如使用 tomcat 的应用程序来说,并不是简单地说 tomcat 启动成功就可以对外提供服务的,还需要等待 spring 容器初始化,数据库连接上等等。
目前 LivenessProbe 和 ReadinessProbe 两种探针都支持下面三种探测方法:
1)ExecAction:在容器中执行指定的命令,如果执行成功,退出码为 0 则探测成功。
2)TCPSocketAction:通过容器的 IP 地址和端口号执行 TCP 检 查,如果能够建立 TCP 连 接,则表明容器健康。
3)HTTPGetAction:通过容器的 IP 地址、端口号及路径调用 HTTP Get 方法,如果响应的状态码大于等于 200 且小于 400,则认为容器健康
探针探测结果有以下值:
1)Success:表示通过检测。
2)Failure:表示未通过检测。
3)Unknown:表示检测没有正常进行。
Pod 探针相关的属性
探针(Probe)有许多可选字段,可以用来更加精确的控制 Liveness 和 Readiness 两种探针的行为
initialDelaySeconds: Pod 启动后首次进行检查的等待时间,单位“秒”。
periodSeconds: 检查的间隔时间,默认为 10s,单位“秒”。
timeoutSeconds: 探针执行检测请求后,等待响应的超时时间,默认为 1s,单位“秒”。
successThreshold: 表示探测失败后,需要连续探测成功多少次才认定为成功,默认为 1。
failureThreshold: 探测失败的重试次数,重试一定次数后将认为失败,在 readiness 探针中,Pod 会被标记为未就绪
两种探针区别:
ReadinessProbe 和 livenessProbe 可以使用相同探测方式,只是对 Pod 的处置方式不同:
readinessProbe 当检测失败后,将 Pod 的 IP:Port 从对应的 EndPoint 列表中删除。
livenessProbe 当检测失败后,将杀死容器并根据 Pod 的重启策略来决定作出对应的措施。
LivenessProbe 探针使用示例
(1)通过 exec 方式做健康探测

apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
  labels:
    app: liveness
spec:
  containers:
  - name: liveness
    image: busybox
    args:            # 创建测试探针探测的文件
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      initialDelaySeconds: 10    # 延迟检测时间
      periodSeconds: 5       # 检测时间间隔
      exec:
        command:
        - cat
        - /tmp/healthy

容器启动设置执行的命令: /bin/sh -c “touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600”
容器在初始化后,首先创建一个 /tmp/healthy 文件,然后执行睡眠命令,睡眠 30 秒,到时间后执行删除 /tmp/healthy 文件的命令。而设置的存活探针检测方式为执行 shell 命令,用 cat 命令输出 healthy 文件的内容,如果能成功执行这条命令,存活探针就认为探测成功,否则探测失败。在前 30 秒内,由于文件存在,所以存活探针探测时执行 cat /tmp/healthy 命令成功。30 秒后 healthy 文件被删除,所以执行命令失败,Kubernetes 会根据 Pod 设置的重启策略来判断是否重启 Pod。
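
可以按下面的方式观察探测失败后的重启行为(文件名 liveness-exec.yaml 为假设,内容即上面的清单):

kubectl apply -f liveness-exec.yaml
kubectl get pod liveness-exec -w        # 约 30 秒后可以观察到 RESTARTS 次数增加
kubectl describe pod liveness-exec      # Events 中可以看到探测失败和重启的记录
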
(2)通过 HTTP 方式做健康探测

apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
  labels:
    test: liveness
spec:
  containers:
  - name: liveness
    image: mydlqclub/springboot-helloworld:0.0.1
    livenessProbe:
      initialDelaySeconds: 20      # 延迟加载时间
      periodSeconds: 5          # 重试时间间隔
      timeoutSeconds: 10    # 超时时间设置
      httpGet:
        scheme: HTTP
        port: 8081
        path: /actuator/health

上面 Pod 中启动的容器是一个 SpringBoot 应用,其中引用了 Actuator 组件,提供了/actuator/health 健康检查地址,存活探针可以使用HTTPGet 方式向服务发起请求,请求 8081 端口的 /actuator/health 路径来进行存活判断:
任何大于或等于 200 且小于 400 的代码表示探测成功。
任何其他代码表示失败。
如果探测失败,则会杀死容器并根据 Pod 的重启策略进行重启操作。
httpGet 探测方式有如下可选的控制字段:
scheme: 用于连接 host 的协议,默认为 HTTP。
host:要连接的主机名,默认为 Pod IP,可以在 http request head 中设置 host 头部。
port:容器上要访问端口号或名称。
path:http 服务器上的访问 URI。
httpHeaders:自定义 HTTP 请求 headers,HTTP 允许重复 headers。
(3)通过 TCP 方式做健康探测

apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcp
  labels:
    app: liveness
spec:
  containers:
  - name: liveness
    image: nginx
    livenessProbe:
      initialDelaySeconds: 15
      periodSeconds: 20
      tcpSocket:
        port: 80

TCP 检查方式和 HTTP 检查方式非常相似,在容器启动 initialDelaySeconds 参数设定的时间后,kubelet 将发送第一个 livenessProbe 探针,尝试连接容器的 80 端口,如果连接失败则将杀死Pod 重启容器。
ReadinessProbe 探针使用示例
Pod 的 ReadinessProbe 探针使用方式和 LivenessProbe 探针探测方法一样,也是支持三种,只是一个是用于探测应用的存活,一个是判断是否对外提供流量的条件。这里用一个 Springboot 项目,设置 ReadinessProbe 探测 SpringBoot 项目的 8081 端口下的 /actuator/health 接口,如果探测成功则代表内部程序以及启动,就开放对外提供接口访问,否则内部应用没有成功启动,暂不对外
提供访问,直到就绪探针探测成功。
(1)通过 HTTP 方式做就绪探测

apiVersion: v1
kind: Service
metadata:
  name: springboot
  labels:
    app: springboot
spec:
  type: NodePort
  ports:
  - name: server
    port: 8080
    targetPort: 8080
    nodePort: 31180
  - name: management
    port: 8081
    targetPort: 8081
    nodePort: 31181
  selector:
    app: springboot
---
apiVersion: v1
kind: Pod
metadata:
  name: springboot
  labels:
    app: springboot
spec:
  containers:
  - name: springboot
    image: mydlqclub/springboot-helloworld:0.0.1
    ports:
    - name: server
      containerPort: 8080
    - name: management
      containerPort: 8081
    readinessProbe:
      initialDelaySeconds: 20
      periodSeconds: 5
      timeoutSeconds: 10
      httpGet:
        scheme: HTTP
        port: 8081
        path: /actuator/health

ReadinessProbe + LivenessProbe 配合使用示例
一般程序中需要设置两种探针结合使用,并且也要结合实际情况,来配置初始化检查时间和检测间 隔,下面列一个简单的 SpringBoot 项目的 Deployment 例子。

apiVersion: v1
kind: Service
metadata:
  name: springboot
  labels:
    app: springboot
spec:
  type: NodePort
  ports:
  - name: server
    port: 8080
    targetPort: 8080
    nodePort: 31180
  - name: management
    port: 8081
    targetPort: 8081
    nodePort: 31181
  selector:
    app: springboot
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: springboot
  labels:
    app: springboot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: springboot
  template:
    metadata:
      name: springboot
      labels:
        app: springboot
    spec:
      containers:
      - name: readiness
        image: mydlqclub/springboot-helloworld:0.0.1
        ports:
        - name: server
          containerPort: 8080
        - name: management
          containerPort: 8081
        readinessProbe:
          initialDelaySeconds: 20
          periodSeconds: 5
          timeoutSeconds: 10
          httpGet:
            scheme: HTTP
            port: 8081
            path: /actuator/health
        livenessProbe:
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          httpGet:
            scheme: HTTP
            port: 8081
            path: /actuator/health

十、使用 kubectl 管理 Kubernetes 容器平台

10.1 kubectl 概述

kubectl 是一个用于操作 kubernetes 集群的命令行接口,通过利用 kubectl 的各种命令可以实现各种功能。


10.2 kubectl 创建和删除一个 pod

10.2.1 kubectl run 语法

kubectl run 和 docker run 一样,kubectl run 能将一个 pod 运行起来
语法:
kubectl run podNAME --image=image [--env="key=value"] [--port=port]
查看 kubectl run 帮助命令

kubectl run --help

例子: 创建一个名字为 nginx 的 pod

kubectl run nginx --image=nginx:latest --image-pull-policy='IfNotPresent' --port=80

--image-pull-policy='IfNotPresent' # 如果本地不存在镜像,才从外网下载镜像。默认值为 imagePullPolicy: Always,即一直从外网下载镜像,不使用本地的镜像。
pods 常见的状态:
ContainerCreating #容器创建过程
ImagePullBackOff #镜像拉取失败
注:如果 pod 没有正常运行,看到的状态是 ImagePullBackOff,是因为 docker hub 没有连接上,导致镜像没有下载成功,这时,可以在 node 节点上把相关镜像手动上传一下或把 docker 源换成阿里云的。
terminating ['tɜ:mɪneɪtɪŋ] #终止。删除 pod 时的状态
Running 正常运行状态
创建一个pod,但是不运行,可以输出到yaml文件里

kubectl run nginx --image=nginx --image-pull-policy='IfNotPresent' --port=80 -o yaml --dry-run > 1.yaml

10.2.2 使用 kubectl delete 删除创建的对象

删除 pod
语法:kubectl delete pod pod 名字

kubectl delete pod nginx

查看pod调度到哪个节点

kubectl get pod

kubectl get pods -n kube-system -o wide

强制删除pod

 kubectl run nginx --image=nginx --image-pull-policy='IfNotPresent' --port=80

 kubectl delete pod nginx --force --grace-period=0

10.3 Yaml 语法规则

YAML 语言(发音 /ˈjæməl/ )的设计目标,就是方便人类读写。它实质上是一种通用的数据串行化格式。


10.3.1 pod 资源 yaml 配置文件常见单词

kind: 同类,类型;
apiVersion API 版本;
metadata 元数据;
spec: 规格,说明书(定义具体参数);


10.3.2 yaml 语法的基本语法规则

1、大小写敏感
2、使用缩进表示层级关系
3、缩进时不允许使用 Tab 键,只允许使用空格。
4、缩进的空格数目不重要,只要相同层级的元素左侧对齐即可
5、# 表示注释,从这个字符一直到行尾,都会被解析器忽略。


10.4 kubectl create 加载 yaml 文件生成 deployment

使用 kubectl run 在设定很复杂的需求时,需要非常长的一条语句,也很容易出错,也没法保存。所以更多场景下会使用 yaml 或者 json 文件


10.4.1 编写yaml文件

编写yaml文件

 vim mysql-deployment.yaml
kind: Deployment          #使用 deployment 创建一个 pod 资源,旧的 k8s 版本可以使用 kind: ReplicationController来创建 pod
apiVersion: apps/v1 
metadata:
  name: mysql           #deployment 的名称,全局唯一
  namespace: default
  labels:
    k8s-app: mysql
spec:
  selector:
    matchLabels:
      k8s-app: mysql
  replicas: 1           # Pod 副本期待数量,1 表示只运行一个 pod
  template:
    metadata:
      labels:
        k8s-app: mysql
    spec:
      containers:          # Pod 中容器的定义部分
      - name: mysql           #容器的名称
        image: mysql/mysql-server       #容器对应的 Docker Image 镜像
        imagePullPolicy: IfNotPresent          #默认值为:imagePullPolicy: Always 一直从外网,下载镜像,不使用本地的镜像。
        #其他镜像下载策略参数说明:
        #IfNotPresent :如果本地存在镜像就优先使用本地镜像。这样可以直接使用本地镜像,加快启动速度。          Present [ˈpreznt] 目前;现在
        #Never:直接使用本地镜像,不再去外网拉取镜像;如果本地不存在就报异常了。
        ports:
        - containerPort: 3306 #容器暴露的端口号
          protocol: TCP
        env:                        #注入到容器的环境变量
        - name: MYSQL_ROOT_PASSWORD           #设置 mysql root 的密码
          value: "hello123

注: 原文此处附有 mysql-deployment.yaml 的文件结构示意图。


10.4.2 使用 mysql-deployment.yaml 创建和删除 mysql 资源

使用刚才编写的资源

kubectl create -f mysql-deployment.yaml

kubectl delete -f mysql-deployment.yaml

kubectl apply -f mysql-deployment.yaml

扩展:
使用 kubectl create 和 kubectl apply 创建资源对象的区别
kubectl apply -f FILENAME 可多次执行,每次修改 YAML 文件后,使用 apply 可直接部署
kubectl create -f FILENAME 不可多次执行,每次修改 YAML 后,需要先删除资源后才能部署
推荐使用 apply 部署资源对象,delete 删除资源对象。
注意:线上都用 kubectl apply -f file.yaml


10.4.3 使用 get 参数查看 pod 详细信息

使用get查看资源

 kubectl get pod

 kubectl get deployment

查看详细信息

kubectl get pod -o wide

总结:get 命令能够确认的信息类别:
deployments (缩写 deploy)
namespaces (缩写 ns)
nodes (缩写 no)
pods (缩写 po)
replicasets (缩写 rs)
services (缩写 svc)
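
这些缩写可以在命令里直接混用,例如:

kubectl get po,svc,deploy -o wide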


10.4.4 使用 describe 查看 k8s 中详细信息

describe [dɪˈskraɪb] 描述
语法: kubectl describe pod pod 名字
语法: kubectl describe node node 名字
语法: kubectl describe deployment deployment 名字
使用 describe 查看 pod 的详细描述信息

kubectl describe pod mysql-76f8866f79-j6twz

使用 describe 查看 node 的详细描述信息

kubectl describe node xuegod62

使用 describe 查看 deployment 的详细描述信息

kubectl describe deployment mysql

10.5 kubectl 其他常用命令和参数说明

10.5.1 kubectl logs

使用方法
类似于 docker logs,使用 kubectl logs 能够取出 pod 中容器的 log,也是故障排除时候的重要信息

kubectl get pods

kubectl logs mysql-76f8866f79-j6twz

kubectl logs -f mysql-76f8866f79-mdd69

#查看 pod 具体容器日志,-c
kubectl logs mysql-76f8866f79-q5cvx -c mysql

10.5.2 kubectl exec

使用方法
exec 命令用于到 pod 中执行一条命令,到 mysql 的镜像中执行 cat /etc/my.cnf 命令

kubectl get pod

kubectl exec mysql-76f8866f79-j6twz cat /etc/my.cnf

使用参数 exec -it,可以直接登录到 pod 中

kubectl exec -it mysql-76f8866f79-j6twz -- bash

#-c 指定进入到 pod 具体容器里
 kubectl exec -it mysql-76f8866f79-j6twz -c mysql -- bash

10.5.3 kubectl cp

使用方法
用于从 pod 中拷出 hosts 文件到物理机的/tmp 下

 kubectl cp mysql-76f8866f79-j6twz:/etc/hosts /tmp/hosts

command terminated with exit code 126 #报错
排错方法:

kubectl cp --help

Copy files and directories to and from containers.
Examples:
# !!!Important Note!!!
# Requires that the 'tar' binary is present in your container #发现想要使用 kubectl cp 你的容器实例中必须有 tar 库
# image. If 'tar' is not present, 'kubectl cp' will fail. #如果镜像中 tar 命令不存在,那么 kubectl cp 将失败
在镜像中安装tar命令

kubectl exec -it mysql-76f8866f79-j6twz bash

yum install tar -y

10.5.4 kubectl attach

kubectl attach 用于取得 pod 中容器的实时信息,可以直接查看容器中以 daemon 形式运行的进程输出,可以持续不断实时的取出消息。像 tail -f /var/log/messages 动态查看日志的作用。
kubectl logs 是一次取出所有消息,像 cat /etc/passwd

kubectl attach kube-apiserver-xuegod63 -n kube-system

10.6 使用 kubectl 管理集群

kubectl 相关命令参数如下:
kubectl edit 编辑 k8s 资源
kubectl replace 替换,使用 yaml 配置文件来替换正在运行中的配置参数
kubectl patch 部分更新资源相关信息
kubectl apply 使用文件或者标准输入更改配置信息
kubectl scale 重新设定 Deployment/ReplicaSet/RC/Job 的 size
kubectl autoscale Deployment/ReplicaSet/RC 的自动扩展设定
kubectl cordon 设定 node 不可使用
kubectl uncordon 设定 node 可以使用
kubectl drain 设定 node 进入维护模式


10.6.1 创建资源文件

编写yaml文件

 vim /root/nginx-deployment.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nginx
  namespace: default
  labels:
    k8s-app: nginx
spec:
  selector:
    matchLabels:
      k8s-app: nginx
  replicas: 2                 #创建两个 pod
  template:
    metadata:
      labels:
        k8s-app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
          protocol: TCP
vim /root/nginx-svc.yaml
kind: Service
apiVersion: v1
metadata:
  name: nginx
  labels:
    k8s-app: nginx
spec:
  type: NodePort
  ports:
  - protocol: TCP
    nodePort: 31001
    targetPort: 80
    port: 3000
    name: nginx
  selector:
    k8s-app: nginx

注:
nodePort: 31001 #指定的是 node 物理机上监听的端口,后期公网上的机器可以访问此端口。
targetPort: 80 #nginx pod 的端口号,也就是真正 docker 监听的端口号
port: 3000 #service 服务端口号 、service 集群 IP 监听的端口,K8S 内部各个 service 服务,使用这个端口号进行通信
创建资源对象

kubectl apply -f nginx-deployment.yaml

kubectl apply -f nginx-svc.yaml

10.6.2 查看创建的 deployment 和 service

查看资源对象

kubectl get pods -o wide

kubectl get deploy

kubectl get svc
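
service 创建好后,可以通过任意 node 节点的 IP 加 nodePort 从集群外访问 nginx 进行验证(节点 IP 请替换为实际环境的地址):

curl -I http://<节点IP>:31001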

10.6.3 kubectl edit 修改服务配置

kubectl get -o 格式化输出, -o yaml 指定输出的消息为 yaml 类型
输出yaml文件

 kubectl get service nginx -o yaml

更改服务端口

kubectl edit service nginx

改: 46 - nodePort: 31001
为: 46 - nodePort: 31002
验证是否成功

kubectl get service

注:edit 编辑修改配置文件时,不需要停止服务。改完后立即生效


10.6.4 kubectl replace

replace 就是替换,把 nodeport 端口改为 31003
导出yaml文件并更改端口

kubectl get service nginx -o yaml > nginx_replace.yaml

vim nginx_replace.yaml

改:- nodePort: 31002
为:- nodePort: 31003
更改并验证是否成功

kubectl replace -f nginx_replace.yaml

kubectl get service

10.6.5 kubectl patch

当修改一部分设定时,使用 patch 很方便。比如:给 pod 换个 image 镜像。
对镜像进行替换

kubectl patch pod nginx-6698f44dcc-k9sk9 -p '{"spec":{"containers":[{"name":"nginx","image":"richarvey/nginx-php-fpm:latest"}]}}'

到容器内进行验证

kubectl exec nginx-6698f44dcc-k9sk9 -it bash

10.6.6 kubectl apply

apply 命令是用来使用文件或者标准输入来更改配置信息。
修改yaml文件

vim nginx-svc.yaml

改:11 nodePort: 31001
为:11 nodePort: 31004
更改配置信息并验证

kubectl apply -f nginx-svc.yaml

kubectl get svc

10.6.7 kubectl scale

scale 命令用于横向扩展,是 kubernetes 或者 swarm 这类容器编排平台的重要功能之一
查看pod数量

kubectl get pod -o wide

进行扩容,并验证

kubectl scale --current-replicas=2 --replicas=3 deployment/nginx
deployment.apps/nginx scaled

kubectl get deploy

kubectl get pod

kubectl get pod -o wide

10.6.8 kubectl autoscale

autoscale 命令用于自动扩缩容,scale 需要手动执行,而 autoscale 则会根据负载自动调节。这条命令可以对 Deployment 进行设定,通过指定副本数的最小值和最大值来控制扩缩容范围。
指定自动扩缩容

kubectl autoscale deployment nginx --min=2 --max=5
horizontalpodautoscaler.autoscaling/nginx autoscaled

删除自动扩缩容

kubectl delete hpa nginx

10.6.9 kubectl cordon和kubectl drain

设置节点为维护状态(不可调度)

kubectl cordon k8s-slave1

节点维护并驱逐 pod

kubectl drain xuegod64 --ignore-daemonsets --force --delete-local-data

解除节点维护

kubectl uncordon xuegod64

注:
在对节点设置维护状态后,新创建的 pod 不会调度到该 node 节点,但是如果指定了 nodeName 还是会调度上去
在对节点驱逐 pod 时,需要加参数 --ignore-daemonsets --force --delete-local-data,默认会忽略那些不能杀死的系统类型的 pod
--force 当一些 pod 不是经 ReplicationController、ReplicaSet、Job、DaemonSet 或者 StatefulSet 管理的时候,就需要用 --force 来强制执行(例如:kube-proxy)
--ignore-daemonsets 无视 DaemonSet 管理下的 Pod。--ignore-daemonsets 往往需要指定,这是因为 DaemonSet 会忽略 unschedulable 标签(使用 kubectl drain 时会自动给节点打上不可调度标签),DaemonSet 控制器控制的 pod 被删除后可能马上又在此节点上启动起来,这样就会成为死循环,因此这里忽略 DaemonSet。
--delete-local-data 如果有 mount local volume 的 pod,会强制杀掉该 pod。


10.6.10 Kubectl 自动补全

安装 bash-completion

yum install -y bash-completion

source /usr/share/bash-completion/bash_completion

应用 kubectl 的 completion 到系统环境

source <(kubectl completion bash)

echo "source <(kubectl completion bash)" >> ~/.bashrc

使用效果

kubectl get

按 tab 可以自动补全


十一、k8s 控制器-Replicaset-Deployment

11.1 Replicaset 控制器:概念、原理解读

11.1.1 Replicaset 概述

ReplicaSet 是 kubernetes 中的一种副本控制器,简称 rs,主要作用是控制由其管理的 pod,使pod 副本的数量始终维持在预设的个数。它的主要作用就是保证一定数量的 Pod 能够在集群中正常运行,它会持续监听这些 Pod 的运行状态,在 Pod 发生故障时重启 pod,pod 数量减少时重新运行新的Pod 副本。官方推荐不要直接使用 ReplicaSet,用 Deployments 取而代之,Deployments 是比ReplicaSet 更高级的概念,它会管理 ReplicaSet 并提供很多其它有用的特性,最重要的是 Deployments 支持声明式更新,声明式更新的好处是不会丢失历史变更。所以 Deployment 控制器不直接管理 Pod 对象,而是由 Deployment 管理 ReplicaSet,再由 ReplicaSet 负责管理 Pod 对象。


11.1.2 Replicaset 工作原理

Replicaset 核心作用在于代用户创建指定数量的 pod 副本,并确保 pod 副本一直处于满足用户期望的数量, 起到多退少补的作用,并且还具有自动扩容缩容等机制。
Replicaset 控制器主要由三个部分组成:
用户期望的 pod 副本数:用来定义由这个控制器管控的 pod 副本有几个
标签选择器:选定哪些 pod 是自己管理的,如果通过标签选择器选到的 pod 副本数量少于我们指定的数量,需要用到下面的字段
pod 资源模板:如果集群中现存的 pod 数量不够我们定义的副本中期望的数量怎么办,需要新建 pod,这就需要 pod 模板,新建的 pod 是基于模板来创建的。


11.2 Replicaset 资源清单文件编写技巧

11.2.1 字段定义

查看定义 Replicaset 资源需要的字段

kubectl explain Replicaset

kubectl explain rs            # Replicaset 资源在命令行可以简写成 rs
KIND:     ReplicaSet
VERSION:  apps/v1

DESCRIPTION:
     ReplicaSet ensures that a specified number of pod replicas are running at
     any given time.

FIELDS:
   apiVersion    <string>                  #当前资源使用的 api 版本,跟 VERSION: apps/v1 保持一致
     APIVersion defines the versioned schema of this representation of an
     object. Servers should convert recognized schemas to the latest internal
     value, and may reject unrecognized values. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

   kind    <string>                       #资源类型,跟 KIND: ReplicaSet 保持一致
     Kind is a string value representing the REST resource this object
     represents. Servers may infer this from the endpoint the client submits
     requests to. Cannot be updated. In CamelCase. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds

   metadata    <Object>                  #元数据,定义 Replicaset 名字的
     If the Labels of a ReplicaSet are empty, they are defaulted to be the same
     as the Pod(s) that the ReplicaSet manages. Standard object's metadata. More
     info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec    <Object>                     #定义副本数、定义标签选择器、定义 Pod 模板
     Spec defines the specification of the desired behavior of the ReplicaSet.
     More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

   status    <Object>                   #状态信息,不能改
     Status is the most recently observed status of the ReplicaSet. This data
     may be out of date by some window of time. Populated by the system.
     Read-only. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

查看 replicaset 的 spec 字段

kubectl explain rs.spec
KIND:     ReplicaSet
VERSION:  apps/v1

RESOURCE: spec <Object>

DESCRIPTION:
     Spec defines the specification of the desired behavior of the ReplicaSet.
     More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

     ReplicaSetSpec is the specification of a ReplicaSet.

FIELDS:
   minReadySeconds    <integer>
     Minimum number of seconds for which a newly created pod should be ready
     without any of its container crashing, for it to be considered available.
     Defaults to 0 (pod will be considered available as soon as it is ready)

   replicas    <integer>                    #定义的 pod 副本数,根据我们指定的值创建对应数量的 pod
     Replicas is the number of desired replicas. This is a pointer to
     distinguish between explicit zero and unspecified. Defaults to 1. More
     info:
     https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller/#what-is-a-replicationcontroller

   selector    <Object> -required-           #用于匹配 pod 的标签选择器
     Selector is a label query over pods that should match the replica count.
     Label keys and values that must match in order to be controlled by this
     replica set. It must match the pod template's labels. More info:
     https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors

   template    <Object>                     #定义 Pod 的模板,基于这个模板定义的所有 pod 是一样的
     Template is the object that describes the pod that will be created if
     insufficient replicas are detected. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller#pod-template

查看 replicaset 的 spec.template 字段
对于 template 而言,其内部定义的就是 pod,pod 模板是一个独立的对象

kubectl explain rs.spec.template
KIND:     ReplicaSet
VERSION:  apps/v1

RESOURCE: template <Object>

DESCRIPTION:
     Template is the object that describes the pod that will be created if
     insufficient replicas are detected. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller#pod-template

     PodTemplateSpec describes the data a pod should have when created from a
     template

FIELDS:
   metadata    <Object>
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec    <Object>
     Specification of the desired behavior of the pod. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

查看 template 下 spec 字段下有哪些字段

kubectl explain rs.spec.template.spec

11.2.2 Replicaset 使用案例

编写资源清单

vim replicaset.yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: frontend
  labels:
    app: guestbook
    tier: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      tier: frontend
  template:
    metadata:
      labels:
        tier: frontend
    spec:
      containers:
      - name: nginx-rs
        image: nginx
        imagePullPolicy: IfNotPresent
        ports: 
        - name: http 
          containerPort: 80

执行yaml文件并验证

kubectl apply -f replicaset.yaml

kubectl get rs

kubectl get pods

注: pod 的名字是由控制器的名字-随机数组成的
资源清单详细说明

apiVersion: apps/v1                 #ReplicaSet 这个控制器属于的核心群组 
kind: ReplicaSet                    #创建的资源类型
metadata:
  name: frontend                    #控制器的名字
  labels:
    app: guestbook
    tier: frontend
spec:
  replicas: 3                       #管理的 pod 副本数量
  selector:
    matchLabels:
      tier: frontend                #管理带有 tier=frontend 标签的 pod
  template:                         #定义 pod 的模板
    metadata:
      labels:
        tier: frontend              #pod 标签,一定要有,这样上面的控制器才能找到它要管理的 pod 是哪些
    spec:
      containers:                   #定义 pod 里运行的容器
      - name: nginx-rs              #定义容器的名字
        image: nginx                #定义容器使用的镜像
        imagePullPolicy: IfNotPresent
        ports:                      #定义端口
        - name: http                #定义端口的名字
          containerPort: 80         #定义容器暴露的端口

11.2.3 Replicaset 管理 pod-扩容、缩容、更新

Replicaset 实现 pod 的动态扩容
kubectl get pods -l tier=frontend -w #动态查看 pod
ReplicaSet 最核心的功能是可以动态扩容和回缩,如果我们觉得三个副本太少了,想要增加,只需要修改配置文件 replicaset.yaml 里的 replicas 的值即可,原来 replicas: 3,现在变成 replicas: 4,修改之后,执行如下命令更新

kubectl apply -f replicaset.yaml

kubectl get rs

kubectl get pods -l tier=frontend

也可以直接编辑控制器实现扩容

kubectl edit rs frontend      #这个是我们把请求提交给了 apiserver,实时修改


kubectl get rs

kubectl get pods

Replicaset 实现 pod 的动态缩容
如果我们觉得 5 个 Pod 副本太多了,想要减少,只需要修改配置文件 replicaset.yaml 里的 replicas 的值即可,把 replicas: 4 改成 replicas: 2,修改之后,执行如下命令更新:

kubectl apply -f replicaset.yaml

kubectl get rs

kubectl get pods -l tier=frontend

Replicaset 实现 pod 的更新

kubectl edit rs frontend

修改镜像,把 image: nginx 改成 image: httpd

kubectl get rs -o wide

可以看到 rs 中定义的镜像已经变成了 httpd,说明镜像定义更新成功了

kubectl get pods -o wide

curl 10.244.209.133

可以看到虽然镜像已经更新了,但是原来的 pod 使用的还是之前的镜像,新创建的 pod 才会 使用最新的镜像
生产环境如果升级,可以删除一个 pod,观察一段时间没问题之后再删除另一个 pod,但是这样需要人工干预多次;实际生产环境一般采用蓝绿发布:原来有一个 rs1,再创建一个 rs2(控制器),通过修改 service 的标签选择器,让 service 匹配到 rs2 管理的 pod,这才是蓝绿发布,这也需要我们精心地规划部署。有一个控制器正是建立在 rs 之上来完成这些工作的,叫做 Deployment

11.3 Deployment 控制器:概念、原理解读

11.3.1 Deployment 概述

Deployment 是 kubernetes 中最常用的资源对象,为 ReplicaSet 和 Pod 的创建提供了一种声明式的定义方法,在 Deployment 对象中描述一个期望的状态,Deployment 控制器就会按照一定的控制速率把实际状态改成期望状态,通过定义一个 Deployment 控制器会创建一个新的 ReplicaSet 控制器,通过 ReplicaSet 创建 pod,删除 Deployment 控制器,也会删除 Deployment 控制器下对应的ReplicaSet 控制器和 pod 资源
使用 Deployment 而不直接创建 ReplicaSet 是因为 Deployment 对象拥有许多 ReplicaSet 没有的特性,例如滚动升级和回滚。
扩展:声明式定义是指直接修改资源清单 yaml 文件,然后通过 kubectl apply -f 资源清单 yaml文件,就可以更改资源
Deployment 控制器是建立在 rs 之上的一个控制器,可以管理多个 rs,每次更新镜像版本,都会生成一个新的 rs,把旧的 rs 替换掉,多个 rs 同时存在,但是只有一个 rs 运行
rs v1 控制三个 pod,删除一个 pod,在 rs v2 上重新建立一个,依次类推,直到全部都是由 rs v2控制,如果 rs v2 有问题,还可以回滚,Deployment 是建构在 rs 之上的,多个 rs 组成一个 Deployment,但是只有一个 rs 处于活跃状态.


11.3.2 Deployment 工作原理

Deployment
Deployment 可以使用声明式定义,直接在命令行通过纯命令的方式完成对应资源版本的内容的修改,也就是通过打补丁的方式进行修改;Deployment 能提供滚动式自定义自控制的更新;对Deployment 来讲,我们在实现更新时还可以实现控制更新节奏和更新逻辑。
更新节奏和更新逻辑
比如说 Deployment 控制 5 个 pod 副本,pod 的期望值是 5 个,但是升级的时候需要额外多几个pod,那我们控制器可以控制在 5 个 pod 副本之外还能再增加几个 pod 副本;比方说能多一个,但是不能少,那么升级的时候就是先增加一个,再删除一个,增加一个删除一个,始终保持 pod 副本数是 5个;还有一种情况,最多允许多一个,最少允许少一个,也就是最多 6 个,最少 4 个,第一次加一个,删除两个,第二次加两个,删除两个,依次类推,可以自己控制更新方式,这种滚动更新需要加readinessProbe 和 livenessProbe 探测,确保 pod 中容器里的应用都正常启动了才删除之前的 pod。 启动第一步,刚更新第一批就暂停了也可以;假如目标是 5 个,允许一个也不能少,允许最多可以10 个,那一次加 5 个即可;这就是我们可以自己控制节奏来控制更新的方法。
通过 Deployment 对象,你可以轻松的做到以下事情:
1) 创建 ReplicaSet 和 Pod
2) 滚动升级(不停止旧服务的状态下升级)和回滚应用(将应用回滚到之前的版本)
3) 平滑地扩容和缩容
4) 暂停和继续 Deployment


11.4 Deployment 资源清单文件编写技巧

11.4.1 资源清单解读

查看 Deployment 资源对象由哪几部分组成

kubectl explain deployment
KIND:     Deployment
VERSION:  apps/v1

DESCRIPTION:
     Deployment enables declarative updates for Pods and ReplicaSets.

FIELDS:
   apiVersion    <string>                    #该资源使用的 api 版本
     APIVersion defines the versioned schema of this representation of an
     object. Servers should convert recognized schemas to the latest internal
     value, and may reject unrecognized values. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

   kind    <string>                          #创建的资源
     Kind is a string value representing the REST resource this object
     represents. Servers may infer this from the endpoint the client submits
     requests to. Cannot be updated. In CamelCase. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds

   metadata    <Object>                      #元数据,包括资源的名字和名称空间
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec    <Object>                          #定义容器的
     Specification of the desired behavior of the Deployment.

   status    <Object>                        #状态,不可以修改
     Most recently observed status of the Deployment.

查看 Deployment 下的 spec 字段

kubectl explain deployment.spec
KIND:     Deployment
VERSION:  apps/v1

RESOURCE: spec <Object>

DESCRIPTION:
     Specification of the desired behavior of the Deployment.

     DeploymentSpec is the specification of the desired behavior of the
     Deployment.

FIELDS:
   minReadySeconds    <integer>             #Kubernetes 在等待设置的时间后才进行升级
   #如果没有设置该值,Kubernetes 会假设该容器启动起来后就提供服务了
     Minimum number of seconds for which a newly created pod should be ready
     without any of its container crashing, for it to be considered available.
     Defaults to 0 (pod will be considered available as soon as it is ready)

   paused    <boolean>                       #暂停,当我们更新的时候创建 pod 先暂停,不是立即更新
     Indicates that the deployment is paused.

   progressDeadlineSeconds    <integer>
     The maximum time in seconds for a deployment to make progress before it is
     considered to be failed. The deployment controller will continue to process
     failed deployments and a condition with a ProgressDeadlineExceeded reason
     will be surfaced in the deployment status. Note that progress will not be
     estimated during the time a deployment is paused. Defaults to 600s.
     # k8s 在升级过程中有可能由于各种原因升级卡住(这个时候还没有明确的升级失败),比如
     # 在拉取被墙的镜像,权限不够等错误。那么这个时候就需要有个 deadline ,在 deadline 
     # 之内如果还卡着,那么就上报这个情况,这个时候这个 Deployment 状态就被标记为
     # False,并且注明原因。但是它并不会阻止 Deployment 继续进行卡住后面的操作。完全由
     # 用户进行控制。

   replicas    <integer>                         #副本数
     Number of desired pods. This is a pointer to distinguish between explicit
     zero and not specified. Defaults to 1.

   revisionHistoryLimit    <integer>            #保留的历史版本,默认是 10
     The number of old ReplicaSets to retain to allow rollback. This is a
     pointer to distinguish between explicit zero and not specified. Defaults to
     10.

   selector    <Object> -required-              #标签选择器,选择它关联的pod
     Label selector for pods. Existing ReplicaSets whose pods are selected by
     this will be the ones affected by this deployment. It must match the pod
     template's labels.

   strategy    <Object>                         #更新策略
     The deployment strategy to use to replace existing pods with new ones.

   template    <Object> -required-              #定义的 pod 模板
     Template describes the pods that will be created.

查看 Deployment 下的 spec.strategy 字段

 kubectl explain deploy.spec.strategy
KIND:     Deployment
VERSION:  apps/v1

RESOURCE: strategy <Object>

DESCRIPTION:
     The deployment strategy to use to replace existing pods with new ones.

     DeploymentStrategy describes how to replace existing pods with new ones.

FIELDS:
   rollingUpdate    <Object>
     Rolling update config params. Present only if DeploymentStrategyType =
     RollingUpdate.

   type    <string>
     Type of deployment. Can be "Recreate" or "RollingUpdate". Default is
     RollingUpdate.

支持两种更新,Recreate 和 RollingUpdate
#Recreate 是重建式更新,删除一个更新一个
#RollingUpdate 滚动更新,定义滚动更新方式,也就是 pod 能多几个,少几个
查看 Deployment 下的 spec.strategy.rollingUpdate 字段

 kubectl explain deploy.spec.strategy.rollingUpdate
KIND:     Deployment
VERSION:  apps/v1

RESOURCE: rollingUpdate <Object>

DESCRIPTION:
     Rolling update config params. Present only if DeploymentStrategyType =
     RollingUpdate.

     Spec to control the desired behavior of rolling update.

FIELDS:
   maxSurge    <string>
     The maximum number of pods that can be scheduled above the desired number
     of pods. Value can be an absolute number (ex: 5) or a percentage of desired
     pods (ex: 10%). This can not be 0 if MaxUnavailable is 0. Absolute number
     is calculated from percentage by rounding up. Defaults to 25%. Example:
     when this is set to 30%, the new ReplicaSet can be scaled up immediately
     when the rolling update starts, such that the total number of old and new
     pods do not exceed 130% of desired pods. Once old pods have been killed,
     new ReplicaSet can be scaled up further, ensuring that total number of pods
     running at any time during the update is at most 130% of desired pods.
     #我们更新的过程当中最多允许超出的指定的目标副本数有几个;
     #它有两种取值方式,第一种直接给定数量,第二种根据百分比,百分比表示原本是 5 个,最多可以超出 20%,那就允许多一个,
     #最多可以超过40%,那就允许多两个



   maxUnavailable    <string>
     The maximum number of pods that can be unavailable during the update. Value
     can be an absolute number (ex: 5) or a percentage of desired pods (ex:
     10%). Absolute number is calculated from percentage by rounding down. This
     can not be 0 if MaxSurge is 0. Defaults to 25%. Example: when this is set
     to 30%, the old ReplicaSet can be scaled down to 70% of desired pods
     immediately when the rolling update starts. Once new pods are ready, old
     ReplicaSet can be scaled down further, followed by scaling up the new
     ReplicaSet, ensuring that the total number of pods available at all times
     during the update is at least 70% of desired pods.
     #最多允许几个不可用,假设有 5 个副本,最多一个不可用,就表示最少有 4 个可用
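
下面是一个滚动更新策略的配置片段示例(数值仅为演示,可按需调整),放在 deployment 的 spec 下即可:

spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # 更新过程中最多比期望副本数多出 1 个 pod
      maxUnavailable: 1      # 更新过程中最多允许 1 个 pod 不可用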

查看 Deployment 下的 spec.template 字段
#template 为定义 Pod 的模板,Deployment 通过模板创建 Pod

kubectl explain deploy.spec.template
KIND:     Deployment
VERSION:  apps/v1

RESOURCE: template <Object>

DESCRIPTION:
     Template describes the pods that will be created.

     PodTemplateSpec describes the data a pod should have when created from a
     template

FIELDS:
   metadata    <Object>               #定义模板的名字
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec    <Object>
     Specification of the desired behavior of the pod. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

deployment.spec.template 为 Pod 定义的模板,和 Pod 定义不太一样,template 中不包含 apiVersion 和 Kind 属性,要求必须有metadata。
deployment.spec.template.spec 为容器的属性信息,其他定义内容和 Pod 一致。
查看 Deployment 下的 spec.template.spec 字段

kubectl explain deploy.spec.template.spec
KIND: Deployment
VERSION: apps/v1
RESOURCE: spec <Object>
DESCRIPTION:
  Specification of the desired behavior of the pod. More info:
  https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status
  PodSpec is a description of a pod.
FIELDS:
  activeDeadlineSeconds <integer>
  #activeDeadlineSeconds 表示 Pod 可以运行的最长时间,达到设置的该值后,Pod 会自动停止。
  affinity <Object>                       #定义亲和性,跟直接创建 pod 时候定义亲和性类似
  automountServiceAccountToken <boolean>              #身份认证相关的
  containers <[]Object> -required-                #定义容器属性
  dnsConfig<Object>              #设置 Pod 的 DNS

  dnsConfig:
    nameservers:
    - 192.xxx.xxx.6
    searches:
    - xuegod.svc.cluster.local
    - my.dns.search.xuegod
  dnsPolicy <string> # dnsPolicy 决定 Pod 内预设的 DNS 配置策略

  None 无任何策略:使用自定义的策略
  Default 默认:使用宿主机的 dns 配置,/etc/resolv.conf
  ClusterFirst 集群 DNS 优先,与 Default 相反,会预先使用 kube-dns (或 CoreDNS )的信息当预设置参数写入到该 Pod 内的 DNS 配置。
  ClusterFirstWithHostNet 集群 DNS 优先,并伴随着使用宿主机网络:同时使用
  hostNetwork 与 kube-dns 作为 Pod 预设 DNS 配置。

  enableServiceLinks <boolean>
  ephemeralContainers <[]Object>    
  # 定义临时容器临时容器与其他容器的不同之处在于,它们缺少对资源或执行的保证,并且永远不会自
  # 动重启,因此不适用于构建应用程序。临时容器使用与常规容器相同的 ContainerSpec 段
  # 进行描述,但许多字段是不相容且不允许的。
  # 临时容器没有端口配置,因此像 ports,livenessProbe,readinessProbe 这样的字段是不允许的。
  # Pod 资源分配是不可变的,因此 resources 配置是不允许的。
  # 临时容器用途:当由于容器崩溃或容器镜像不包含调试应用程序而导致 kubectl exec 无用时,临时容器对于交互式故障排查很有用。

  hostAliases <[]Object> #在 pod 中增加域名解析的
  hostAliases:
  - ip: "10.1.2.2"
    hostnames:
    - "mc.local"
    - "rabbitmq.local"
  - ip: "10.1.2.3"
    hostnames:
    - "redis.local"
    - "mq.local"

  hostIPC <boolean>              #使用主机 IPC
  hostNetwork <boolean>           #是否使用宿主机的网络
  hostPID <boolean>               #可以设置容器里是否可以看到宿主机上的进程。True 可以
  hostname <string>
  imagePullSecrets <[]Object>
  initContainers <[]Object>               #定义初始化容器
  nodeName <string>                     #定义 pod 调度到具体哪个节点上
  nodeSelector <map[string]string>             #定义节点选择器
  overhead <map[string]string> 
  # overhead 是 1.16 引入的字段,在没有引入Overhead 之前,只要一个节点的资源可用量大于等于 Pod 的 requests 时,这个 Pod  
  # 就可以被调度到这个节点上。引入 Overhead 之后,只有节点的资源可用量大于等于Overhead 加上 requests 的和时才能被调度上来

  preemptionPolicy <string>
  priority <integer>
  priorityClassName <string>
  readinessGates <[]Object>
  restartPolicy <string>                   #Pod 重启策略
  runtimeClassName <string>
  schedulerName <string>
  securityContext <Object>            #是否开启特权模式
  serviceAccount <string>
  serviceAccountName <string>
  setHostnameAsFQDN <boolean>
  shareProcessNamespace <boolean>
  subdomain <string>
  terminationGracePeriodSeconds <integer>   #在真正删除容器之前,K8S 会先发终止信号(kill -15 {pid})给容器,默认 30s
  tolerations <[]Object> #定义容忍度
  topologySpreadConstraints <[]Object>
  volumes <[]Object> #挂载存储卷
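上面 ephemeralContainers 字段提到的临时容器,通常不直接写在清单里,而是用 kubectl debug 注入(需要集群版本支持临时容器特性;下面的 pod 名、容器名、镜像均为示例):

```shell
# 向 my-pod 注入一个临时容器并进入交互式 shell,--target 指定共享进程命名空间的目标容器
kubectl debug -it my-pod --image=busybox --target=my-container
```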

11.4.2 Deployment 使用案例

编写yaml资源

kubectl create ns blue-green

vim deploy-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-v1
  namespace: blue-green
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
      version: v1
  template:
    metadata:
      labels:
        app: myapp
        version: v1
    spec:
      containers:
      - name: myapp
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80

更新清单文件并查看

kubectl apply -f deploy-demo.yaml

kubectl get deploy -n blue-green

NAME:列出名称空间中 deployment 的名称。
READY:显示 deployment 有多少个副本已就绪,遵循 ready/desired 的格式。
UP-TO-DATE:显示已更新到所需状态的副本数。
AVAILABLE:显示当前可供使用的应用程序副本数。
AGE:显示应用程序已运行的时间。

kubectl get rs -n blue-green

创建 deploy 的时候也会创建一个 rs(ReplicaSet),rs 名字中类似 67fd9fc9c8 的随机后缀是 pod 模板 template 内容的 hash 值,可以用本节末尾列出的命令通过 pod 标签验证。
NAME: 列出名称空间中 ReplicaSet 资源
DESIRED:显示应用程序的所需副本数,这些副本数是在创建时定义的。这是所需的状态。
CURRENT: 显示当前正在运行多少个副本。
READY: 显示你的用户可以使用多少个应用程序副本。
AGE :显示应用程序已运行的时间。
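上面提到的模板 hash 会以 pod-template-hash 标签的形式打在 Deployment 创建的 pod 上,可以用下面的命令查看验证:

```shell
# 查看 pod 标签,可以看到类似 pod-template-hash=67fd9fc9c8 的标签
kubectl get pods -n blue-green --show-labels
```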
资源清单文件解读

apiVersion: apps/v1               # deployment 对应的 api 版本
kind: Deployment                  # 创建的资源是 deployment
metadata:
  name: myapp-v1                  # deployment 的名字
  namespace: blue-green
spec:
  replicas: 2                     # deployment 管理的 pod 副本数
  selector:                       # 标签选择器
    matchLabels:                  # matchLabels 下定义的标签需要跟 template.metadata.labels 定义的标签一致
      app: myapp                 
      version: v1
  template: 
    metadata:
      labels:
        app: myapp
        version: v1
    spec:                         # 定义容器的属性
      containers:
      - name: myapp
        image: nginx              # 容器使用的镜像
        imagePullPolicy: IfNotPresent            # 镜像拉取策略
        ports:
        - containerPort: 80       # 容器里的应用的端口

11.4.3 Deployment 管理 pod-扩容、缩容、滚动更新、回滚

通过 deployment 管理应用,实现扩容,把副本数变成 3

cat deploy-demo.yaml

直接修改 replicas 数量,如下,变成 3
spec:
  replicas: 3
修改之后保存退出,执行

kubectl apply -f deploy-demo.yaml
# 注意:apply 不同于 create,apply 可以执行多次;create 只能执行一次,重复执行就会报错。

kubectl get pods -n blue-green

可以看到 pod 副本数变成了 3 个
#查看 myapp-v1 这个控制器的详细信息

kubectl describe deploy myapp-v1 -n blue-green

通过 deployment 管理应用,实现缩容,把副本数变成 2

cat deploy-demo.yaml

直接修改 replicas 数量,如下,变成 2
spec:
  replicas: 2
修改之后保存退出,执行

kubectl apply -f deploy-demo.yaml

 kubectl get pods -n blue-green

通过 deployment 管理应用,实现滚动更新
在一个终端窗口执行如下:

kubectl get pods -l app=myapp -n blue-green -w

打开一个新的终端窗口更改镜像版本,按如下操作:

 vim deploy-demo.yaml

把 image: nginx 变成 image: tomcat

 kubectl apply -f deploy-demo.yaml

再回到刚才执行监测 kubectl get pods -l app=myapp -n blue-green -w 的那个窗口,进行观察
Pending 表示正在调度,ContainerCreating 表示正在创建 pod,Running 表示 pod 运行中;每 Running 起来一个新 pod,就会 Terminating(停掉)一个旧 pod,以此类推,直到所有 pod 完成滚动升级
在另外一个窗口执行

kubectl get rs -n blue-green

上面可以看到 rs 有两个,上面那个是升级之前的,已经被停掉,但是可以随时回滚
#查看 myapp-v1 这个控制器的历史版本

kubectl rollout history deployment myapp-v1 -n blue-green

kubectl rollout undo deployment myapp-v1 --to-revision=1 -n blue-green

kubectl rollout history deployment myapp-v1 -n blue-green
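补充:滚动更新和回滚过程中,还可以用 rollout 的其他子命令观察进度或临时暂停更新(示意):

```shell
# 观察滚动更新进度,所有副本就绪后命令才返回
kubectl rollout status deployment myapp-v1 -n blue-green

# 需要时可以暂停滚动更新,确认无误后再恢复
kubectl rollout pause deployment myapp-v1 -n blue-green
kubectl rollout resume deployment myapp-v1 -n blue-green
```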

11.4.4 Deployment 资源清单详解

资源清单详解

apiVersion: apps/v1
kind: Deployment 
metadata:
  name: portal
  namespace: ms 
spec:
  replicas: 1
  selector:
    matchLabels:
      project: ms
      app: portal
  template:
    metadata:
      labels:
        project: ms 
        app: portal
    spec:
      containers:
      - name: nginx
        image: nginx
        imagePullPolicy: Always
        ports:
        - protocol: TCP
          containerPort: 80
        resources:          #资源配额
          limits:           #资源限制,最多可用的 cpu 和内存
            cpu: 1
            memory: 1Gi
          requests:        #最少需要多少资源才可以运行 Pod
            cpu: 0.5
            memory: 1Gi
        readinessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 60
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 60
          periodSeconds: 10

livenessProbe:
#存活性探测
#用于判断容器是否存活,即 Pod 是否为 running 状态。如果 LivenessProbe 探针探测到容器不健康,则 kubelet 将 kill 掉该容器,并根据容器的重启策略决定是否重启;如果一个容器不包含 LivenessProbe 探针,则 kubelet 认为该容器的 LivenessProbe 探针的返回值永远是成功。
  tcpSocket:
    port: 8080              #检测 8080 端口是否存在
  initialDelaySeconds: 60   #Pod 启动 60s 后执行第一次检查
  periodSeconds: 10         #第一次检查后每隔 10s 检查一次

readinessProbe:
#就绪性探测
有时候应用程序可能暂时无法接受请求,比如 Pod 已经 Running 了,但是容器内应用程序尚未启动成功,在这种情况下,如果没有 ReadinessProbe,则 Kubernetes 认为它可以处理请求了,然而此时,我们知道程序还没启动成功是不能接收用户请求的,所以不希望 kubernetes 把请求调度给它,则使用 ReadinessProbe 探针。

ReadinessProbe 和 livenessProbe 可以使用相同探测方式,只是对 Pod 的处置方式不同,ReadinessProbe 是将 Pod IP:Port 从对应的 EndPoint 列表中删除,而 livenessProbe 则 Kill 容器并根据 Pod 的重启策略来决定作出对应的措施。

ReadinessProbe 探针探测容器是否已准备就绪,如果未准备就绪则 kubernetes 不会将流量转发给此 Pod。
  tcpSocket:
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 10
#在 Pod 运行过程中,K8S 仍然会每隔 10s 检测一次 8080 端口
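探针除了 tcpSocket,还支持 httpGet 和 exec 两种探测方式,下面是一个示意片段(/healthz 路径和探测命令都是示例,需按应用实际情况修改):

```yaml
livenessProbe:
  httpGet:
    path: /healthz          # 示例:应用的健康检查接口
    port: 80
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  exec:
    command: ["cat", "/tmp/ready"]   # 示例:文件存在即认为就绪
  initialDelaySeconds: 5
  periodSeconds: 10
```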


11.5 Kubectl 常用命令补充

11.5.1 Kubectl cordon

cordon [ˈkɔ:dn] 封锁 警戒线
在实际维护的时候会出现某个 node 坏掉,或者做一些处理,暂时不能让生成的 pod 在此 node 上运行,需要通知 kubernetes 让其不要创建过来,这条命令就是 cordon,uncordon 则是取消这个要求。
之前横向扩展到 3 个副本,发现利用 roundrobin 策略,每个 node 上运行起来了一个 pod,是平均分配的。
执行 cordon 命令
设定 k8s-slave1 上不再运行新的 pod 实例,使用 get node 确认,其状态显示SchedulingDisabled。

kubectl cordon k8s-slave1

kubectl get nodes -o wide

执行 scale 命令,再次执行横向扩展命令,看是否会有 pod 漂到 k8s-slave1 这台机器上,结果发现只有之前的一个 pod,再没有新的 pod 漂过去。

kubectl scale --replicas=6 deployment/nginx

kubectl get pod -o wide

注:发现新扩展的 pod 都在 k8s-slave2 运行了,没有在 k8s-slave1 上运行,说明封锁 k8s-slave1 成功了。另外,我们指定扩展到 6 个 pod,但是只运行了 5 个,因为前面执行 autoscale 时,指定最多可以扩展到 5 个。
执行 uncordon 解除封锁命令
使用 uncordon 命令解除对 k8s-slave1 机器的限制,通过 get node 确认状态也已经恢复正常。

kubectl uncordon k8s-slave1

kubectl get nodes

执行 scale 命令
再次执行 scale 命令,发现有新的 pod 可以创建到 k8s-slave1上了

kubectl scale --replicas=7 deployment nginx

11.5.2 Kubectl drain

kubectl drain 命令 [dreɪn] 排水
用于对某个 node 结点进行维护。
drain 两个作用:
1) 设定此 node 不可以使用(cordon)
2) evict 驱逐 pod 到他正常的 node 节点上
将 nginx 的副本设定为 4,发现 k8s-slave1 和 k8s-slave2 各启动了两个 pod

kubectl create -f /root/nginx-deployment.yaml

kubectl scale --replicas=4 deployment nginx

kubectl get pod -o wide

执行 drain 命令,让 k8s-slave1 不可用,并把 k8s-slave1 上已经运行的 pod 驱逐到其他 node 上

 kubectl drain k8s-slave1 --ignore-daemonsets --delete-emptydir-data

注:--ignore-daemonsets 表示忽略由 DaemonSet 管理的 pod;--delete-emptydir-data 表示即使 pod 使用了 emptyDir 卷也一并驱逐(emptyDir 中的本地数据会被删除)。
结果确认
把 k8s-slave1 上的 pod 驱逐后,k8s 会根据 replicas 机制,在其他节点上补足新的 pod。这时就可以对 k8s-slave1 节点进行维护了,维护好后,重新 uncordon 即可。

kubectl get pod -o wide

kubectl get nodes -o wide

kubectl uncordon k8s-slave1

十二、k8s 控制器-Daemonset-Job

12.1 DaemonSet 控制器:概念、原理解读

12.1.1 DaemonSet 概述

DaemonSet 控制器能够确保 k8s 集群所有的节点都运行一个相同的 pod 副本,当向 k8s 集群中增加 node 节点时,这个 node 节点也会自动创建一个 pod 副本,当 node 节点从集群移除,这些 pod 也会自动删除;删除 Daemonset 也会删除它们创建的 pod


12.1.2 DaemonSet 工作原理

DaemonSet 控制器会监听 kubernetes 的 daemonset 对象、pod 对象、node 对象,这些被监听的对象一有变动,就会触发 syncLoop 循环,让 kubernetes 集群朝着 daemonset 对象描述的状态演进。


12.1.3 Daemonset 典型的应用场景

在集群的每个节点上运行存储守护进程,比如:glusterd 或 ceph。
在每个节点上运行日志收集组件,比如:fluentd、logstash、filebeat 等。
在每个节点上运行监控组件,比如:Prometheus、Node Exporter、collectd 等。


12.1.4 DaemonSet 与 Deployment 的区别

Deployment 部署的副本 Pod 会分布在各个 Node 上,每个 Node 都可能运行好几个副本。
DaemonSet 的不同之处在于:每个 Node 上最多只能运行一个副本。


12.2 DaemonSet 资源清单文件编写技巧

查看定义 Daemonset 资源需要的字段

kubectl explain ds
KIND:     DaemonSet
VERSION:  apps/v1

DESCRIPTION:
     DaemonSet represents the configuration of a daemon set.

FIELDS:
   apiVersion    <string>     #当前资源使用的 api 版本,跟 VERSION: apps/v1 保持一致
     APIVersion defines the versioned schema of this representation of an
     object. Servers should convert recognized schemas to the latest internal
     value, and may reject unrecognized values. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

   kind    <string>           #资源类型,跟 KIND: DaemonSet 保持一致
     Kind is a string value representing the REST resource this object
     represents. Servers may infer this from the endpoint the client submits
     requests to. Cannot be updated. In CamelCase. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds

   metadata    <Object>       #元数据,定义 DaemonSet 名字的
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec    <Object>           #定义容器的
     The desired behavior of this daemon set. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

   status    <Object>         #状态信息,不能改
     The current status of this daemon set. This data may be out of date by some
     window of time. Populated by the system. Read-only. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

查看 DaemonSet 的 spec 字段

kubectl explain ds.spec
KIND:     DaemonSet
VERSION:  apps/v1

RESOURCE: spec <Object>

DESCRIPTION:
     The desired behavior of this daemon set. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

     DaemonSetSpec is the specification of a daemon set.

FIELDS:
   minReadySeconds    <integer>           #新建的 pod 启动并保持就绪几秒钟后才被视为可用,然后再 kill 掉旧的 pod。
     The minimum number of seconds for which a newly created DaemonSet pod
     should be ready without any of its container crashing, for it to be
     considered available. Defaults to 0 (pod will be considered available as
     soon as it is ready).

   revisionHistoryLimit    <integer>       #历史版本
     The number of old history to retain to allow rollback. This is a pointer to
     distinguish between explicit zero and not specified. Defaults to 10.

   selector    <Object> -required-         #用于匹配 pod 的标签选择器
     A label query over pods that are managed by the daemon set. Must match in
     order to be controlled. It must match the pod template's labels. More info:
     https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors

   template    <Object> -required-        #定义 Pod 的模板,基于这个模板定义的所有 pod 是一样的
     An object that describes the pod that will be created. The DaemonSet will
     create exactly one copy of this pod on every node that matches the
     template's node selector (or on every node if no node selector is
     specified). More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller#pod-template

   updateStrategy    <Object>            #daemonset 的升级策略
     An update strategy to replace existing DaemonSet pods with new pods.

查看 DaemonSet 的 spec.template 字段
对于 template 而言,其内部定义的就是 pod,pod 模板是一个独立的对象

kubectl explain ds.spec.template
KIND:     DaemonSet
VERSION:  apps/v1

RESOURCE: template <Object>

DESCRIPTION:
     An object that describes the pod that will be created. The DaemonSet will
     create exactly one copy of this pod on every node that matches the
     template's node selector (or on every node if no node selector is
     specified). More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller#pod-template

     PodTemplateSpec describes the data a pod should have when created from a
     template

FIELDS:
   metadata    <Object>
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec    <Object>
     Specification of the desired behavior of the pod. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

12.3 DaemonSet 使用案例-日志收集组件 fluentd

将镜像上传到子节点并解压

docker load -i fluentd-2-5-1.tar.gz

编写一个DaemonSet资源清单

vim daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd-elasticsearch
        image: xuegod/fluentd:v2.5.1
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers

执行yaml文件并查看

kubectl apply -f daemonset.yaml

kubectl get ds -n kube-system

kubectl get pods -n kube-system -o wide

通过上面可以看到在 k8s 的三个节点均创建了 fluentd 这个 pod
#pod 的名字是由控制器的名字-随机数组成的
资源清单详细说明

apiVersion: apps/v1                    # DaemonSet 使用的 api 版本
kind: DaemonSet                        # 资源类型
metadata:
  name: fluentd-elasticsearch          # 资源的名字
  namespace: kube-system               # 资源所在的名称空间
  labels:
    k8s-app: fluentd-logging           # 资源具有的标签
spec:
  selector:                            # 标签选择器
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:                          # 基于这个模板定义的 pod 具有的标签
        name: fluentd-elasticsearch    
    spec:
      tolerations:                     # 定义容忍度
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:                      # 定义容器
      - name: fluentd-elasticsearch
        image: xuegod/fluentd:v2.5.1
        imagePullPolicy: IfNotPresent
        resources:                     # 资源配额
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log         # 把本地/var/log 目录挂载到容器
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers  # 把/var/lib/docker/containers/挂载到容器里
          readOnly: true              # 挂载目录是只读权限
      terminationGracePeriodSeconds: 30  # 优雅的关闭服务
      volumes:
      - name: varlog
        hostPath:
          path: /var/log              # 基于本地目录创建一个卷
      - name: varlibdockercontainers
        hostPath:   
          path: /var/lib/docker/containers      # 基于本地目录创建一个卷

扩展:什么是 terminationGracePeriodSeconds?
解释这个参数之前,先来回忆一下 K8S 滚动升级的步骤:
1.K8S 首先启动新的 POD
2.K8S 等待新的 POD 进入 Ready 状态
3.K8S 创建 Endpoint,将新的 POD 纳入负载均衡
4.K8S 移除与老 POD 相关的 Endpoint,并且将老 POD 状态设置为 Terminating,此时将不会有新的请求到达老 POD
5.同时 K8S 会给老 POD 发送 SIGTERM 信号,并且等待 terminationGracePeriodSeconds 这么长的时间。(默认为 30 秒)
6.超过 terminationGracePeriodSeconds 等待时间后,K8S 会强制结束老 POD。看到这里,大家应该明白了:terminationGracePeriodSeconds 就是 K8S 给程序留的最后缓冲时间,用来处理关闭之前的收尾操作。
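下面是一个利用这段缓冲时间做优雅下线的示意片段(preStop 里的 sleep 仅作演示,实际可换成应用自己的收尾命令):

```yaml
spec:
  terminationGracePeriodSeconds: 60      # 最多给 60s 做收尾
  containers:
  - name: app
    image: nginx
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 10"]   # 示例:等待存量请求处理完再退出
```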


12.4 Daemonset 管理 pod-滚动更新

查看 daemonset 的滚动更新策略

```shell
kubectl explain ds.spec.updateStrategy
KIND:     DaemonSet
VERSION:  apps/v1

RESOURCE: updateStrategy <Object>

DESCRIPTION:
     An update strategy to replace existing DaemonSet pods with new pods.

     DaemonSetUpdateStrategy is a struct used to control the update strategy for
     a DaemonSet.

FIELDS:
   rollingUpdate    <Object>
     Rolling update config params. Present only if type = "RollingUpdate".

   type    <string>
     Type of daemon set update. Can be "RollingUpdate" or "OnDelete". Default is
     RollingUpdate.
```
查看 rollingUpdate 支持的更新策略
```shell
kubectl explain ds.spec.updateStrategy.rollingUpdate
KIND:     DaemonSet
VERSION:  apps/v1

RESOURCE: rollingUpdate <Object>

DESCRIPTION:
     Rolling update config params. Present only if type = "RollingUpdate".

     Spec to control the desired behavior of daemon set rolling update.

FIELDS:
   maxSurge    <string>
     The maximum number of nodes with an existing available DaemonSet pod that
     can have an updated DaemonSet pod during during an update. Value can be an
     absolute number (ex: 5) or a percentage of desired pods (ex: 10%). This can
     not be 0 if MaxUnavailable is 0. Absolute number is calculated from
     percentage by rounding up to a minimum of 1. Default value is 0. Example:
     when this is set to 30%, at most 30% of the total number of nodes that
     should be running the daemon pod (i.e. status.desiredNumberScheduled) can
     have their a new pod created before the old pod is marked as deleted. The
     update starts by launching new pods on 30% of nodes. Once an updated pod is
     available (Ready for at least minReadySeconds) the old DaemonSet pod on
     that node is marked deleted. If the old pod becomes unavailable for any
     reason (Ready transitions to false, is evicted, or is drained) an updated
     pod is immediatedly created on that node without considering surge limits.
     Allowing surge implies the possibility that the resources consumed by the
     daemonset on any given node can double if the readiness check fails, and so
     resource intensive daemonsets should take into account that they may cause
     evictions during disruption. This is beta field and enabled/disabled by
     DaemonSetUpdateSurge feature gate.

   maxUnavailable    <string>
     The maximum number of DaemonSet pods that can be unavailable during the
     update. Value can be an absolute number (ex: 5) or a percentage of total
     number of DaemonSet pods at the start of the update (ex: 10%). Absolute
     number is calculated from percentage by rounding up. This cannot be 0 if
     MaxSurge is 0 Default value is 1. Example: when this is set to 30%, at most
     30% of the total number of nodes that should be running the daemon pod
     (i.e. status.desiredNumberScheduled) can have their pods stopped for an
     update at any given time. The update starts by stopping at most 30% of
     those DaemonSet pods and then brings up new DaemonSet pods in their place.
     Once the new pods are available, it then proceeds onto other DaemonSet
     pods, thus ensuring that at least 70% of original number of DaemonSet pods
     are available at all times during the update.

```

上面表示 rollingUpdate 更新策略支持 maxUnavailable 和 maxSurge,默认是先删除再更新。
可以自己调节更新策略,如下:
updateStrategy:
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
更新镜像版本,可以按照如下方法

kubectl set image daemonsets fluentd-elasticsearch *=ikubernetes/filebeat:5.6.6-alpine -n kube-system
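更新镜像之后,可以用 rollout 子命令确认 DaemonSet 的更新进度和历史:

```shell
kubectl rollout status daemonset fluentd-elasticsearch -n kube-system

kubectl rollout history daemonset fluentd-elasticsearch -n kube-system
```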

12.5 Job 和 CronJob 控制器:概念、原理解读

12.5.1 Job 概念、原理解读

Job 控制器用于管理 Pod 对象运行一次性任务,比方说我们对数据库备份,可以直接在 k8s 上启动一个 mysqldump 备份程序,也可以启动一个 pod,这个 pod 专门用来备份用的,备份结束 pod 就可以终止了,不需要重启,而是将 Pod 对象置于”Completed”(完成)状态,若容器中的进程因错误而终止,则需要按照重启策略配置确定是否重启,对于 Job 这个类型的控制器来说,需不需要重建 pod 就看任务是否完成,完成就不需要重建,没有完成就需要重建 pod。 Job 控制器的 Pod 对象的状态转换如下图所示:
(图:Job 控制器 Pod 对象的状态转换示意图)
Job 用来创建 1 个或多个 Pod,并保证指定数量(.spec.completions)的 Pod 成功完成。当一个 Pod成功完成时(.status.phase=Succeeded),Job 会记录已完成的 Pod 的数量,等完成的数量达到指定值时,这个 Job 就完成了。可以通过以下 3 种方式来判断一个 Job 是否已完成:
.status.completionTime 是否为空。Job 完成时该字段会被设置成 Job 完成的时间,否则为空
.spec.completions 和 .status.succeeded 是否相等,即对比期望完成数和已成功数,当二者相等时,表示 Job 已经完成
.status.conditions[0].type:type 为 Complete 和 Failed 时,分别表示 Job 执行成功和失败

Pod 中的容器可能因为各种各样的原因失败,比如退出码不为 0、超出内存限制被 kill 掉,容器失败分两种情况:
.spec.template.spec.restartPolicy = “OnFailure”:容器失败后会不断重启,直到成功(退出码为 0)
.spec.template.spec.restartPolicy = “Never”:容器不会重启,Pod 的状态转为 Failed。当 Pod 执行失败时,Job 会不断创建新的 Pod 进行重试,直到失败次数达到 .spec.backoffLimit 指定的数值,整个 Job 即执行失败。可以通过判断 .status.failed 和 .spec.backoffLimit 是否相等(即已失败数是否达到上限),来判断 Job 是否已经执行失败。例如,当 .spec.backoffLimit 设置为 3 时,.status.failed 达到 3,Job 失败,不会再尝试创建新的 Pod。
Job 三种使用场景:
非并行任务:只启一个 pod,pod 成功,job 正常结束
并行任务同时指定成功个数:.spec.completions 为指定成功个数,可以指定也可以不指定.spec.parallelism(指定>1,会有多个任务并行运行)。当成功个数达到.spec.completions,任务结束。
有工作队列的并行任务:.spec.completions 默认为 1,.spec.parallelism 为大于 0 的整数。此时并行启动多个 pod,只要有一个成功,任务结束,所有 pod 结束
适用场景:
Job 不是设计用来完成通信密集型的并行程序,如科学计算领域常见的场景。它支持并行地处理一组独立但相关的 work item,如发送邮件,渲染帧,转码文件和扫描 NoSql 数据库中的 key
相关配置:
.spec.completions:完成该 Job 需要执行成功的 Pod 数
.spec.parallelism:能够同时运行的 Pod 数
.spec.backoffLimit:允许执行失败的 Pod 数,默认值是 6,0 表示不允许 Pod 执行失败。如果 Pod 的 restartPolicy 为 Never,则失败后会创建新的 Pod;如果是 OnFailure,则会重启 Pod。不管哪种情况,只要 Pod 失败一次就计一次,而不是等整个 Pod 彻底失败后才计一次。当失败次数达到该限制时,整个 Job 随即结束,所有正在运行中的 Pod 都会被删除。
.spec.activeDeadlineSeconds: Job 的超时时间,一旦一个 Job 运行的时间超出该限制,则 Job 失败,所有运行中的 Pod 会被结束并删除。该配置指定的值必须是个正整数。不指定则不会超时
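把上面几个配置项放到一起,一个带失败重试上限、超时时间和自动清理的 Job 示意片段如下(数值仅作演示;ttlSecondsAfterFinished 需要集群支持 TTL 清理机制):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  completions: 1                 # 成功运行 1 个 pod 即算完成
  parallelism: 1                 # 同时只运行 1 个 pod
  backoffLimit: 3                # 最多失败重试 3 次
  activeDeadlineSeconds: 600     # 整个 Job 最长运行 600s
  ttlSecondsAfterFinished: 300   # 结束 300s 后自动删除 Job 及其 pod
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: demo
        image: busybox
        command: ["sh", "-c", "echo job done"]
```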


12.5.2 CronJob 概念、原理解读

CronJob 跟 Job 完成的工作是一样的,只不过 CronJob 添加了定时任务的能力,可以指定时间,实现周期性运行。Job、CronJob 与 Deployment、DaemonSet 的显著区别在于:前者执行完任务即结束,不需要持续在后台运行。
Deployment 主要用于管理无状态的应用(kubernetes 集群有一些 pod,某一个 pod 出现故障,删除之后会重新启动一个 pod,那么 kubernetes 这个集群中 pod 数量就正常了,更多关注的是群体,这就是无状态应用)。
使用场景:
在给定时间点只运行一次。
在给定时间点周期性地运行。
CronJob 的典型用法如下:
在给定的时间点调度 Job 运行。
创建周期性运行的 Job,例如数据库备份、发送邮件


12.6 Job 控制器:资源清单编写技巧

查看 Job 资源对象

kubectl explain job
KIND:     Job
VERSION:  batch/v1

DESCRIPTION:
     Job represents the configuration of a single job.

FIELDS:
   apiVersion    <string>                  #当前 Job 的 api 版本
     APIVersion defines the versioned schema of this representation of an
     object. Servers should convert recognized schemas to the latest internal
     value, and may reject unrecognized values. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

   kind    <string>                       #指定当前的资源类型
     Kind is a string value representing the REST resource this object
     represents. Servers may infer this from the endpoint the client submits
     requests to. Cannot be updated. In CamelCase. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds

   metadata    <Object>                   #元数据,定义资源的名字和所在名称空间
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec    <Object>                       
     Specification of the desired behavior of a job. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

   status    <Object>
     Current status of a job. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

查看 Job 下的 spec 字段

kubectl explain job.spec
KIND:     Job
VERSION:  batch/v1

RESOURCE: spec <Object>

DESCRIPTION:
     Specification of the desired behavior of a job. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

     JobSpec describes how the job execution will look like.

FIELDS:
   activeDeadlineSeconds    <integer>
   # 通过指定 job 存活时间,来结束一个 job。当 job 运行时间达到 activeDeadlineSeconds 指定的时间后,job 会停止由它启动的所有任务(如:pod),并设置 job 的状态为 failed
   backoffLimit <integer> 
   #job 建议指定 pod 的重启策略为 never,如:.spec.template.spec.restartPolicy = "Never",然后通过 job 的 backoffLimit 来指定失败重试次数,在达到 backoffLimit 指定的次数后,job 状态设置为 failed(默认为 6 次) 
   completions <integer> 
   #指定 job 启动的任务(如:pod)成功运行 completions 次,job 才算成功结束
 manualSelector <boolean>
 parallelism <integer> 
 #指定 job 同时运行的任务(如:pod)个数,parallelism 默认为 1,如果设置为 0,则 job 会暂停
 selector <Object>
 template <Object> -required-
 ttlSecondsAfterFinished <integer> 
 #默认情况下,job 异常或者成功结束后,包括 job 启动的任务(pod),都不会被清理掉,因为你可以依据保存的 job 和 pod,查看状态、日志,以及调试等。这些用户可以手动删除,用户手动删除 job,job controller 会级联删除对应的 pod,除了手动删除,通过指定参数 ttlSecondsAfterFinished 也可以实现自动删除 job,以及级联的资源,如:pod。如果设置为 0,job 会被立即删除。如果不指定,job 则不会被删除

查看 Job 下的 spec.template 字段
template 为定义 Pod 的模板,Job 通过模板创建 Pod

kubectl explain job.spec.template
KIND:     Job
VERSION:  batch/v1

RESOURCE: template <Object>

DESCRIPTION:
     Describes the pod that will be created when executing a job. More info:
     https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/

     PodTemplateSpec describes the data a pod should have when created from a
     template

FIELDS:
   metadata    <Object>
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec    <Object>
     Specification of the desired behavior of the pod. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

查看 Job 下的 spec.template.spec 字段

kubectl explain Job.spec.template.spec
KIND:     Job
VERSION:  batch/v1

RESOURCE: spec <Object>

DESCRIPTION:
Specification of the desired behavior of the pod. More info:
https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

PodSpec is a description of a pod.

FIELDS:
  activeDeadlineSeconds    <integer>
  affinity <Object>
  containers <[]Object> -required-
  dnsConfig <Object>
  dnsPolicy <string>
  enableServiceLinks <boolean>
  ephemeralContainers <[]Object>
  hostAliases<[]Object>
  hostIPC <boolean>
  hostNetwork <boolean>
  hostPID <boolean>
  hostname <string>
  imagePullSecrets <[]Object>
  initContainers <[]Object>
  nodeName <string>
  nodeSelector <map[string]string>
  overhead <map[string]string>
  preemptionPolicy <string>
  priority <integer>
  priorityClassName <string>
  readinessGates <[]Object>
  restartPolicy <string> #重启策略,对于 Job,只能设置为 Never 或者 OnFailure。对于其他 controller(比如 Deployment)可以设置为 Always 。
  runtimeClassName <string>
  schedulerName <string>
  securityContext <Object>
  serviceAccount <string>
  serviceAccountName <string>
  setHostnameAsFQDN <boolean>
  shareProcessNamespace <boolean>
  subdomain <string>
  terminationGracePeriodSeconds <integer>
  tolerations <[]Object>
  topologySpreadConstraints <[]Object>
  volumes <[]Object>

12.7 Job 使用案例-创建一个一次性任务

把镜像传递到服务器并解压

docker load -i busybox.tar.gz

编写yaml文件

vim job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-busybox-job
spec:
  completions: 6
  parallelism: 3
  backoffLimit: 6
  template:
   spec:
     restartPolicy: Never
     containers:
     - name: my-container-job
       image: busybox
       imagePullPolicy: IfNotPresent
       command: ['sh', '-c']
       args: ['echo "Welcome to xuegod";sleep 60; echo "Next to Meet you"']

更新资源清单并验证

kubectl apply -f job.yaml

kubectl get pods

kubectl logs my-busybox-job-q2d9t

资源清单详解

apiVersion: batch/v1
kind: Job
metadata:
  name: my-busybox-job
spec:
  completions: 6        # job 结束需要成功运行的 Pod 个数,即状态为 Completed 的 pod 数
  parallelism: 3        # 一次运行 3 个 pod
  backoffLimit: 6       # 如果 job 失败,则重试几次
  #这里 parallelism 的值指的是一次性运行几个 pod,这个值不会超过 completions 的值
  template:
   spec:
     restartPolicy: Never
     containers:
     - name: my-container-job
       image: busybox
       imagePullPolicy: IfNotPresent
       command: ['sh', '-c']
       args: ['echo "Welcome to xuegod";sleep 60; echo "Next to Meet you"']

12.8 CronJob 使用案例-创建周期性的定时任务

编写yaml文件

vim cronjob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure

创建并验证

kubectl apply -f cronjob.yaml

kubectl get cronjob

kubectl get jobs --watch

kubectl get cronjob

kubectl get pods | grep hello

kubectl logs hello-1617289620-7f45h

12.9 使用 CronJob 定期备份 MySQL 数据

cronjob
CronJob 所描述的,正是定时任务。
1) 在给定时间点只运行一次
2) 在给定时间点周期性地运行
yaml文件编写
一个 CronJob 对象类似于 crontab(cron table)文件中的一行,它根据指定的预定计划周期性地运行一个 Job。这里简单说一下 cron,指的是 unix 中的 cron 表达式。比如 "*/1 * * * *":表达式里的 */1 中,* 表示从 0 开始,/ 表示"每",1 表示偏移量,所以它的意思是从 0 开始,每 1 个时间单位执行一次。Cron 表达式的五个部分分别代表:分钟、小时、日、月、星期,所以上述表达式的意思是:从当前开始,每分钟执行一次。下面先列出几个常见的表达式写法,然后利用这个机制创建定时备份 mysql 数据的任务对象:
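常见的 schedule 写法举例(格式:分 时 日 月 周):

```shell
# */1 * * * *    每分钟执行一次
# */5 * * * *    每 5 分钟执行一次
# 0 2 * * *      每天凌晨 2 点执行
# 30 3 * * 0     每周日凌晨 3 点 30 分执行
# 0 0 1 * *      每月 1 号 0 点执行
```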

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mysqldump
spec:
  schedule: "50 15 * * *"
  jobTemplate:
    spec:
      completions: 1
      template:
        spec:
          restartPolicy: Never
          volumes:
          - name: mysql-master-script
            hostPath:
              path: /root/app/mysql/shell
          - name: mysql-master-backup
            hostPath:
              path: /root/app/db/backup
          - name: local-time
            hostPath:
              path: /etc/localtime
          containers:
          - name: mysqldump-container
            image: nacos/nacos-mysql-master:latest
            volumeMounts:
            - name: mysql-master-script
              mountPath: /var/db/script
            - name: local-time
              mountPath: /etc/localtime
            - name: mysql-master-backup
              mountPath: /var/db/backup
            command:
            - "sh"
            - "/var/db/script/mysqldump.sh"

在这个 Yaml 文件当中,最重要的关键词就是 jobTemplate。它是由 job 对象控制的 Controller,还
有几点关键的属性这里解释说明一下:
1) spec.concurrencyPolicy:这个属性主要是考虑到定时任务的特殊性,很可能上一个 job 还没执行完,新的 job 就产生了。它的取值分别为:Allow(job 可以同时存在)、Forbid(不会创建新的 job,该周期会被跳过)、Replace(新产生的 Job 替换旧的、没有执行完的 Job)。
2) 如果某一次 Job 创建失败,这次创建会被标记为 "Miss"。当在指定的时间窗口内 miss 数目达到 100 时,CronJob 会停止再创建这个 Job,这个时间窗口可以由 spec.startingDeadlineSeconds 来指定。
3) 在 Job 对象中,负责并行控制的参数有两个:spec.parallelism 定义的是一个 Job 在任意时间最多可以同时启动多少个 Pod;spec.completions 定义的是 Job 至少要完成的 Pod 数目。
另外,这里容器与宿主机时间相差 8 小时,在设置定时任务的时候一定要算好时间。
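下面是把这几个控制参数写进 CronJob spec 的示意片段(取值仅作演示):

```yaml
spec:
  schedule: "50 15 * * *"
  concurrencyPolicy: Forbid          # 上一个 Job 还没结束时,跳过本次调度
  startingDeadlineSeconds: 300       # 错过调度点 300s 内仍允许补跑
  successfulJobsHistoryLimit: 3      # 只保留最近 3 个成功的 Job
  failedJobsHistoryLimit: 1          # 只保留最近 1 个失败的 Job
  jobTemplate:
    spec:
      completions: 1
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: demo
            image: busybox
            command: ["sh", "-c", "date"]
```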
备份数据库的脚本

#!/bin/bash
 #保存备份个数
 number=3
 #备份保存路径
 backup_dir=/var/db/backup
 #日期
 dd=`date +%Y%m%d`
 #备份工具
 tool=/usr/bin/mysqldump
 #用户名
 username=root
 #密码
 password=root
 #将要备份的数据库
 database_name=test
 $tool -u $username -p$password -h mysql-master -P3306 --databases \
 $database_name > $backup_dir/$database_name-$dd.sql

 #写创建备份日志
 echo "create $backup_dir/$database_name-$dd.sql" >> $backup_dir/log.txt
 #找出需要删除的备份
 delfile=`ls -l -crt $backup_dir/*.sql | awk '{print $9 }' | head -1`
 #判断现在的备份数量是否大于$number
 count=`ls -l -crt $backup_dir/*.sql | awk '{print $9 }' | wc -l`
 if [ $count -gt $number ]
 then
   rm $delfile   #删除最早生成的备份,只保留 $number 数量的备份
   #写删除文件日志
   echo "delete $delfile" >> $backup_dir/log.txt
 fi

十三、 kubernetes 四层代理 Service-使用技巧和使用场景

13.1 四层负载均衡 Service:概念、原理解读

13.1.1 为什么要有 Service?

在 kubernetes 中,Pod 是有生命周期的,如果 Pod 重启它的 IP 很有可能会发生变化。如果我们的服务都是将 Pod 的 IP 地址写死,Pod 挂掉或者重启,和刚才重启的 pod 相关联的其他服务将会找不到它所关联的 Pod,为了解决这个问题,在 kubernetes 中定义了 service 资源对象,Service 定义了一个服务访问的入口,客户端通过这个入口即可访问服务背后的应用集群实例,service 是一组 Pod 的逻辑集合,这一组 Pod 能够被 Service 访问到,通常是通过 Label Selector 实现的。
(图:Service 作为一组 Pod 的固定访问入口示意图)
pod ip 经常变化,service 是 pod 的代理,我们客户端访问,只需要访问 service,就会把请求代理到 Pod
pod ip 在 k8s 集群之外无法访问,所以需要创建 service,这个 service 可以在 k8s 集群外访问的。


13.1.2 Service 概述

service 是一个固定接入层,客户端可以通过访问 service 的 ip 和端口访问到 service 关联的后端 pod,这个 service 工作依赖于在 kubernetes 集群之上部署的一个附件,就是 kubernetes 的 dns 服务(不同 kubernetes 版本的 dns 默认使用的也是不一样的,1.11 之前的版本使用的是 kubeDNs,较新的版本使用的是 coredns),service 的名称解析是依赖于 dns 附件的,因此在部署完 k8s 之后需要再部署 dns 附件,kubernetes 要想给客户端提供网络功能,需要依赖第三方的网络插件(flannel,calico 等)。每个 K8s 节点上都有一个组件叫做 kube-proxy,kube-proxy 这个组件将始终监视着apiserver 中有关 service 资源的变动信息,需要跟 master 之上的 apiserver 交互,随时连接到apiserver 上获取任何一个与 service 资源相关的资源变动状态,这种是通过 kubernetes 中固有的一种请求方法 watch(监视)来实现的,一旦有 service 资源的内容发生变动(如创建,删除),kube-proxy 都会将它转化成当前节点之上的能够实现 service 资源调度,把我们请求调度到后端特定的 pod 资源之上的规则,这个规则可能是 iptables,也可能是 ipvs,取决于 service 的实现方式。


13.1.3 Service 工作原理

k8s 在创建 Service 时,会根据标签选择器 selector(lable selector)来查找 Pod,据此创建与Service 同名的 endpoint 对象,当 Pod 地址发生变化时,endpoint 也会随之发生变化,service 接收前端 client 请求的时候,就会通过 endpoint,找到转发到哪个 Pod 进行访问的地址。(至于转发到哪个节点的 Pod,由负载均衡 kube-proxy 决定)


13.1.4 Kubernetes 集群中有三类 IP 地址

Node Network(节点网络):物理节点或者虚拟节点的网络,如 ens33 接口上的网路地址

ip addr

2: ens33: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:6f:a0:9f brd ff:ff:ff:ff:ff:ff
inet 192.168.1.63/24 brd 192.168.1.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
Pod network(pod 网络),创建的 Pod 具有的 IP 地址

kubectl get pods -o wide -n kube-system

[root@k8s-master1 ~]# kubectl get pods -o wide -n kube-system
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-6d8c4cb4d-9hl4k 1/1 Running 0 19d 10.244.224.2 k8s-master2
Node Network 和 Pod network 这两种网络地址是我们实实在在配置的,其中节点网络地址是配置在节点接口之上,而 pod 网络地址是配置在 pod 资源之上的,因此这些地址都是配置在某些设备之上的,这些设备可能是硬件,也可能是软件模拟的
Cluster Network(集群地址,也称为 service network),这个地址是虚拟的地址(virtual ip),没有配置在某个接口上,只是出现在 service 的规则当中。

kubectl get svc

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
kubernetes ClusterIP 10.96.0.1 443/TCP


13.2 创建 Service 资源

13.2.1 字段解读

查看定义service资源需要的字段有哪里

kubectl explain service
KIND:     Service
VERSION:  v1

DESCRIPTION:
     Service is a named abstraction of software service (for example, mysql)
     consisting of local port (for example 3306) that the proxy listens on, and
     the selector that determines which pods will answer requests sent through
     the proxy.

FIELDS:
   apiVersion    <string>                  #service 资源使用的 api 组
     APIVersion defines the versioned schema of this representation of an
     object. Servers should convert recognized schemas to the latest internal
     value, and may reject unrecognized values. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

   kind    <string>                         #创建的资源类型
     Kind is a string value representing the REST resource this object
     represents. Servers may infer this from the endpoint the client submits
     requests to. Cannot be updated. In CamelCase. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds

   metadata    <Object>                      #定义元数据
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec    <Object>
     Spec defines the behavior of a service.
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

   status    <Object>
     Most recently observed status of the service. Populated by the system.
     Read-only. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

查看service 的 spec 字段

KIND:     Service
VERSION:  v1

RESOURCE: spec <Object>

DESCRIPTION:
     Spec defines the behavior of a service.
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status

     ServiceSpec describes the attributes that a user creates on a service.

FIELDS:
   allocateLoadBalancerNodePorts    <boolean>
   clusterIP <string>                    #动态分配的地址,也可以自己在创建的时候指定,创建之后就改不了了
   clusterIPs <[]string>
   externalIPs <[]string>
   externalName<string>
   externalTrafficPolicy <string>
   healthCheckNodePort <integer>
   ipFamilies <[]string>
   ipFamilyPolicy <string>
   loadBalancerIP <string>
   loadBalancerSourceRanges <[]string>
   ports <[]Object>                      #定义 service 端口,用来和后端 pod 建立联系
   publishNotReadyAddresses <boolean>
   selector <map[string]string>          #通过标签选择器选择关联的 pod 有哪些
   sessionAffinity <string>
   sessionAffinityConfig <Object>      #service 在实现负载均衡的时候还支持 sessionAffinity(会话保持)
   # sessionAffinity 默认是 None,即随机调度(基于 iptables/ipvs 规则调度);如果把 sessionAffinity 设置为 ClientIP,就表示把来自同一客户端 IP 的请求调度到同一个 pod 上,配置示例见本字段列表之后
   topologyKeys <[]string>
   type <string>                       #定义 service 的类型
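上面提到的 sessionAffinity 会话保持,配置示意如下(超时时间仅作演示):

```yaml
spec:
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800   # 同一客户端 IP 的请求在 3 小时内都调度到同一个 pod
```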

查看定义 Service.spec.type 需要的字段

kubectl explain service.spec.type
KIND:     Service
VERSION:  v1

FIELD:    type <string>

DESCRIPTION:
     type determines how the Service is exposed. Defaults to ClusterIP. Valid
     options are ExternalName, ClusterIP, NodePort, and LoadBalancer.
     "ClusterIP" allocates a cluster-internal IP address for load-balancing to
     endpoints. Endpoints are determined by the selector or if that is not
     specified, by manual construction of an Endpoints object or EndpointSlice
     objects. If clusterIP is "None", no virtual IP is allocated and the
     endpoints are published as a set of endpoints rather than a virtual IP.
     "NodePort" builds on ClusterIP and allocates a port on every node which
     routes to the same endpoints as the clusterIP. "LoadBalancer" builds on
     NodePort and creates an external load-balancer (if supported in the current
     cloud) which routes to the same endpoints as the clusterIP. "ExternalName"
     aliases this service to the specified externalName. Several other fields do
     not apply to ExternalName services. More info:
     https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types

13.2.2 service类型

ExternalName:
适用于 k8s 集群内部容器访问外部资源,它没有 selector,也没有定义任何的端口和 Endpoint。以下 Service 定义的是将 prod 名称空间中的 my-service 服务映射到 my.database.example.com

kind: Service
apiVersion: v1
metadata:
  name: my-service
  namespace: prod
spec:
  type: ExternalName
  externalName: my.database.example.com

当查询主机 my-service.prod.svc.cluster.local 时,群集 DNS 将返回值为 my.database.example.com 的 CNAME 记录。
ClusterIP:
通过 k8s 集群内部 IP 暴露服务,选择该值,服务只能够在集群内部访问,这也是默认的 ServiceType。
NodePort:
通过每个 Node 节点上的 IP 和静态端口暴露 k8s 集群内部的服务。通过请求 NodeIP:NodePort,可以把请求代理到内部的 pod。请求走向:Client -> NodeIP:NodePort -> ServiceIP:ServicePort -> PodIP:ContainerPort。
LoadBalancer:
使用云提供商的负载均衡器,可以向外部暴露服务。外部的负载均衡器可以路由到 NodePort 服务和 ClusterIP 服务。


13.2.3 service的端口

service的spec.ports字段如何定义

kubectl explain service.spec.ports
KIND:     Service
VERSION:  v1

RESOURCE: ports <[]Object>

DESCRIPTION:
     The list of ports that are exposed by this service. More info:
     https://kubernetes.io/docs/concepts/services-networking/service/#virtual-ips-and-service-proxies

     ServicePort contains information on service's port.

FIELDS:
   appProtocol    <string>
     The application protocol for this port. This field follows standard
     Kubernetes label syntax. Un-prefixed names are reserved for IANA standard
     service names (as per RFC-6335 and
     http://www.iana.org/assignments/service-names). Non-standard protocols
     should use prefixed names such as mycompany.com/my-custom-protocol.

   name    <string>             #定义端口的名字
     The name of this port within the service. This must be a DNS_LABEL. All
     ports within a ServiceSpec must have unique names. When considering the
     endpoints for a Service, this must match the 'name' field in the
     EndpointPort. Optional if only one ServicePort is defined on this service.

   nodePort    <integer>
     The port on each node on which this service is exposed when type is
     NodePort or LoadBalancer. Usually assigned by the system. If a value is
     specified, in-range, and not in use it will be used, otherwise the
     operation will fail. If not specified, a port will be allocated if this
     Service requires one. If this field is specified when creating a Service
     which does not need it, creation will fail. This field will be wiped when
     updating a Service to no longer need it (e.g. changing type from NodePort
     to ClusterIP). More info:
     https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport
     #宿主机上映射的端口,比如一个 Web 应用需要被 k8s 集群之外的其他用户访问,那么需要配置type=NodePort,若配置 nodePort=30001,那么其他机器就可以通过浏览器访问 scheme://k8s 集群中的任何一个节点 ip:30001 即可访问到该服务,例如 http://192.168.1.63:30001。如果在 k8s 中部署 MySQL 数据库,MySQL 可能不需要被外界访问,只需被内部服务访问,那么就不需要设置NodePort

   port    <integer> -required-      #service 的端口,这个是 k8s 集群内部服务可访问的端口
     The port that will be exposed by this service.

   protocol    <string>
     The IP protocol for this port. Supports "TCP", "UDP", and "SCTP". Default
     is TCP.

   targetPort    <string>
     Number or name of the port to access on the pods targeted by the service.
     Number must be in the range 1 to 65535. Name must be an IANA_SVC_NAME. If
     this is a string, it will be looked up as a named port in the target Pod's
     container ports. If this is not specified, the value of the 'port' field is
     used (an identity map). This field is ignored for services with
     clusterIP=None, and should be omitted or set equal to the 'port' field.
     More info:
     https://kubernetes.io/docs/concepts/services-networking/service/#defining-a-service
     # targetPort 是 pod 上的端口,从 port 和 nodePort 上来的流量,经过 kube-proxy 流入到后端pod 的 targetPort 上,最后进入容器。与制作容器时暴露的端口一致(使用 DockerFile 中的EXPOSE),例如官方的 nginx 暴露 80 端口。

13.3 创建 Service 服务 type 类型 ClusterIP

创建deploy,编写资源文件

vim pod_test.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  selector:
    matchLabels:
      run: my-nginx
  replicas: 2
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80    #pod 中的容器需要暴露的端口

更新资源清单并测试

kubectl apply -f pod_test.yaml

kubectl get pod -l run=my-nginx -o wide

curl 10.244.0.65

curl 10.244.92.1

创建service来代理pod

vim service_test.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  labels:
    run: my-nginx
spec:
  type: ClusterIP
  ports:
  - port: 80   #service 的端口,暴露给 k8s 集群内部服务访问
    protocol: TCP
    targetPort: 80     #pod 容器中定义的端口
  selector:
    run: my-nginx      #选择拥有 run=my-nginx 标签的 pod

上述 yaml 文件将创建一个 Service,具有标签 run=my-nginx 的 Pod,目标 TCP 端口 80,并且在一个抽象的 Service 端口(targetPort:容器接收流量的端口;port:抽象的 Service 端口,可以使任何其它 Pod 访问该 Service 的端口)上暴露。
创建service访问测试

kubectl apply -f service_test.yaml

kubectl get svc -l run=my-nginx

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
my-nginx ClusterIP 10.10.196.228 80/TCP 4m3s

curl 10.10.196.228:80

通过上面可以看到请求 service IP:port 跟直接访问 pod ip:port 看到的结果一样,这就说明 service 可以把请求代理到它所关联的后端 pod
注意:上面的 10.10.196.228:80 地址只能在 k8s 集群内部访问,在集群外部无法访问,比方说想通过浏览器访问是访问不通的;如果想在 k8s 集群之外访问,需要把 service 的 type 类型改成 NodePort。
查看service详细信息

 kubectl describe svc my-nginx

Name: my-nginx
Namespace: default
Labels: run=my-nginx
Annotations:
Selector: run=my-nginx
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.10.196.228
IPs: 10.10.196.228
Port: 80/TCP
TargetPort: 80/TCP
Endpoints: 10.244.0.65:80,10.244.92.1:80
Session Affinity: None
Events:

kubectl get ep my-nginx

NAME ENDPOINTS AGE
my-nginx 10.244.0.65:80,10.244.92.1:80 6m45s
service 可以对外提供统一固定的 ip 地址,并将请求重定向至集群中的 pod。其中“将请求重定向至集群中的 pod”就是通过 endpoint 与 selector 协同工作实现。selector 是用于选择 pod,由selector 选择出来的 pod 的 ip 地址和端口号,将会被记录在 endpoint 中。endpoint 便记录了所有pod 的 ip 地址和端口号。当一个请求访问到 service 的 ip 地址时,就会从 endpoint 中选择出一个 ip 地址和端口号,然后将请求重定向至 pod 中。具体把请求代理到哪个 pod,需要的就是 kube-proxy 的轮询实现的。service 不会直接到 pod,service 是直接到 endpoint 资源,就是地址加端口,再由 endpoint 再关联到 pod。
serivce域名解析
service 只要创建完成,我们就可以直接解析它的服务名,每一个服务创建完成后都会在集群 dns 中 动态添加一个资源记录,添加完成后我们就可以解析了,资源记录格式是:SVC_NAME.NS_NAME.DOMAIN.LTD.
服务名.命名空间.域名后缀集群默认的域名后缀是 svc.cluster.local. 就像我们上面创建的 my-nginx 这个服务,它的完整名称解析就是 my-nginx.default.svc.cluster.local

kubectl exec -it my-nginx-5684588fff-bwrkc -- /bin/bash

# 容器中执行
apt-get update

curl my-nginx.default.svc.cluster.local

13.4 创建 Service 服务 type 类型 NodePort

创建deploy资源

vim pod_nodeport.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx-nodeport
spec:
  selector:
    matchLabels:
      run: my-nginx-nodeport
  replicas: 2
  template:
    metadata:
      labels:
        run: my-nginx-nodeport
    spec:
      containers:
      - name: my-nginx-nodeport-container
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80

执行yaml文件并测试

kubectl apply -f pod_nodeport.yaml

kubectl get pods -l run=my-nginx-nodeport

创建service, 代理pod

vim service_nodeport.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-nginx-nodeport
  labels:
    run: my-nginx-nodeport
spec:
  selector:
    run: my-nginx-nodeport
  type: NodePort
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
    nodePort: 30018

更新service资源并测试

 kubectl apply -f service_nodeport.yaml

 kubectl get svc -l run=my-nginx-nodeport

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
my-nginx-nodeport NodePort 10.10.130.52 80:30018/TCP 2m55s run=my-nginx-nodeport

curl 10.10.130.52:80

# 访问集群外测试
curl 192.168.1.63:30018

服务请求走向:
Client -> NodeIP:NodePort -> ServiceIP:ServicePort -> PodIP:ContainerPort
即 Client -> 192.168.1.63:30018 -> 10.10.130.52:80 -> Pod IP:80
端口范围
备注:nodePort 默认的端口号范围是 30000-32767
如需修改范围,需要在 apiserver 的启动参数中增加:--service-node-port-range=30000-39999
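以 kubeadm 部署的集群为例,apiserver 一般以静态 Pod 方式运行,其清单路径通常是 /etc/kubernetes/manifests/kube-apiserver.yaml(路径以实际环境为准),修改方式示意如下:

```shell
vim /etc/kubernetes/manifests/kube-apiserver.yaml
# 在 command 参数列表中增加一行:
#   - --service-node-port-range=30000-39999
# 保存退出后,kubelet 会自动重建 kube-apiserver 这个静态 Pod,使配置生效
```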


13.5 创建 Service-type 类型 ExternalName

应用场景:跨名称空间访问
需求:default 名称空间下的 client 服务想要访问 nginx-ns 名称空间下的 nginx-svc 服务
编写client服务yaml文件

vim client.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: client
spec: 
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - name: busybox
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ["/bin/sh","-c","sleep 36000"]
kubectl apply -f client.yaml

编写client的svc

vim client_svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: client-svc
spec:
  type: ExternalName
  externalName: nginx-svc.nginx-ns.svc.cluster.local
  ports:
  - name: http
    port: 80
    targetPort: 80

该文件中指定了到 nginx-svc 的软链,让使用者感觉就好像调用自己命名空间的服务一样。
#查看 pod 是否正常运行

kubectl get pod

kubectl apply -f client_svc.yaml

编写server服务yaml文件

kubectl create ns nginx-ns

vim server_nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: nginx-ns
spec: 
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        imagePullPolicy: IfNotPresent
kubectl apply -f server_nginx.yaml

kubectl get pod -n nginx-ns

编写server的svc

vim nginx_svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-svc
  namespace: nginx-ns
spec:
  selector:
    app: nginx
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
kubectl apply -f nginx_svc.yaml

测试

# 登录到client的pod
kubectl exec -it client-76b6556d97-xk7mg -- /bin/sh 

# 容器内执行
wget -q -O - client-svc.default.svc.cluster.local

wget -q -O - nginx-svc.nginx-ns.svc.cluster.local

13.6 映射外部服务案例

1. 在其他机器上安装mysql
```shell
yum -y install mariadb-server

systemctl start mariadb
```

2. 编写service:
```shell
vim mysql_service.yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  type: ClusterIP
  ports:
  - port: 3306
```

3. 执行yaml文件并验证
```shell
kubectl apply -f mysql_service.yaml

kubectl get svc | grep mysql

kubectl describe svc mysql

#  发现没有创建endpoint
```

4. 创建endpoint
```shell
vim mysql_endpoint.yaml
apiVersion: v1
kind: Endpoints
metadata:
  name: mysql
subsets:
- addresses:
  - ip: 192.168.1.62
  ports:
  - port: 3306
```

5. 执行yaml文件并验证
```shell
kubectl apply -f mysql_endpoint.yaml

kubectl describe svc mysql
```

上面配置就是将外部 IP 地址和服务引入到 k8s 集群内部,由 service 作为一个代理来达到能够访问外部服务的目的。

13.7 coredns 组件详解

DNS 是什么?
DNS 全称是 Domain Name System:域名系统,是整个互联网的电话簿,它能够把可被人理解的域名翻译成可被机器理解的 IP 地址,使互联网的使用者不再需要直接接触很难阅读和理解的 IP 地址。域名系统在现在的互联网中非常重要,因为服务器的 IP 地址可能会经常变动,如果没有 DNS,那么 IP 地址一旦发生更改,当前服务器的客户端就没有办法连接到目标服务器了;如果我们为 IP 地址提供一个"别名",并在其发生变动时修改别名和 IP 地址的对应关系,就可以保证集群对外提供的服务能够相对稳定地被其他客户端访问。DNS 其实就是一个分布式的树状命名系统,它就像一个去中心化的分布式数据库,存储着从域名到 IP 地址的映射。
CoreDNS 是什么?
CoreDNS 其实就是一个 DNS 服务,而 DNS 作为一种常见的服务发现手段,很多开源项目以及工程师都会使用 CoreDNS 为集群提供服务发现的功能,Kubernetes 就在集群中使用 CoreDNS 解决服务发现的问题。作为一个加入 CNCF(Cloud Native Computing Foundation)的服务,CoreDNS 的实现非常简单。

1. 解压镜像包到机器
```shell
docker load -i dig.tar.gz
```

2. 编写yaml文件
```shell
vim dig.yaml
apiVersion: v1
kind: Pod
metadata:
  name: dig
  namespace: default
spec:
  containers:
  - name: dig
    image: docker.io/dig:latest
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
```

3. 执行yaml文件并验证
```shell
kubectl apply -f dig.yaml

kubectl get svc | grep kubernetes

kubectl exec -it dig -- nslookup kubernetes
```

在 k8s 中创建 service 之后,service 默认的 FQDN 是<service name>.<namespace>.svc.cluster.local,那么 k8s 集群内部的服务就可以通过 FQDN 访问

十四、k8s 持久化存储-常见的存储卷介绍

在 k8s 中为什么要做持久化存储?
在 k8s 中部署的应用都是以 pod 容器的形式运行的,假如我们部署 MySQL、Redis 等数据库,就需要对这些数据库产生的数据做备份。因为 Pod 是有生命周期的,如果 pod 不挂载数据卷,那 pod 被删除或重启后这些数据会随之消失;如果想要长久地保留这些数据,就要用到 pod 数据持久化存储。

14.1 k8s 持久化存储方案-emptyDir

14.1.1 查看 k8s 支持哪些存储

```shell
kubectl explain pods.spec.volumes
KIND:     Pod
VERSION:  v1

RESOURCE: volumes <[]Object>

DESCRIPTION:
     List of volumes that can be mounted by containers belonging to the pod.
     More info: https://kubernetes.io/docs/concepts/storage/volumes

     Volume represents a named volume in a pod that may be accessed by any
     container in the pod.

FIELDS:
   awsElasticBlockStore    <Object>
   azureDisk <Object>
   azureFile <Object>
   cephfs <Object>
   cinder <Object>
   configMap <Object>
   csi <Object>
   downwardAPI <Object>
   emptyDir <Object>
   ephemeral <Object>
   fc <Object>
   flexVolume <Object>
   flocker<Object>
   gcePersistentDisk <Object>
   gitRepo <Object>
   glusterfs <Object>
   hostPath <Object>
   iscsi <Object>
   name <string> -required-
   nfs <Object>
   persistentVolumeClaim <Object>
   photonPersistentDisk <Object>
   portworxVolume <Object>
   projected <Object>
   quobyte <Object>
   rbd <Object>
   scaleIO <Object>
   secret <Object>
   storageos <Object>
   vsphereVolume <Object>

```

常用的如下:
emptyDir
hostPath
nfs
persistentVolumeClaim
glusterfs
cephfs
configMap
secret
我们想要使用存储卷,需要经历如下步骤
1、定义 pod 的 volume,这个 volume 指明它要关联到哪个存储上的
2、在容器中要使用 volume mounts 挂载对应的存储
经过以上两步才能正确的使用存储卷

14.1.2 挂载emptyDir 类型

emptyDir 类型的 Volume 是在 Pod 分配到 Node 上时被创建,Kubernetes 会在 Node 上自动分配一个目录,因此无需指定宿主机 Node 上对应的目录文件。这个目录的初始内容为空,当 Pod 从 Node上移除时,emptyDir 中的数据会被永久删除。emptyDir Volume 主要用于某些应用程序无需永久保存
的临时目录,多个容器的共享目录等。

1. 创建一个 pod,挂载临时目录 emptyDir
```shell
vim emptydir.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-empty
spec:
  containers:
  - name: container-empty
    image: nginx
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
  - emptyDir: {}
    name: cache-volume
```

2. 执行yaml文件查看分配节点
```shell
kubectl apply -f emptydir.yaml

kubectl get pods -o wide | grep empty

kubectl get pods pod-empty -o yaml | grep uid

uid: 63cd54f1-a60e-4710-99db-a4bda397a749

```

3. 查看pod的uid并登录到机器上查看
```shell
# 从机器上
yum -y install tree

tree /var

/var/lib/kubelet/
├── pods
└── 63cd54f1-a60e-4710-99db-a4bda397a749
├── containers
│ └── container-empty
│ └── 70654d0a
├── etc-hosts
├── plugins
│ └── kubernetes.io~empty-dir
│ ├── cache-volume
│ │ └── ready
│ └── wrapped_kube-api-access-vbchr
│ └── ready
└── volumes
├── kubernetes.io~empty-dir
│ └── cache-volume
└── kubernetes.io~projected
└── kube-api-access-vbchr
├── ca.crt -> ..data/ca.crt
├── namespace -> ..data/namespace
└── token -> ..data/token

```

4. 测试创建目录
```shell
# 在从节点上执行
cd /var/lib/kubelet/pods/63cd54f1-a60e-4710-99db-a4bda397a749/volumes/kubernetes.io~empty-dir/cache-volume/

mkdir aa

tree /var/lib/kubelet/pods/63cd54f1-a60e-4710-99db-a4bda397a749
/var/lib/kubelet/pods/63cd54f1-a60e-4710-99db-a4bda397a749
├── containers
│   └── container-empty
│       └── 70654d0a
├── etc-hosts
├── plugins
│   └── kubernetes.io~empty-dir
│       ├── cache-volume
│       │   └── ready
│       └── wrapped_kube-api-access-vbchr
│           └── ready
└── volumes
    ├── kubernetes.io~empty-dir
    │   └── cache-volume
    │       └── aa
    └── kubernetes.io~projected
        └── kube-api-access-vbchr
            ├── ca.crt -> ..data/ca.crt
            ├── namespace -> ..data/namespace
            └── token -> ..data/token
```

14.2 k8s 持久化存储方案-hostPath

hostPath Volume 是指 Pod 挂载宿主机上的目录或文件。
hostPath Volume 使得容器可以使用宿主机的文件系统进行存储。hostPath(宿主机路径)是节点级别的存储卷:pod 被删除后,这个存储卷依然存在,不会被删除,所以只要 pod 重新被调度到同一个节点上,对应的数据就依然存在。

14.2.1 查看 hostPath 存储卷的用法

```shell
kubectl explain pods.spec.volumes.hostPath
KIND:     Pod
VERSION:  v1

RESOURCE: hostPath <Object>

DESCRIPTION:
     HostPath represents a pre-existing file or directory on the host machine
     that is directly exposed to the container. This is generally used for
     system agents or other privileged things that are allowed to see the host
     machine. Most containers will NOT need this. More info:
     https://kubernetes.io/docs/concepts/storage/volumes#hostpath

     Represents a host path mapped into a pod. Host path volumes do not support
     ownership management or SELinux relabeling.

FIELDS:
   path    <string> -required-
     Path of the directory on the host. If the path is a symlink, it will follow
     the link to the real path. More info:
     https://kubernetes.io/docs/concepts/storage/volumes#hostpath

   type    <string>
     Type for HostPath Volume Defaults to "" More info:
     https://kubernetes.io/docs/concepts/storage/volumes#hostpath

```

14.2.2 创建一个 pod,挂载 hostPath 存储卷

1. 创建yaml文件挂载hostPath存储卷
```shell
vim hostpath.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-hostpath
spec:
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: test-nginx
    volumeMounts:
    - mountPath: /test-nginx
      name: test-volume
  - image: tomcat
    imagePullPolicy: IfNotPresent
    name: test-tomcat
    volumeMounts:
    - mountPath: /test-tomcat
      name: test-volume
  volumes:
  - name: test-volume
    hostPath:
      path: /data1
      type: DirectoryOrCreate
```

注意:
# DirectoryOrCreate 表示本地有/data1 目录,就用本地的,本地没有就会在 pod 调度到的节点自动创建一个

2. 更新资源清单
```shell
kubectl apply -f hostpath.yaml

kubectl get pods -o wide | grep hostpath

test-hostpath       2/2     Running   0          65m    10.244.92.1     k8s-slave2     <none>         <none>

```

3. 登录到节点查看
```shell
# 从节点执行
ll /data1/

mkdir /data1/aa
```

4. 到nginx容器中查看验证
```shell
kubectl exec -it test-hostpath -c test-nginx -- /bin/bash

# 容器内执行
ls /test-nginx/
```


5. 到tomcat容器中查看验证
```shell
kubectl exec -it test-hostpath  -c test-tomcat -- /bin/bash

# 容器执行
ls /test-tomcat/

```

通过上面测试可以看到,同一个 pod 里的 test-nginx 和 test-tomcat 这两个容器是共享存储卷的。
hostpath 存储卷缺点:
单节点 ,pod 删除之后重新创建必须调度到同一个 node 节点,数据才不会丢失


14.3 k8s 持久化存储方案-NFS

hostPath 存储,存在单点故障,pod 挂载 hostPath 时,只有调度到同一个节点,数据才不会丢失。那可以使用 nfs 作为持久化存储

1. 搭建nfs

k8s-master1 作为服务端
```shell
yum install nfs-utils -y

mkdir /data/volumes -pv

systemctl enable nfs --now
```

2. 编辑nfs配置文件,让配置生效
```shell
vim /etc/exports
/data/volumes *(rw,no_root_squash)
#no_root_squash: 用户具有根目录的完全管理访问权限

exportfs -arv

service nfs start

systemctl status nfs
```

3. 每个从节点安装nfs
```shell
yum install nfs-utils -y

service nfs start

systemctl enable nfs

mkdir /data2

# 挂载测试
mount -t nfs 192.168.1.63:/data/volumes /data2

# 卸载挂载
umount /data2/
```

4. 编写yaml文件
```shell
vim nfs.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-nfs-volume
spec:
  containers:
  - name: test-nfs
    image: nginx
    imagePullPolicy: IfNotPresent
    ports:
    - containerPort: 80
      protocol: TCP
    volumeMounts:
    - name: nfs-volumes
      mountPath: /usr/share/nginx/html
  volumes:
  - name: nfs-volumes
    nfs:
      path: /data/volumes
      server: 192.168.1.63
```

注:path: /data/volumes 是 nfs 的共享目录;server: 192.168.1.63 是 k8s-master1 机器的 ip,即安装 nfs 服务的地址。

5. 更新资源清单,查看是否成功
```shell
kubectl apply -f nfs.yaml

kubectl get pods -o wide | grep nfs

test-nfs-volume   1/1     Running   0          14m     10.244.0.66   k8s-slave1   <none>           <none>

```

6. 创建文件测试
```shell
echo 'hello, Welcome to xuegod'  > /data/volumes/index.html

curl 10.244.0.66
```

7. 容器内查看
```shell
kubectl exec -it test-nfs-volume -- /bin/bash

# 容器内执行
cat /usr/share/nginx/html/index.html
```

#上面说明挂载 nfs 存储卷成功了,nfs 支持多个客户端挂载,可以创建多个 pod,挂载同一个 nfs 服务器共享出来的目录;但是 nfs 如果宕机了,数据也就丢失了,所以需要使用分布式存储,常见的分布式存储有 glusterfs 和 cephfs

14.4 k8s 持久化存储方案-PVC

参考官网:
https://kubernetes.io/docs/concepts/storage/persistent-volumes#persistentvolumeclaims

14.4.1 K8s PV 概述

1. K8s PV 是什么? 

PersistentVolume(PV)是群集中的一块存储,由管理员配置或使用存储类动态配置。 它是集群中的资源,就像 pod 是 k8s 集群资源一样。 PV 是容量插件,如 Volumes,其生命周期独立于使用 PV 的任何单个 pod。

2. K8s PVC 是什么? 

PersistentVolumeClaim(PVC)是一个持久化存储卷,我们在创建 pod 时可以定义这个类型的存储卷。它类似于一个 pod:Pod 消耗节点资源,PVC 消耗 PV 资源;Pod 可以请求特定级别的资源(CPU 和内存),PVC 在申请 PV 的时候也可以请求特定的大小和访问模式(例如,可以一次读写或多次只读)。

---

<a name="iWmQy"></a>
### 14.4.2 工作原理
K8s PVC 和 PV 工作原理
PV 是群集中的资源。PVC 是对这些资源的请求。
PV 和 PVC 之间的相互作用遵循以下生命周期:

1. pv 的供应方式 

可以通过两种方式配置 PV:静态或动态。
静态的:集群管理员创建了许多 PV。它们包含可供群集用户使用的实际存储的详细信息。它们存在于 Kubernetes API 中,可供使用。
动态的:当管理员创建的静态 PV 都不匹配用户的 PersistentVolumeClaim 时,群集可能会尝试为 PVC 专门动态配置卷。此配置基于 StorageClasses,PVC 必须请求存储类,管理员必须创建并配置该类,以便进行动态配置。

2. 绑定 

用户创建 pvc 并指定需要的资源和访问模式。在找到可用 pv 之前,pvc 会保持未绑定状态

3. 使用 

(1) 需要找一个存储服务器,把它划分成多个存储空间;
(2) k8s 管理员可以把这些存储空间定义成多个 pv;
(3) 在 pod 中使用 pvc 类型的存储卷之前需要先创建 pvc,通过定义需要使用的 pv 的大小和对应的访问模式,找到合适的 pv;
(4) pvc 被创建之后,就可以当成存储卷来使用了,我们在定义 pod 时就可以使用这个 pvc 的存储卷;
(5) pvc 和 pv 它们是一一对应的关系,pv 如果被 pvc 绑定了,就不能被其他 pvc 使用了;
(6) 我们在创建 pvc 的时候,应该确保和底下的 pv 能绑定,如果没有合适的 pv,那么 pvc 就会处于 pending 状态。

4. 回收策略 

当我们创建 pod 时如果使用 pvc 做为存储卷,那么它会和 pv 绑定,当删除 pod,pvc 和 pv 的绑定就会解除,解除之后和 pvc 绑定的 pv 卷里的数据需要怎么处理。目前,卷可以保留,回收或删除:
Retain:当删除 pvc 的时候,pv 仍然存在,处于 released 状态,但是它不能被其他 pvc 绑定使用,里面的数据还是存在的,当我们下次再使用的时候,数据还是存在的,这个是默认的回收策略;
Recycle:不推荐使用,1.15 可能被废弃了;
Delete:删除 pvc 时既会从 Kubernetes 中移除 PV,也会从相关的外部设施中删除存储资产。
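回收策略通过 PV 的 spec.persistentVolumeReclaimPolicy 字段设置,下面是一个带该字段的 PV 示意片段(名称、后端存储均为假设值,仅供参考):
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-demo                             # 假设的 pv 名称
spec:
  capacity:
    storage: 1Gi
  accessModes: ["ReadWriteMany"]
  persistentVolumeReclaimPolicy: Retain     # 回收策略:手动创建的 pv 默认为 Retain,也可设为 Delete
  nfs:
    path: /data/volumes                     # 假设的 nfs 共享目录
    server: 192.168.1.63
```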

---

<a name="nm0KP"></a>
### 14.4.3 pv字段解读

1. 查看定义 pv 需要的字段
```shell
kubectl explain pv
KIND:     PersistentVolume
VERSION:  v1

DESCRIPTION:
     PersistentVolume (PV) is a storage resource provisioned by an
     administrator. It is analogous to a node. More info:
     https://kubernetes.io/docs/concepts/storage/persistent-volumes

FIELDS:
   apiVersion    <string>
     APIVersion defines the versioned schema of this representation of an
     object. Servers should convert recognized schemas to the latest internal
     value, and may reject unrecognized values. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

   kind    <string>
     Kind is a string value representing the REST resource this object
     represents. Servers may infer this from the endpoint the client submits
     requests to. Cannot be updated. In CamelCase. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds

   metadata    <Object>
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec    <Object>
     Spec defines a specification of a persistent volume owned by the cluster.
     Provisioned by an administrator. More info:
     https://kubernetes.io/docs/concepts/storage/persistent-volumes#persistent-volumes

   status    <Object>
     Status represents the current information/status for the persistent volume.
     Populated by the system. Read-only. More info:
     https://kubernetes.io/docs/concepts/storage/persistent-volumes#persistent-volumes
```

2. 查看定义 nfs 类型的 pv 需要的字段
```shell
kubectl explain pv.spec.nfs
KIND:     PersistentVolume
VERSION:  v1

RESOURCE: nfs

DESCRIPTION:
     NFS represents an NFS mount on the host. Provisioned by an admin. More
     info: https://kubernetes.io/docs/concepts/storage/volumes#nfs

     Represents an NFS mount that lasts the lifetime of a pod. NFS volumes do
     not support ownership management or SELinux relabeling.

FIELDS:
   path   -required-
     Path that is exported by the NFS server. More info:
     https://kubernetes.io/docs/concepts/storage/volumes#nfs

   readOnly
     ReadOnly here will force the NFS export to be mounted with read-only
     permissions. Defaults to false. More info:
     https://kubernetes.io/docs/concepts/storage/volumes#nfs

   server   -required-
     Server is the hostname or IP address of the NFS server. More info:
     https://kubernetes.io/docs/concepts/storage/volumes#nfs
```


---

<a name="VWegc"></a>
### 14.4.4 创建pv和pvc

1. 创建nfs共享目录
```shell
mkdir /data/volume_test/v{1,2,3,4,5,6,7,8,9,10} -p

vim /etc/exports

/data/volume_test/v1 *(rw,no_root_squash)
/data/volume_test/v2 *(rw,no_root_squash)
/data/volume_test/v3 *(rw,no_root_squash)
/data/volume_test/v4 *(rw,no_root_squash)
/data/volume_test/v5 *(rw,no_root_squash)
/data/volume_test/v6 *(rw,no_root_squash)
/data/volume_test/v7 *(rw,no_root_squash)
/data/volume_test/v8 *(rw,no_root_squash)
/data/volume_test/v9 *(rw,no_root_squash)
/data/volume_test/v10 *(rw,no_root_squash)

exportfs -arv

systemctl restart nfs
```
2. 创建pv
```shell
vim pv.yaml
```
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: v1
spec:
  capacity:
    storage: 1Gi                      # pv 的存储空间
  accessModes: ["ReadWriteOnce"]
  nfs:
    path: /data/volume_test/v1        # 把 nfs 的存储空间创建成 pv
    server: 192.168.1.63              # nfs 服务器的地址
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: v2
spec:
  capacity:
    storage: 2Gi
  accessModes: ["ReadWriteMany"]
  nfs:
    path: /data/volume_test/v2
    server: 192.168.1.63
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: v3
spec:
  capacity:
    storage: 3Gi
  accessModes: ["ReadOnlyMany"]
  nfs:
    path: /data/volume_test/v3
    server: 192.168.1.63
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: v4
spec:
  capacity:
    storage: 4Gi
  accessModes: ["ReadWriteOnce","ReadWriteMany"]
  nfs:
    path: /data/volume_test/v4
    server: 192.168.1.63
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: v5
spec:
  capacity:
    storage: 5Gi
  accessModes: ["ReadWriteOnce","ReadWriteMany"]
  nfs:
    path: /data/volume_test/v5
    server: 192.168.1.63
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: v6
spec:
  capacity:
    storage: 6Gi
  accessModes: ["ReadWriteOnce","ReadWriteMany"]
  nfs:
    path: /data/volume_test/v6
    server: 192.168.1.63
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: v7
spec:
  capacity:
    storage: 7Gi
  accessModes: ["ReadWriteOnce","ReadWriteMany"]
  nfs:
    path: /data/volume_test/v7
    server: 192.168.1.63
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: v8
spec:
  capacity:
    storage: 8Gi
  accessModes: ["ReadWriteOnce","ReadWriteMany"]
  nfs:
    path: /data/volume_test/v8
    server: 192.168.1.63
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: v9
spec:
  capacity:
    storage: 9Gi
  accessModes: ["ReadWriteOnce","ReadWriteMany"]
  nfs:
    path: /data/volume_test/v9
    server: 192.168.1.63
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: v10
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce","ReadWriteMany"]
  nfs:
    path: /data/volume_test/v10
    server: 192.168.1.63
```
```shell
kubectl apply -f pv.yaml

kubectl get pv
```
如果 STATUS 是 Available,表示 pv 是可用的

3. 创建 pvc,和符合条件的 pv 绑定
```shell
vim pvc.yaml
```
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 2Gi
```
```shell
kubectl apply -f pvc.yaml

kubectl get pv

kubectl get pvc
```


4. 创建pod,挂载pvc
```shell
vim pod_pvc.yaml
```
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-pvc
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: nginx-html
      mountPath: /usr/share/nginx/html
  volumes:
  - name: nginx-html
    persistentVolumeClaim:
      claimName: my-pvc
```
```shell
kubectl apply -f pod_pvc.yaml

kubectl get pod -o wide | grep pod-pvc

pod-pvc   1/1   Running   0   20m   10.244.92.2   k8s-slave2
```

5. 测试是否挂载正常
```shell
echo "my-pvc" >> /data/volume_test/v2/index.html

curl 10.244.92.2
```

6. 删除pod重新挂载
```shell
kubectl delete -f pod_pvc.yaml

kubectl apply -f pod_pvc.yaml

curl 10.244.92.2
```
7. 使用 pvc 和 pv 的注意事项

(1) 我们每次创建 pvc 的时候,需要事先有划分好的 pv,这样可能不方便,那么可以借助存储类(StorageClass)在创建 pvc 的时候动态创建出 pv,pv 事先是不存在的;
(2) pvc 和 pv 绑定,如果使用默认的回收策略 retain,那么删除 pvc 之后,pv 会处于 released 状态,我们想要继续使用这个 pv,需要手动删除 pv,kubectl delete pv pv_name。删除 pv 不会删除 pv 里的数据,当我们重新创建 pvc 时还会和这个最匹配的 pv 绑定,数据还是原来数据,不会丢失。


14.5 k8s 存储类:storageclass

14.5.1 storageclass概述

上面介绍的 PV 和 PVC 模式都是需要先创建好 PV,然后定义好 PVC 和 pv 进行一对一的 Bond,但是如果 PVC 请求成千上万,那么就需要创建成千上万的 PV,对于运维人员来说维护成本很高,Kubernetes
提供一种自动创建 PV 的机制,叫 StorageClass,它的作用就是创建 PV 的模板。k8s 集群管理员通过创建 storageclass 可以动态生成一个存储卷 pv 供 k8s pvc 使用。
每个 StorageClass 都包含字段 provisioner,parameters 和 reclaimPolicy。
具体来说,StorageClass 会定义以下两部分:
1、PV 的属性 ,比如存储的大小、类型等;
2、创建这种 PV 需要使用到的存储插件,比如 Ceph、NFS 等
有了这两部分信息,Kubernetes 就能够根据用户提交的 PVC,找到对应的 StorageClass,然后Kubernetes 就会调用 StorageClass 声明的存储插件,创建出需要的 PV。
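下面给出一个包含 provisioner、parameters、reclaimPolicy 三个字段的 StorageClass 示意清单(provisioner 与参数均为假设值,仅供参考,实际取值取决于所用的存储插件):
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-nfs              # 假设的存储类名称
provisioner: example.com/nfs     # 创建 PV 时要调用的存储插件(供应商)
parameters:                      # 传给 provisioner 的参数,不同插件不同(此处为假设值)
  archiveOnDelete: "false"
reclaimPolicy: Delete            # 由该存储类动态创建出来的 PV 的回收策略
```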


14.5.2 字段解读

1. 查看定义 storageclass 需要的字段
```shell
kubectl explain storageclass
KIND:     StorageClass
VERSION:  storage.k8s.io/v1

DESCRIPTION:
     StorageClass describes the parameters for a class of storage for which
     PersistentVolumes can be dynamically provisioned.

     StorageClasses are non-namespaced; the name of the storage class according
     to etcd is in ObjectMeta.Name.

FIELDS:
   allowVolumeExpansion
   allowedTopologies     <[]Object>
   apiVersion
   kind
   metadata
   mountOptions          <[]string>
   parameters
   provisioner           -required-
   reclaimPolicy
   volumeBindingMode
```


---

<a name="m0ydm"></a>
### 14.5.3 provisioner供应商
storageclass 需要有一个供应者,用来确定我们使用什么样的存储来创建 pv,常见的 provisioner 见官方文档:
https://kubernetes.io/zh/docs/concepts/storage/storage-classes/
(官方支持的 provisioner 列表为截图,略)
provisioner 既可以由内部供应商提供,也可以由外部供应商提供,如果是外部供应商可以参考 https://github.com/kubernetes-incubator/external-storage/ 下提供的方法创建。
以 NFS 为例,要想使用 NFS,我们需要一个 nfs-client 的自动装载程序,称之为 provisioner,这个程序会使用我们已经配置好的 NFS 服务器自动创建持久卷,也就是自动帮我们创建 PV。
reclaimPolicy:回收策略
allowVolumeExpansion:允许卷扩展,PersistentVolume 可以配置成可扩展。将此功能设置为 true 时,允许用户通过编辑相应的 PVC 对象来调整卷大小。当基础存储类的 allowVolumeExpansion 字段设置为 true 时,部分类型的卷支持卷扩展(支持的卷类型列表为截图,略)。

---

<a name="Z3yAd"></a>
### 14.5.4 安装 nfs provisioner

1. 解压nfs供应商镜像到子节点

(课件附件:nfs-subdir-external-provisioner.tar.zip)
```shell
tar xf nfs-subdir-external-provisioner.tar

docker load -i nfs-subdir-external-provisioner.tar.gz
```
2. 创建供应商sa
```shell
vim serviceaccount.yaml
```
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-provisioner
```
```shell
kubectl apply -f serviceaccount.yaml

kubectl create clusterrolebinding nfs-provisioner --clusterrole=cluster-admin --serviceaccount=default:nfs-provisioner
```


3. 创建nfs共享目录
```shell
mkdir /data/nfs_pro -p

vim /etc/exports

/data/nfs_pro *(rw,no_root_squash)

exportfs -arv

systemctl restart nfs
```

4. 编写nfs供应商yaml文件,并执行
```shell
vim nfs-deployment.yaml
```
```yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nfs-provisioner
spec:
  selector:
    matchLabels:
      app: nfs-provisioner
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-provisioner
    spec:
      serviceAccount: nfs-provisioner
      containers:
      - name: nfs-provisioner
        image: registry.cn-beijing.aliyuncs.com/mydlq/nfs-subdir-external-provisioner:v4.0.0
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: nfs-client-root
          mountPath: /persistentvolumes
        env:
        - name: PROVISIONER_NAME
          value: example.com/nfs
        - name: NFS_SERVER
          value: 192.168.1.63
        - name: NFS_PATH
          value: /data/nfs_pro
      volumes:
      - name: nfs-client-root
        nfs:
          server: 192.168.1.63
          path: /data/nfs_pro
```
```shell
kubectl apply -f nfs-deployment.yaml

kubectl get pods | grep nfs
```


---

<a name="M07u6"></a>
### 14.5.5 创建storageclass,动态供给 pv

1. 创建storageclass
```shell
vim nfs-storageclass.yaml
```
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: nfs
provisioner: example.com/nfs
```

注意:provisioner 处写的 example.com/nfs 应该跟安装 nfs provisioner 时候的 env 下的 PROVISIONER_NAME 的 value 值保持一致

```shell
kubectl apply -f nfs-storageclass.yaml

kubectl get storageclass

NAME   PROVISIONER       RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
nfs    example.com/nfs   Delete          Immediate           false                  7s
```
显示内容如上,说明 storageclass 创建成功了


14.5.6 创建 pvc,通过 storageclass 动态生成 pv

vim claim.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-claim1
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: nfs
kubectl apply -f claim.yaml

kubectl get pvc

NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
test-claim1 Bound pvc-07ed4e9b-949f-422c-bfba-73067625be86 1Gi RWX nfs 5m2s
#通过上面可以看到 test-claim1 的 pvc 已经成功创建了,绑定的 pv 是 pvc-07ed4e9b-949f-422c-bfba-73067625be86,这个 pv 是由 storageclass 调用 nfs provisioner 自动生成的。
步骤总结:
(1) 供应商:创建一个 nfs provisioner
(2) 创建 storageclass,storageclass 指定刚才创建的供应商
(3) 创建 pvc,这个 pvc 指定 storageclass


14.5.7 创建 pod,挂载 storageclass 动态生成的 pvc

1. 创建yaml文件
```shell
vim read-pod.yaml
```
```yaml
kind: Pod
apiVersion: v1
metadata:
  name: read-pod
spec:
  containers:
  - name: read-pod
    image: nginx
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: nfs-pvc
      mountPath: /usr/share/nginx/html
  restartPolicy: "Never"
  volumes:
  - name: nfs-pvc
    persistentVolumeClaim:
      claimName: test-claim1
```

2. 应用yaml文件并测试
```shell
kubectl apply -f read-pod.yaml

kubectl get pods | grep read

kubectl describe pod read-pod
```


---

<a name="chAee"></a>
# 十五、kubernetes 控制器-Statefulset 详解-管理有状态的应用
<a name="a2B8Q"></a>
## 15.1 Statefulset 控制器:概念、原理解读
StatefulSet 是为了管理有状态服务的问题而设计的

---

<a name="TDWyj"></a>
### 15.1.1 有状态服务和无状态服务

1. 有状态服务

StatefulSet 是有状态的集合,管理有状态的服务,它所管理的 Pod 的名称不能随意变化。数据持久化的目录也是不一样,每一个 Pod 都有自己独有的数据持久化存储目录。比如 MySQL 主从、redis 集群等。

2. 无状态服务

RS、Deployment、DaemonSet 都是管理无状态的服务,它们所管理的 Pod 的 IP、名字,启停顺序等都是随机的。个体对整体无影响,所有 pod 都是共用一个数据卷的,部署的 tomcat 就是无状态的服务,tomcat 被删除,再启动一个新的 tomcat,加入到集群即可,跟 tomcat 的名字无关。

---

<a name="q1oXA"></a>
### 15.1.2 StatefulSet组成

1. Headless Service(没有 ip 的 service):用来定义 pod 的网络标识,生成可解析的 DNS 记录(service ip 存在哪?在 iptables 或者 ipvs 规则中)
2. volumeClaimTemplates:存储卷申请模板,创建 pvc,指定 pvc 名称大小,自动创建 pvc,且 pvc 由存储类供应。 
3. StatefulSet:管理 pod 的

什么是 Headless service?
Headless service 不分配 clusterIP,headless service 可以通过解析 service 的 DNS,返回所有 Pod 的 dns 和 ip 地址(statefulSet 部署的 Pod 才有 DNS),普通的 service,只能通过解析 service 的 DNS 返回 service 的 ClusterIP。
为什么要用 headless service(没有 service ip 的 service)?
在使用 Deployment 时,创建的 Pod 名称是没有顺序的,是随机字符串,在用 statefulset 管理 pod 时要求 pod 名称必须是有序的,每一个 pod 不能被随意取代,pod 重建后 pod 名称还是一样的。因为 pod IP 是变化的,所以要用 Pod 名称来识别。pod 名称是 pod 唯一性的标识符,必须持久稳定有效。这时候要用到无头服务,它可以给每个 Pod 一个唯一的名称(DNS 记录格式见下面的示意)。
为什么要用 volumeClaimTemplate?
对于有状态应用都会用到持久化存储,比如 mysql 主从,由于主从数据库的数据是不能存放在一个目录下的,每个 mysql 节点都需要有自己独立的存储空间。而在 deployment 中创建的存储卷是一个共享的存储卷,多个 pod 使用同一个存储卷,它们数据是同步的,而 statefulset 定义中的每一个 pod 都不能使用同一个存储卷,这就需要使用 volumeClaimTemplate,当在使用 statefulset 创建 pod 时,volumeClaimTemplate 会自动生成一个 PVC,从而请求绑定一个 PV,每一个 pod 都有自己专用的存储卷。Pod、PVC 和 PV 对应的关系图(图略)。
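补充一个 DNS 记录格式的示意(以本文 15.3 案例中的名字为例,仅供参考):
```shell
# statefulset 的 pod 经 headless service 获得的 DNS 记录格式:
# <pod 名称>.<headless service 名称>.<名称空间>.svc.cluster.local
# 例如 statefulset 名为 web、headless service 名为 nginx、名称空间为 default 时:
nslookup web-0.nginx.default.svc.cluster.local
```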

---

<a name="lSLqu"></a>
## 15.2 Statefulset 资源清单文件编写技巧

1. 查看定义statefulset需要的字段
```shell
kubectl explain statefulset
KIND:     StatefulSet
VERSION:  apps/v1

DESCRIPTION:
     StatefulSet represents a set of pods with consistent identities. Identities
     are defined as:
     - Network: A single stable DNS and hostname.
     - Storage: As many VolumeClaims as requested. The StatefulSet guarantees
     that a given network identity will always map to the same storage identity.

FIELDS:
   apiVersion    <string>                 #定义 statefulset 资源需要使用的 api 版本
     APIVersion defines the versioned schema of this representation of an
     object. Servers should convert recognized schemas to the latest internal
     value, and may reject unrecognized values. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

   kind    <string>                       #定义的资源类型
     Kind is a string value representing the REST resource this object
     represents. Servers may infer this from the endpoint the client submits
     requests to. Cannot be updated. In CamelCase. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds

   metadata    <Object>                    #元数据
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec    <Object>                        #定义容器相关的信息
     Spec defines the desired identities of pods in this set.

   status    <Object>
     Status is the current status of Pods in this StatefulSet. This data may be
     out of date by some window of time.
```

2. 查看 statefulset.spec 字段
```shell
kubectl explain statefulset.spec
KIND:     StatefulSet
VERSION:  apps/v1

RESOURCE: spec

DESCRIPTION:
     Spec defines the desired identities of pods in this set.

     A StatefulSetSpec is the specification of a StatefulSet.

FIELDS:
   podManagementPolicy              # pod 管理策略
   replicas                         # 副本数
   revisionHistoryLimit             # 保留的历史版本
   selector -required-              # 标签选择器,选择它所关联的 pod
   serviceName -required-           # headless service 的名字
   template -required-              # 生成 pod 的模板
   updateStrategy                   # 更新策略
   volumeClaimTemplates <[]Object>  # 存储卷申请模板
```


3. 查看 statefulset 的 spec.template 字段

#对于 template 而言,其内部定义的就是 pod,pod 模板是一个独立的对象
```shell
kubectl explain statefulset.spec.template
KIND:     StatefulSet
VERSION:  apps/v1

RESOURCE: template <Object>

DESCRIPTION:
     template is the object that describes the pod that will be created if
     insufficient replicas are detected. Each pod stamped out by the StatefulSet
     will fulfill this Template, but have a unique identity from the rest of the
     StatefulSet.

     PodTemplateSpec describes the data a pod should have when created from a
     template

FIELDS:
   metadata    <Object>
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata

   spec    <Object>
     Specification of the desired behavior of the pod. More info:
     https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status
```

通过上面可以看到,statefulset 资源中有两个 spec 字段。第一个 spec 声明的是 statefulset 定义多少个 Pod 副本(默认将仅部署 1 个 Pod)、匹配 Pod 标签的选择器、创建 pod 的模板、存储卷申请模板,第二个 spec 是 spec.template.spec:主要用于 Pod 里的容器属性等配置。
.spec.template 里的内容是声明 Pod 对象时要定义的各种属性,所以这部分也叫做 PodTemplate(Pod 模板)。还有一个值得注意的地方是:在.spec.selector 中定义的标签选择器必须能够匹配到 spec.template.metadata.labels 里定义的 Pod 标签,否则 Kubernetes 将不允许创建 statefulset。
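下面用一个最小片段示意 .spec.selector 与 .spec.template.metadata.labels 的匹配关系(仅为示意片段,非完整清单):
```yaml
spec:
  selector:
    matchLabels:
      app: nginx        # 这里的标签选择器……
  template:
    metadata:
      labels:
        app: nginx      # ……必须能匹配到这里定义的 Pod 标签,否则 statefulset 无法创建
```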


15.3 Statefulset 使用案例-部署 web 站点

  1. 供应商pod创建,参考14.5.4
2. 创建存储类
```shell
vim class-web.yaml
```
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-web
provisioner: example.com/nfs
```
```shell
kubectl apply -f class-web.yaml

kubectl get storageclass
```

3. 编写 statefulset 资源清单并执行
```shell
vim statefulset.yaml
```
```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx"
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: ["ReadWriteMany"]
      storageClassName: "nfs-web"
      resources:
        requests:
          storage: 1Gi
```
```shell
kubectl apply -f statefulset.yaml
```
    
4. 验证是否成功
```shell
kubectl get svc -l app=nginx

kubectl get pvc

kubectl get pv

# 查看pod主机名
for i in 0 1; do kubectl exec web-$i -- sh -c 'hostname'; done
```


5. 运行容器查看域名解析
```shell
kubectl exec -it web-1 -- /bin/bash

# 容器内执行
apt-get update

apt-get install dnsutils -y

nslookup web-0.nginx.default.svc.cluster.local

dig -t A nginx.default.svc.cluster.local @10.10.0.10
```

格式说明:
@ 用来指定域名服务器
-t 指定要解析的类型,A 为解析类型
A 记录:A 记录是解析域名到 IP

6. 资源清单解读
```yaml
apiVersion: v1              # 定义 api 版本
kind: Service               # 定义要创建的资源:service
metadata:
  name: nginx               # 定义 service 的名字
  labels:
    app: nginx              # service 的标签
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None           # 创建一个没有 ip 的 service
  selector:
    app: nginx              # 选择拥有 app=nginx 标签的 pod
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx
  serviceName: nginx        # headless service 的名字
  replicas: 2               # 副本数
  template:                 # 定义 pod 的模板
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:     # 存储卷申请模板
  - metadata:
      name: www
    spec:
      accessModes: ["ReadWriteMany"]
      storageClassName: nfs-web    # 指定从哪个存储类申请 pv
      resources:
        requests:
          storage: 1Gi             # 需要 1G 的 pvc,会自动跟符合条件的 pv 绑定
```

15.4 Statefulset 管理 pod-扩容、缩容、更新

1. Statefulset 实现 pod 的动态扩容

如果我们觉得两个副本太少了,想要增加,只需要修改配置文件 statefulset.yaml 里的 replicas 的值即可,原来 replicas: 2,现在变成 replicas: 3,修改之后,执行如下命令更新:
```shell
kubectl apply -f statefulset.yaml

kubectl get sts

kubectl get pods -l app=nginx

NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          17m
web-1   1/1     Running   0          16m
web-2   1/1     Running   0          47s
```

2. Statefulset 实现 pod 的动态缩容
```shell
kubectl edit sts web

kubectl get sts

kubectl get pod -l app=nginx

NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          19m
web-1   1/1     Running   0          19m
```

3. Statefulset 实现 pod 的更新
```shell
# 开一个终端用于观察
kubectl get pods -o wide -l app=nginx -w

# 修改镜像
kubectl edit sts web
```
观察变化即可看到 pod 按序号依次被重建更新。
关于更新字段:updateStrategy 下的 rollingUpdate.partition,partition=1 表示创建的 pod 序号 >= 1 的才会被更新,配置示意见下面的片段。
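下面是一个在 statefulset 中配置 partition 的示意片段(仅为示意,写在 statefulset 的 spec 下):
```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 1      # 只有序号 >= 1 的 pod(如 web-1、web-2)会被更新,web-0 保持旧版本
```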


十六、Apiserver 安全机制之 RBAC 授权-Configmap 配置管理中心

16.1 K8s 安全管理:认证、授权、准入控制概述

k8s 对我们整个系统的认证,授权,访问控制做了精密的设置;对于 k8s 集群来说,apiserver 是整个集群访问控制的唯一入口,我们在 k8s 集群之上部署应用程序的时候,也可以通过宿主机的NodePort 暴露的端口访问里面的程序,用户访问 kubernetes 集群需要经历如下认证过程:认证->授权->准入控制(adminationcontroller)

  1. 认证(Authenticating)是对客户端的认证,通俗点就是用户名密码验证
  2. 授权(Authorization)是对资源的授权,k8s 中的资源无非是容器,最终其实就是容器的计算,网络,存储资源,当一个请求经过认证后,需要访问某一个资源(比如创建一个 pod),授权检查会根据授权规则判定该资源(比如某 namespace 下的 pod)是否是该客户可访问的。
  3. 准入(Admission Control)机制: 当请求通过了前面的认证和授权之后,还需要经过准入控制处理通过之后,apiserver 才会处理这 个请求。Admission Control 有一个准入控制列表,我们可以通过命令行设置选择执行哪几个准入控制器。只有所有的准入控制器都检查通过之后,apiserver 才执行该请求,否则返回拒绝。

准入控制器是在 API Server 的启动参数配置的。我们在部署 Kubernetes 集群的时候都会默认开启一系列准入控制器,如果没有设置这些准入控制器的话可以说你的 Kubernetes 集群就是在裸奔,只有集群管理员可以修改集群的准入控制器。例如我会默认开启如下的准入控制器。
--admission-control=ServiceAccount,NamespaceLifecycle,NamespaceExists,LimitRanger,ResourceQuota,MutatingAdmissionWebhook,ValidatingAdmissionWebhook
k8s 的整体架构也是一个微服务的架构,所有的请求都是通过一个 GateWay,也就是 kube-apiserver 这个组件(对外提供 REST 服务),k8s 中客户端有两类,一种是普通用户,一种是集群内的 Pod,这两种客户端的认证机制略有不同,但无论是哪一种,都需要依次经过认证,授权,准入这三个机制。


16.1.1 认证插件

  1. 令牌(token)认证:

token 即标志、记号的意思,在 IT 领域也叫作令牌。在计算机身份认证中是令牌(临时)的意思,在词法分析中是标记的意思。一般作为邀请、登录系统使用。

token 其实说的更通俗点可以叫暗号,在一些数据传输之前,要先进行暗号的核对,不同的暗号被授权不同的数据操作。例如在 USB1.1 协议中定义了 4 类数据包:token 包、data 包、handshake包和 special 包。主机和 USB 设备之间连续数据的交换可以分为三个阶段,第一个阶段由主机发送token 包,不同的 token 包内容不一样(暗号不一样)可以告诉设备做不同的工作,第二个阶段发送 data 包,第三个阶段由设备返回一个 handshake 包。

双方有一个共享密钥,服务器上先创建一个密码下来,客户端登陆的时候拿这个密码登陆即可,这个就是对称密钥认证方式;k8s 提供了一个 restful 风格的接口,它的所有服务都是通过 http 协议提供的,因此认证信息只能经由 http 协议的认证首部进行传递,这种认证首部进行传递通常叫做令牌;

  2. ssl 认证:

对于 k8s 访问来讲,ssl 认证能让客户端确认服务器的认证身份,我们在跟服务器通信的时候,需要服务器发过来一个证书,我们需要确认这个证书是不是 ca 签署的,如果是我们认可的 ca 签署的,里面的 subj 信息与我们访问的目标主机信息保持一致,没有问题,那么我们就认为服务器的身份得到认证了,k8s 中最重要的是服务器还需要认证客户端的信息,kubectl 也应该有一个证书,这个证书也是 server 所认可的 ca 签署的证书,双方需要互相认证,实现加密通信,这就是 ssl 认证。


16.1.2 账号认证

  1. kubernetes 上的账号

客户端对 apiserver 发起请求,apiserver 要识别这个用户是否有请求的权限,要识别用户本身能否通过 apiserver 执行相应的操作,那么需要哪些信息才能识别用户信息来完成对用户的相关的访问控制呢?

kubectl explain pods.spec 可以看到有一个字段 serviceAccountName(服务账号名称),这个就是我们 pod 连接 apiserver 时使用的账号,因此整个 kubernetes 集群中的账号有两类,ServiceAccount(服务账号),User account(用户账号)

User account:实实在在现实中的人,人可以登陆的账号,客户端想要对 apiserver 发起请求,apiserver 要识别这个客户端是否有请求的权限,那么不同的用户就会有不同的权限,靠用户账号表示,叫做 username

ServiceAccount:方便 Pod 里面的进程调用 Kubernetes API 或其他外部服务而设计的,是 kubernetes 中的一种资源;sa 账号:登陆 dashboard 使用的账号

ServiceAccount
Service account 是为了方便 Pod 里面的进程调用 Kubernetes API 或其他外部服务而设计的。它与 User account 不同,User account 是为人设计的,而 service account 则是为 Pod 中的进程调用 Kubernetes API 而设计;User account 是跨 namespace 的,而 service account 则是仅局限它所在的 namespace;每个 namespace 都会自动创建一个 default service account;开启 ServiceAccount Admission Controller 后
1) 每个 Pod 在创建后都会自动设置 spec.serviceAccount 为 default(除非指定了其他ServiceAccout)
2) 验证 Pod 引用的 service account 已经存在,否则拒绝创建;
当创建 pod 的时候,如果没有指定一个 serviceaccount,系统会自动在与该 pod 相同的namespace 下为其指派一个 default service account。这是 pod 和 apiserver 之间进行通信的账号

kubectl get pods web-0 -o yaml | grep "serviceAccountName"

kubectl get sa

kubectl get secret

默认的 service account 仅仅只能获取当前 Pod 自身的相关属性,无法观察到其他名称空间 Pod的相关属性信息。如果想要扩展 Pod,假设有一个 Pod 需要用于管理其他 Pod 或者是其他资源对象,是无法通过自身的名称空间的 serviceaccount 进行获取其他 Pod 的相关属性信息的,此时就需要进行手动创建一个 serviceaccount,并在创建 Pod 时进行定义。serviceaccount 该如何进行定义呢?实际上,service accout 也属于一个 k8s 资源,serviceAccount 也属于标准的k8s 资源,可以创建serviceAccount,创建之后由我们创建的 pod 使用 serviceAccountName 去加载自己定义的 serviceAccount 就可以了
(1)创建一个 serviceaccount

kubectl create sa test

kubectl get sa

kubectl describe sa test

上面可以看到生成了一个 test-token-x8prv 的 secret 和对应的 token

kubectl get secret test-token-x8prv

kubectl describe secret test-token-x8prv

上面可以看到 test-token-x8prv 的 token 详细信息,这个 token 就是 sa 连接 apiserver 的认证信息,这个 token 也是登陆 k8s dashboard 的 token。这只是一个认证信息,能够登陆 k8s,能认证到 k8s,但是不能做别的事情,不代表权限,想要做其他事情,需要授权


16.1.3 kubeconfig 文件

在 K8S 集群当中,每一个用户对资源的访问都是需要通过 apiserver 进行通信认证才能进行访问的,那么在此机制当中,对资源的访问可以是 token,也可以是通过配置文件的方式进行保存和使用认证信息,可以通过 kubectl config 进行查看配置

kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://192.168.1.199:16443           # apiserver的地址
  name: kubernetes                                # 集群的名字
contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes                # 上下文的名字
current-context: kubernetes-admin@kubernetes           # 当前上下文的名字
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED

在上面的配置文件当中,定义了集群、上下文以及用户。其中 Config 也是 K8S 的标准资源之一,在该配置文件当中定义了一个集群列表,指定的集群可以有多个;用户列表也可以有多个,指明集群中的用户;而在上下文列表当中,是进行定义可以使用哪个用户对哪个集群进行访问,以及当前使用的上下文是什么。


16.1.4 授权

如果用户通过认证,什么权限都没有,需要一些后续的授权操作,如对资源的增删该查等,kubernetes1.6 之后开始有 RBAC(基于角色的访问控制机制)授权检查机制。Kubernetes 的授权是基于插件形成的,其常用的授权插件有以下几种:
1)Node(节点认证)
2)ABAC(基于属性的访问控制)
3)RBAC(基于角色的访问控制)
4)Webhook(基于 http 回调机制的访问控制)
什么是 RBAC(基于角色的访问控制)?
让一个用户(Users)扮演一个角色(Role),角色拥有权限,从而让用户拥有这样的权限,随后在授权机制当中,只需要将权限授予某个角色,此时用户将获取对应角色的权限,从而实现角色的访问控制。
(用户、角色与权限的 RBAC 授权关系示意图,略)
在 k8s 的授权机制当中,采用 RBAC 的方式进行授权,其工作逻辑是,把对对象的操作权限定义到一个角色当中,再将用户绑定到该角色,从而使用户得到对应角色的权限。如果通过 rolebinding 绑定 role,只能对 rolebinding 所在的名称空间的资源有权限,上图中 user1 这个用户绑定到 role1 上,只对 role1 所在的名称空间的资源有权限,对其他名称空间资源没有权限,属于名称空间级别的;
另外,k8s 为此还有一种集群级别的授权机制,就是定义一个集群角色(ClusterRole),对集群内的所有资源都有可操作的权限,从而将 User2 通过 ClusterRoleBinding 到 ClusterRole,从而使User2 拥有集群的操作权限。Role、RoleBinding、ClusterRole 和 ClusterRoleBinding 的关系如下图
通过上图可以看到,可以通过 rolebinding 绑定 role,rolebinding 绑定 clusterrole,clusterrolebinding 绑定 clusterrole。
上面我们说了三种角色绑定:
(1)用户通过 rolebinding 绑定 role
(2)用户通过 clusterrolebinding 绑定 clusterrole
(3)rolebinding 绑定 clusterrole
rolebinding 绑定 clusterrole 的好处:
假如有 6 个名称空间,每个名称空间的用户都需要对自己的名称空间有管理员权限,那么需要定义 6 个 role 和 rolebinding,然后依次绑定,如果名称空间更多,我们需要定义更多的 role,这个是很麻烦的,所以我们引入 clusterrole,定义一个 clusterrole,对 clusterrole 授予所有权限,然后用户通过 rolebinding 绑定到 clusterrole,就会拥有自己名称空间的管理员权限了
注:RoleBinding 仅仅对当前名称空间有对应的权限。


16.1.5 准入控制

一般而言,准入控制只是用来定义我们授权检查完成之后的后续的其他安全检查操作的,进一步补充了授权机制,由多个插件组合实行,一般而言在创建,删除,修改或者做代理时做补充;
Kubernetes 的 Admission Control 实际上是一个准入控制器(Admission Controller)插件列表,发送到 APIServer 的请求都需要经过这个列表中的每个准入控制器插件的检查,如果某一个控制器插件准入失败,就准入失败。
控制器插件如下:
AlwaysAdmit:允许所有请求通过
AlwaysPullImages:在启动容器之前总是去下载镜像,相当于每当容器启动前做一次用于是否有权使用该容器镜像的检查
AlwaysDeny:禁止所有请求通过,用于测试
DenyEscalatingExec:拒绝 exec 和 attach 命令到有升级特权的 Pod 的终端用户访问。如果集群中包含升级特权的容器,而要限制终端用户在这些容器中执行命令的能力,推荐使用此插件
ImagePolicyWebhook
ServiceAccount:这个插件实现了 serviceAccounts 等等自动化,如果使用 ServiceAccount 对象,强烈推荐使用这个插件
SecurityContextDeny:将 Pod 定义中定义了的 SecurityContext 选项全部失效。
SecurityContext 包含在容器中定义了操作系统级别的安全选型如 fsGroup,selinux 等选项
ResourceQuota:用于 namespace 上的配额管理,它会观察进入的请求,确保在 namespace 上的配额不超标。推荐将这个插件放到准入控制器列表的最后一个。ResourceQuota 准入控制器既可以限制某个 namespace 中创建资源的数量,又可以限制某个 namespace 中被 Pod 请求的资源总量。ResourceQuota 准入控制器和 ResourceQuota 资源对象一起可以实现资源配额管理。
LimitRanger:用于 Pod 和容器上的配额管理,它会观察进入的请求,确保 Pod 和容器上的配额不会超标。准入控制器LimitRanger 和资源对象 LimitRange 一起实现资源限制管理
NamespaceLifecycle:当一个请求是在一个不存在的 namespace 下创建资源对象时,该请求会被拒绝。当删除一个 namespace 时,将会删除该 namespace 下的所有资源对象
DefaultStorageClass
DefaultTolerationSeconds
PodSecurityPolicy

当 Kubernetes 版本>=1.6.0,官方建议使用这些插件:
--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota,DefaultTolerationSeconds

当 Kubernetes 版本>=1.4.0,官方建议使用这些插件:
--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota

以上是标准的准入插件,如果是自己定制的话,k8s1.7 版 出了两个 alpha features, Initializers和 External Admission Webhooks


16.2 ServiceAccount 介绍

kubernetes 中账户区分为:User Accounts(用户账户) 和 Service Accounts(服务账户)两种:
UserAccount 是给 kubernetes 集群外部用户使用的,例如运维或者集群管理人员,kubeadm 安装的 k8s,默认用户账号是 kubernetes-admin;
k8s 客户端(一般用:kubectl) ———>API Server
APIServer 需要对客户端做认证,使用 kubeadm 安装的 K8s,会在用户家目录下创建一个认证配置文件 .kube/config 这里面保存了客户端访问 API Server 的密钥相关信息,这样当用 kubectl访问 k8s 时,它就会自动读取该配置文件,向 API Server 发起认证,然后完成操作请求。
用户名称可以在 kubeconfig 中查看

 cd ~/.kube/

 cat config

ServiceAccount 是 Pod 使用的账号,Pod 容器的进程需要访问 API Server 时用的就是ServiceAccount 账户;ServiceAccount 仅局限它所在的 namespace,每个 namespace 创建时都会自动创建一个 default service account;创建 Pod 时,如果没有指定 Service Account,Pod 则会使用 default Service Account。
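可以用下面的命令查看名称空间里自动创建的 default ServiceAccount,以及某个 pod 实际使用的 sa(示例命令,pod 名称为占位符):
```shell
# 查看 default 名称空间下自动创建的 default sa
kubectl get sa default -o yaml

# 查看某个 pod 实际使用的 serviceAccountName(<pod 名称> 为占位符)
kubectl get pod <pod 名称> -o jsonpath='{.spec.serviceAccountName}'
```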


16.3 RBAC 认证授权策略

参考: https://kubernetes.io/zh/docs/reference/access-authn-authz/rbac/
RBAC 介绍
在 Kubernetes 中,所有资源对象都是通过 API 进行操作,他们保存在 etcd 里。而对 etcd 的操作我们需要通过访问 kube-apiserver 来实现,上面的 Service Account 其实就是 APIServer 的认证过程,而授权的机制是通过 RBAC:基于角色的访问控制实现。
RBAC 有四个资源对象,分别是 Role、ClusterRole、RoleBinding、ClusterRoleBinding


16.3.1 Role 角色

一组权限的集合,在一个命名空间中,可以用其来定义一个角色,只能对命名空间内的资源进行授权。如果是集群级别的资源,则需要使用 ClusterRole。
例如:定义一个角色用来读取 Pod 的权限

 kubectl create ns rbac

 vim role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: rbac
  name: pod-read
rules:
- apiGroups: [""]
  resources: ["pods"]
  resourceNames: [""]
  verbs: ["get","watch","list"]

rules 中的参数说明:

  1. apiGroups:支持的 API 组列表,例如:”apiVersion: batch/v1”等
  2. resources:支持的资源对象列表,例如 pods、deplayments、jobs 等
  3. resourceNames: 指定 resource 的名称
  4. verbs:对资源对象的操作方法列表。

16.3.2 ClusterRole 集群角色

具有和角色一致的命名空间资源的管理能力,还可用于以下特殊元素的授权

  1. 集群范围的资源,例如 Node
  2. 非资源型的路径,例如:/healthz
  3. 包含全部命名空间的资源,例如 Pods

例如:定义一个集群角色可让用户访问任意 configmaps

vim clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: secrets-clusterrole
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get","watch","list"]

16.3.3 RoleBinding 角色绑定 | ClusterRolebinding 集群角色绑定

角色绑定和集群角色绑定用于把一个角色绑定在一个目标上,可以是 User,Group,ServiceAccount,使用 RoleBinding 为某个命名空间授权,使用 ClusterRoleBinding 为集群范围内授权。
例如:将在 rbac 命名空间中把 pod-read 角色授予用户 es

vim rolebinding-rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-read-bind
  namespace: rbac
subjects:
- kind: User
  name: es
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-read
  apiGroup: rbac.authorization.k8s.io

RoleBinding 也可以引用 ClusterRole,对属于同一命名空间内的 ClusterRole 定义的资源主体进行授权
例如:es 能获取到集群中所有的资源信息

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: es-allresource
  namespace: rbac
subjects:
- kind: User
  name: es
  apiGroup: rbac.authorization.k8s.io
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin

集群角色绑定的角色只能是集群角色,用于进行集群级别或对所有命名空间都生效的授权
例如:允许 manager 组的用户读取所有 namaspace 的 configmap

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-secrets-global
  namespace: rbac
subjects:
- kind: Group
  name: manager
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: secret-reader
  apiGroup: rbac.authorization.k8s.io

16.4 资源的引用方式

多数资源可以用其名称的字符串表示,也就是 Endpoint 中的 URL 相对路径,例如 pod 中的日志是GET /api/v1/namaspaces/{namespace}/pods/{podname}/log 如果需要在一个 RBAC 对象中体现上下级资源,就需要使用“/”分割资源和下级资源。
例如:若想授权让某个主体同时能够读取 Pod 和 Pod log,则可以配置 resources 为一个数组

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: logs-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods","pods/log"]
  verbs: ["get","list"]

资源还可以通过名称(ResourceName)进行引用,在指定 ResourceName 后,使用 get、delete、update、patch 请求,就会被限制在这个资源实例范围内
例如,下面的声明让一个主体只能对名为 my-configmap 的 Configmap 进行 get 和 update 操作

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: configmap-update
rules:
- apiGroups: [""]
  resources: ["configmap"]
  resourceNames: ["my-configmap"]
  verbs: ["get","update"]

16.5 常见角色示例

1. 允许读取核心 API 组的 Pod 资源
```yaml
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get","list","watch"]
```
2. 允许读写 extensions 和 apps 两个 API 组中的 deployment 资源
```yaml
rules:
- apiGroups: ["extensions","apps"]
  resources: ["deployments"]
  verbs: ["get","list","watch","create","update","patch","delete"]
```
3. 允许读取 Pod 以及读写 job 信息
```yaml
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get","list","watch"]
- apiGroups: ["batch","extensions"]
  resources: ["jobs"]
  verbs: ["get","list","watch","create","update","patch","delete"]
```
4. 允许读取一个名为 my-configmap 的 ConfigMap(必须绑定到一个 RoleBinding 来限制到一个 Namespace 下的 ConfigMap)
```yaml
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["my-configmap"]
  verbs: ["get"]
```
5. 读取核心组的 Node 资源(Node 属于集群级的资源,所以必须存在于 ClusterRole 中,并使用 ClusterRoleBinding 进行绑定)
```yaml
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get","list","watch"]
```
6. 允许对非资源端点"/healthz"及其所有子路径进行 GET 和 POST 操作(必须使用 ClusterRole 和 ClusterRoleBinding)
```yaml
rules:
- nonResourceURLs: ["/healthz","/healthz/*"]
  verbs: ["get","post"]
```

16.6 常见的角色绑定示例

1. 用户名 alice
```yaml
subjects:
- kind: User
  name: alice
  apiGroup: rbac.authorization.k8s.io
```
2. 组名 alice
```yaml
subjects:
- kind: Group
  name: alice
  apiGroup: rbac.authorization.k8s.io
```
3. kube-system 命名空间中默认 Service Account
```yaml
subjects:
- kind: ServiceAccount
  name: default
  namespace: kube-system
```

16.7 对 Service Account 的授权管理

Service Account 也是一种账号,是给运行在 Pod 里的进程提供必要的身份证明。需要在 Pod 定义中指明引用的 Service Account,这样就可以对 Pod 进行赋权操作。
例如:让 pod 内可获取 rbac 命名空间的所有 Pod 资源,名为 pod-reader-sc 的 Service Account 已绑定了名为 pod-read 的 Role

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: rbac
spec:
  serviceAccountName: pod-reader-sc
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    ports:
    - containerPort: 80

默认的 RBAC 策略为控制平台组件、节点和控制器授予有限范围的权限,但是除 kube-system 外的 Service Account 是没有任何权限的。

  1. 为一个应用专属的 Service Account 赋权此应用需要在 Pod 的 spec 中指定一个 serviceAccountName,用于 API、Application、Manifest、kubectl create serviceaccount 等创建 Service Account 的命令。

例如为 my-namespace 中的 my-sa Service Account 授予只读权限

kubectl create rolebinding my-sa-view --clusterrole=view --serviceaccount=my-namespace:my-sa --namespace=my-namespace
  1. 为一个命名空间中名为 default 的 Service Account 授权如果一个应用没有指定 serviceAccountName,则会使用名为 default 的 Service Account。注意,赋予 Service Account “default”的权限会让所有没有指定 serviceAccountName 的 Pod 都具有这些权限

例如,在 my-namespace 命名空间中为 Service Account“default”授予只读权限

kubectl create rolebinding default-view --clusterrole=view --serviceaccount=my-namespace:default --namespace=my-namespace

另外,许多系统级 Add-Ons 都需要在 kube-system 命名空间中运行,要让这些 Add-Ons 能够使用超级用户权限,则可以把 cluster-admin 权限赋予 kube-system 命名空间中名为 default 的Service Account,这一操作意味着 kube-system 命名空间包含了通向 API 超级用户的捷径。

kubectl create clusterrolebinding add-ons-add-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default
  1. 为命名空间中所有 Service Account 都授予一个角色如果希望在一个命名空间中,任何 Service Account 应用都具有一个角色,则可以为这一命名空间的 Service Account 群组进行授权

    kubectl create rolebinding serviceaccounts-view --clusterrole=view --group=system:serviceaccounts:my-namespace --namespace=my-namespace
    
  2. 为集群范围内所有 Service Account 都授予一个低权限角色如果不想为每个命名空间管理授权,则可以把一个集群级别的角色赋给所有 Service Account

    kubectl create clusterrolebinding serviceaccounts-view --clusterrole=view --group=system:serviceaccounts
    
  3. 为所有 Service Account 授予超级用户权限

    kubectl create clusterrolebinding serviceaccounts-view --clusterrole=cluster-admin --group=system:serviceaccounts
    

16.8 使用 kubectl 命令行工具创建资源对象

  1. 在命名空间 rbac 中为用户 es 授权 admin ClusterRole

    kubectl create rolebinding bob-admin-binding --clusterrole=admin --user=es --namespace=rbac
    
  2. 在命名空间 rbac 中为名为 myapp 的 Service Account 授予 view ClusterRole

kubectl create rolebinding myapp-role-binding --clusterrole=view --serviceaccount=rbac:myapp --namespace=rbac
    
  3. 在全集群范围内为用户 root 授予 cluster-admin ClusterRole

    kubectl create clusterrolebinding cluster-binding --clusterrole=cluster-admin --user=root
    
  4. 在全集群范围内为名为 myapp 的 Service Account 授予 view ClusterRole

    kubectl create clusterrolebinding service-account-binding --clusterrole=view --serviceaccount=myapp
    

    扩展:创建 sa,并授权
    sa 的全称是 serviceaccount。
    serviceaccount 是为了方便 Pod 里面的容器访问 k8s 集群而设计的。指定了 serviceaccount 之后,我们把 pod 创建出来了,我们在使用这个 pod 时,这个 pod 就有了我们指定的权限去操作 k8s 资源了。
    对 sa 授权

    kubectl create clusterrolebinding nfs-provisioner --clusterrole=cluster-admin --serviceaccount=default:nfs-provisioner
    

注意:--clusterrole=cluster-admin 表示具有所有权限,可以操作任何资源;--serviceaccount=default:nfs-provisioner 中,
    default 是 sa 所在的名称空间 ,nfs-provisioner 是 sa 的名字
#创建 pod,使用 sa

```yaml
spec:
  serviceAccount: nfs-provisioner
  containers:
  - name: nfs-provisioner
```
    

16.9 限制不同的用户操作 k8s 集群

  1. ssl 认证

生成一个证书
(1)生成一个私钥

cd /etc/kubernetes/pki/

umask 077; openssl genrsa -out lucky.key 2048

(2)生成一个证书请求

openssl req -new -key lucky.key -out lucky.csr -subj "/CN=lucky"

(3)生成一个证书

openssl x509 -req -in lucky.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out lucky.crt -days 3650
  2. 在 kubeconfig 下新增加一个 lucky 这个用户

    cp /root/.kube/config /root/.kube/config.bak
    

    (1)把 lucky 这个用户添加到 kubernetes 集群中,可以用来认证 apiserver 的连接

    kubectl config set-credentials lucky --client-certificate=./lucky.crt --client-key=./lucky.key --embed-certs=true
    

    (2)在 kubeconfig 下新增加一个上下文

    kubectl config set-context lucky@kubernetes --cluster=kubernetes --user=lucky
    

    (3)切换账号到 lucky,默认没有任何权限

    kubectl config use-context lucky@kubernetes
    
  3. 先执行 kubectl config use-context kubernetes-admin@kubernetes 切换回集群管理员用户(该用户有任何权限),再把 lucky 这个用户通过 rolebinding 绑定到 clusterrole 上,授予权限,权限只是在 lucky 这个名称空间有效

    kubectl create ns lucky
    

    (1)把 lucky 这个用户通过 rolebinding 绑定到 clusterrole 上

    kubectl create rolebinding lucky -n lucky --clusterrole=cluster-admin --user=lucky
    

    (2)切换到 lucky 这个用户

    kubectl config use-context lucky@kubernetes
    

(3)测试是否有权限
```shell
kubectl get pods -n lucky

kubectl get pods
```

添加一个 lucky 的普通用户
```shell
useradd lucky

cp -ar /root/.kube/ /home/lucky/

chown -R lucky.lucky /home/lucky/

su - lucky

kubectl get pods -n lucky
```

最后不要忘了切换回 kubernetes-admin 用户:

kubectl config use-context kubernetes-admin@kubernetes

十七、 基于 Ingress 实现 k8s 七层调度和负载均衡

17.1 如何在 k8s 中实现应用的负载均衡?

  1. 负载均衡是什么?

负载均衡是由多台服务器以对称的方式组成一个服务器集合,每台服务器都具有等价的地位,都可以单独对外提供服务而无需其他服务器的辅助。通过某种负载分担技术,将外部发送来的请求按照某种策略分配到服务器集合的某一台服务器上,而接收到请求的服务器独立地回应客户的请求。负载均衡解决了大量并发访问服务问题,其目的就是用最少的投资获得接近于大型主机的性能

  2. 常见负载均衡方案

四层负载均衡:lvs(软件层面)
缺点:对网络依赖较大,负载智能化方面没有 7 层负载好(比如不支持对 url 个性化负载),F5 硬件性能很高但成本也高,需要人民币几十万,对于小公司就望而却步了。
常见的七层负载均衡:nginx
优点:对网络依赖少,负载智能方案多(比如可根据不同的 url 进行负载)

  3. 在 k8s 中为什么要做负载均衡?

Pod 漂移问题,可以理解成 Pod IP 是变化的。Kubernetes 具有强大的副本控制能力,能保证在任意副本(Pod)挂掉时自动从其他机器启动一个新的,还可以动态扩容等。通俗地说,这个 Pod 可能在任何时刻出现在任何节点上,也可能在任何时刻死在任何节点上;那么自然随着 Pod 的创建和销毁,Pod IP 肯定会动态变化;那么如何把这个动态的 Pod IP 暴露出去?这里借助于 Kubernetes 的 Service 机制,Service 可以以标签的形式选定一组带有指定标签的 Pod,并监控和自动负载他们的 Pod IP,那么我们向外暴露只暴露 Service IP 就行了;这就是 NodePort 模式:即在每个节点上开启一个端口,然后转发到内部 Pod IP 上。
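下面给出一个 NodePort 类型 Service 的示意清单(service 名称、nodePort 端口均为假设值,标签沿用下文 17.2 案例中的 run=my-nginx):
```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-nginx-nodeport     # 假设的 service 名称
spec:
  type: NodePort
  selector:
    run: my-nginx             # 选择带有 run=my-nginx 标签的 pod
  ports:
  - port: 80                  # service 的端口
    targetPort: 80            # pod 里容器的端口
    nodePort: 30080           # 节点上暴露的端口(默认范围 30000-32767,此处为假设值)
```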


17.2 使用 service 做四层负载均衡

在 kubernetes 中,Pod 是有生命周期的,如果 Pod 重建 IP 很有可能会发生变化。如果我们的服务都是将 Pod 的 IP 地址写死,Pod 的挂掉或者重建,和刚才重启的 pod 相关联的其他服务将会找不到它所关联的 Pod,为了解决这个问题,在kubernetes 中定义了 service 资源对象,Service 定义了一个服务访问的入口,客户端通过这个入口即可访问服务背后的应用集群实例,service 是一组 Pod 的逻辑集合,这一组 Pod 能够被 Service 访问到,通常是通过 Label Selector 实现的。
那我们通过 Service 访问 k8s 内部应用的时候数据包走向是什么样的呢?
此时的数据包流向如下:
客户端请求—>node 节点的 ip:端口—->service 的 ip:端口—->pod 的 ip:端口

案例演示:
创建一个 k8s pod,通过 service 代理

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  selector:
    matchLabels:
      run: my-nginx
  replicas: 2
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx
        ports:
        - containerPort: 80

创建一个 service

apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  labels:
    run: my-nginx
spec:
  ports:
  - port: 80
    protocol: TCP
  selector:
    run: my-nginx

扩展: 在 k8s 为什么要引入七层负载均衡?
端口管理问题
采用 service 的 NodePort 方式暴露服务面临的问题是,服务一旦多起来,NodePort 在每个节点上开启的端口会及其庞大,而且难以维护。


17.3 四层负载均衡和七层负载均衡对比分析

  1. 四层的负载均衡就是基于 IP+端口的负载均衡:在三层负载均衡的基础上,通过发布三层的 IP 地址(VIP),然后加四层的端口号,来决定哪些流量需要做负载均衡,对需要处理的流量进行 NAT 处理,转发至后台服务器,并记录下这个 TCP 或者 UDP 的流量是由哪台服务器处理的,后续这个连接的所有流量都同样转发到同一台服务器处理。
  2. 七层的负载均衡就是基于虚拟的 URL 或主机 IP 的负载均衡:在四层负载均衡的基础上(没有四层是绝对不可能有七层的),再考虑应用层的特征,比如同一个 Web 服务器的负载均衡,除了根据 VIP加 80 端口辨别是否需要处理的流量,还可根据七层的 URL、浏览器类别、语言来决定是否要进行负载均衡。举个例子,如果你的 Web 服务器分成两组,一组是中文语言的,一组是英文语言的,那么七层负载均衡就可以当用户来访问你的域名时,自动辨别用户语言,然后选择对应的语言服务器组进行负载均衡处理。

17.4 Ingress 和 Ingress Controller 深度解读

17.4.1 Ingress 介绍

Ingress 官网定义:Ingress 可以把进入到集群内部的请求转发到集群中的一些服务上,从而可以把服务映射到集群外部。Ingress 能把集群内 Service 配置成外网能够访问的 URL,流量负载均衡,提供基于域名访问的虚拟主机等。
Ingress 简单的理解就是你原来需要改 Nginx 配置,然后配置各种域名对应哪个 Service,现在把这个动作抽象出来,变成一个 Ingress 对象,你可以用 yaml 创建,每次不要去改 Nginx 了,直接改yaml 然后创建/更新就行了;那么问题来了:”Nginx 该怎么处理?”
Ingress Controller 这东西就是解决 “Nginx 的处理方式” 的;Ingress Controller 通过与Kubernetes API 交互,动态的去感知集群中 Ingress 规则变化,然后读取他,按照他自己模板生成一段 Nginx 配置,再写到 Ingress Controller Nginx 里,最后 reload 一下,工作流程如下图:
实际上 Ingress 也是 Kubernetes API 的标准资源类型之一,它其实就是一组基于 DNS 名称 (host)或 URL 路径把请求转发到指定的 Service 资源的规则。用于将集群外部的请求流量转发到集群内部完成的服务发布。我们需要明白的是,Ingress 资源自身不能进行“流量穿透”,仅仅是一组规则的集合,这些集合规则还需要其他功能的辅助,比如监听某套接字,然后根据这些规则的匹配进行路由转发, 这些能够为 Ingress 资源监听套接字并将流量转发的组件就是 Ingress Controller。
注:Ingress 控制器不同于 Deployment 控制器的是,Ingress 控制器不直接运行为 kube-controller-manager 的一部分,它仅仅是 Kubernetes 集群的一个附件,类似于 CoreDNS,需要在集群上单独部署。


17.4.2 Ingress Controller 介绍

Ingress Controller 是一个七层负载均衡调度器,客户端的请求先到达这个七层负载均衡调度器,由七层负载均衡器在反向代理到后端 pod,常见的七层负载均衡器有 nginx、traefik,以我们熟悉的nginx 为例,假如请求到达 nginx,会通过 upstream 反向代理到后端 pod 应用,但是后端 pod 的 ip地址是一直在变化的,因此在后端 pod 前需要加一个 service,这个 service 只是起到分组的作用,那么我们 upstream 只需要填写 service 地址即可
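为帮助理解,下面手写一段 nginx 配置,示意 ingress controller 所生成配置的大致逻辑(仅为示意,并非 controller 实际生成的完整配置;域名与 service 地址为假设值):
```
# upstream 指向 service 的地址,由 service 再负载到后端 pod
upstream tomcat-svc {
    server tomcat.default.svc.cluster.local:8080;   # 假设的 service 地址
}
server {
    listen 80;
    server_name tomcat.lucky.com;                    # 假设的访问域名
    location / {
        proxy_pass http://tomcat-svc;
    }
}
```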


17.4.3 Ingress 和 Ingress Controller 总结

Ingress Controller
Ingress Controller 可以理解为控制器,它通过不断的跟 Kubernetes API 交互,实时获取后端Service、Pod 的变化,比如新增、删除等,结合 Ingress 定义的规则生成配置,然后动态更新上边的Nginx 或者 trafik 负载均衡器,并刷新使配置生效,来达到服务自动发现的作用。
Ingress 则是定义规则,通过它定义某个域名的请求过来之后转发到集群中指定的 Service。它可以通过 Yaml 文件定义,可以给一个或多个 Service 定义一个或多个 Ingress 规则。


17.4.4 使用 Ingress Controller 代理 k8s 内部应用的流程

  1. 部署 Ingress controller,我们 ingress controller 使用的是 nginx
  2. 创建 Pod 应用,可以通过控制器创建 pod
  3. 创建 Service,用来分组 pod
  4. 创建 Ingress http,测试通过 http 访问应用
  5. 创建 Ingress https,测试通过 https 访问应用

使用七层负载均衡调度器 ingress controller 时,当客户端访问 kubernetes 集群内部的应用,数据包会先经过 ingress controller(七层调度),再由 service 转发到后端 pod,流程如下:
客户端请求 -> ingress controller(七层负载均衡) -> service -> pod


17.4.5 安装 Nginx Ingress Controller

Ingress-controller 和 k8s 版本对照:
Ingress-controller 官网: https://github.com/kubernetes/ingress-nginx/
下载 registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen:v1.1.1 和 registry.cn-hangzhou.aliyuncs.com/google_containers/nginx-ingress-controller:v1.1.1

1. 子节点拉取镜像
```shell
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen:v1.1.1

docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/nginx-ingress-controller:v1.1.1
```

2. 安装 Ingress controller 需要的 yaml 所在的 github 地址: https://github.com/kubernetes/ingress-nginx/blob/main/deploy/static/provider/baremetal/deploy.yaml

(课件附件:deploy.yaml)

3. RBAC授权并安装控制器
```shell
kubectl create clusterrolebinding clusterrolebinding-user-3 --clusterrole=cluster-admin --user=system:serviceaccount:ingress-nginx:ingress-nginx

kubectl apply -f deploy.yaml
```

4. 验证是否安装
```shell
kubectl get pods -n ingress-nginx
```
    

17.5 测试 Ingress HTTP 代理 k8s 内部站点

步骤:
(1)部署 Ingress controller,我们 ingress controller 使用的是 nginx
(2)创建 Pod 应用,可以通过控制器创建 pod
(3)创建 Service,用来分组 pod
(4)创建 Ingress http,测试通过 http 访问应用
(5)创建 Ingress https,测试通过 https 访问应用

1. 部署后端服务
```shell
vim ingress-demo.yaml
```
```yaml
apiVersion: v1
kind: Service
metadata:
  name: tomcat
  namespace: default
spec:
  selector:
    app: tomcat
    release: canary
  ports:
  - name: http
    targetPort: 8080
    port: 8080
  - name: ajp
    targetPort: 8009
    port: 8009
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomcat-deploy
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tomcat
      release: canary
  template:
    metadata:
      labels:
        app: tomcat
        release: canary
    spec:
      containers:
      - name: tomcat
        image: tomcat
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 8080
        - name: ajp
          containerPort: 8009
```
```shell
kubectl apply -f ingress-demo.yaml

kubectl get pods -l app=tomcat
```
2. 编写ingress配置清单
```shell
vim ingress-myapp.yaml
```
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-myapp
  namespace: default
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:                          # 定义后端转发的规则
  - host: tomcat.lucky.com        # 通过域名进行转发
    http:
      paths:
      - path: /                   # 配置访问路径,如果通过 url 进行转发,需要修改;空默认为访问的路径为"/"
        pathType: Prefix
        backend:                  # 配置后端服务
          service:
            name: tomcat
            port:
              number: 8080
```
```shell
kubectl apply -f ingress-myapp.yaml

kubectl describe ingress ingress-myapp
```


3. 修改电脑host文件,访问测试


---

<a name="EiiA4"></a>
## 17.6 测试 Ingress HTTPS 代理 tomcat

1. 构建TLS站点
```shell
openssl genrsa -out tls.key 2048

openssl req -new -x509 -key tls.key -out tls.crt -subj /C=CN/ST=Beijing/L=Beijing/O=DevOps/CN=tomcat.lucky.com
2. 生成secret
```shell
kubectl create secret tls tomcat-ingress-secret --cert=tls.crt --key=tls.key

kubectl get secret
```


3. 创建ingress

Ingress 规则可以参考官方: [https://kubernetes.io/zh/docs/concepts/services-networking/ingress/](https://kubernetes.io/zh/docs/concepts/services-networking/ingress/)
```shell
vim ingress-tomcat-tls.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-tomcat-tls
  namespace: default
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  tls:
  - hosts:
    - tomcat.lucky.com
    secretName: tomcat-ingress-secret
  rules:
  - host: tomcat.lucky.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: tomcat
            port:
              number: 8080
kubectl apply -f ingress-tomcat-tls.yaml
```

4. 访问测试:在本机 hosts 文件里解析 tomcat.lucky.com 后,用浏览器通过 https://tomcat.lucky.com 访问即可(截图略)。


十八、 k8s 结合 Prometheus 构建企业级监控告警系统

18.1 深度解读 Prometheus

18.1.1 什么是 Prometheus?

Prometheus 配置:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/
Prometheus 监控组件对应的 exporter 部署地址:
https://prometheus.io/docs/instrumenting/exporters/
Prometheus 基于 k8s 服务发现参考:
https://github.com/prometheus/prometheus/blob/release-2.31/documentation/examples/prometheus-kubernetes.yml
Prometheus 是一个开源的系统监控和报警系统,现在已经加入到 CNCF 基金会,成为继 k8s 之后第二个在 CNCF 托管的项目,在 kubernetes 容器管理系统中,通常会搭配 prometheus 进行监控,同时也支持多种 exporter 采集数据,还支持 pushgateway 进行数据上报,Prometheus 性能足够支撑上万台规模的集群。


18.1.2 prometheus 特点

1.多维度数据模型
每一个时间序列数据都由 metric 度量指标名称和它的标签 labels 键值对集合唯一确定:
这个 metric 度量指标名称指定监控目标系统的测量特征(如:http_requests_total- 接收 http 请求的总计数)。labels 开启了 Prometheus 的多维数据模型:对于相同的度量名称,通过不同标签列表的结合, 会形成特定的度量维度实例。(例如:所有包含度量名称为/api/tracks 的 http 请求,打上method=POST 的标签,则形成了具体的 http 请求)。这个查询语言在这些度量和标签列表的基础上进行过滤和聚合。改变任何度量上的任何标签值,则会形成新的时间序列图。
2.灵活的查询语言(PromQL)可以对采集的 metrics 指标进行加法,乘法,连接等操作;
3.可以直接在本地部署,不依赖其他分布式存储;
4.通过基于 HTTP 的 pull 方式采集时序数据;
5.可以通过中间网关 pushgateway 的方式把时间序列数据推送到 prometheus server 端;
6.可通过服务发现或者静态配置来发现目标服务对象(targets)。
7.有多种可视化图像界面,如 Grafana 等。
8.高效的存储,每个采样数据占 3.5 bytes 左右,300 万的时间序列,30s 间隔,保留 60 天,消耗磁盘大概 200G。
9.做高可用,可以对数据做异地备份,联邦集群,部署多套 prometheus,pushgateway 上报数据
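下面给出两个 PromQL 查询的小例子,演示上面提到的按标签过滤与聚合(指标名沿用上文提到的 http_requests_total,handler 为假设的标签名,仅供参考):
```
# 按标签过滤:只看 method=POST 的请求计数
http_requests_total{method="POST"}

# 聚合:按 handler 标签汇总最近 5 分钟的请求速率
sum by (handler) (rate(http_requests_total{method="POST"}[5m]))
```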


18.1.3 prometheus 组件

(Prometheus 生态架构图,略)
从上图可发现,Prometheus 整个生态圈组成主要包括 prometheus server,Exporter,pushgateway,alertmanager,grafana,Web ui 界面,Prometheus server 由三个部分组成,Retrieval,Storage,PromQL
(1) Retrieval 负责在活跃的 target 主机上抓取监控指标数据
(2) Storage 存储主要是把采集到的数据存储到磁盘中
(3) PromQL 是 Prometheus 提供的查询语言模块
1.Prometheus Server:
用于收集和存储时间序列数据。
2.Client Library:
客户端库,检测应用程序代码,当 Prometheus 抓取实例的 HTTP 端点时,客户端库会将所有跟踪的 metrics 指标的当前状态发送到 prometheus server 端。
3.Exporters:
prometheus 支持多种 exporter,通过 exporter 可以采集 metrics 数据,然后发送到prometheus server 端,所有向 promtheus server 提供监控数据的程序都可以被称为 exporter
4.Alertmanager:
从 Prometheus server 端接收到 alerts 后,会进行去重,分组,并路由到相应的接收方,发出报警,常见的接收方式有:电子邮件,微信,钉钉, slack 等。
5.Grafana:
监控仪表盘,可视化监控数据
6.pushgateway:
各个目标主机可上报数据到 pushgatewy,然后 prometheus server 统一从 pushgateway 拉取数据。


18.1.4 prometheus 几种部署模式

  1. 基本 HA 模式

基本的 HA 模式只能确保 Promthues 服务的可用性问题,但是不解决 Prometheus Server 之间的数据一致性问题以及持久化问题(数据丢失后无法恢复),也无法进行动态的扩展。因此这种部署方式适合监控规模不大,Promthues Server 也不会频繁发生迁移的情况,并且只需要保存短周期监控数据的场景。

  2. 基本 HA + 远程存储方案

在解决了 Promthues 服务可用性的基础上,同时确保了数据的持久化,当 Promthues Server 发生宕机或者数据丢失的情况下,可以快速的恢复。 同时 Promthues Server 可能很好的进行迁移。因此,该方案适用于用户监控规模不大,但是希望能够将监控数据持久化,同时能够确保 Promthues Server 的可迁移性的场景。

  3. 基本 HA + 远程存储 + 联邦集群方案

Promthues 的性能瓶颈主要在于大量的采集任务,因此用户需要利用 Prometheus 联邦集群的特性,将不同类型的采集任务划分到不同的 Promthues 子服务中,从而实现功能分区。例如一个Promthues Server 负责采集基础设施相关的监控指标,另外一个 Prometheus Server 负责采集应用监控指标。再有上层 Prometheus Server 实现对数据的汇聚。


18.1.5 prometheus 工作流程

  1. Prometheus server 可定期从活跃的(up)目标主机上(target)拉取监控指标数据,目标主机的监控数据可通过配置静态 job 或者服务发现的方式被 prometheus server 采集到,这种方式默认是 pull 方式拉取指标;也可通过 pushgateway 把采集的数据上报到 prometheus server 中;还可通过一些组件自带的 exporter 采集相应组件的数据;
  2. Prometheus server 把采集到的监控指标数据保存到本地磁盘或者数据库;
  3. Prometheus 采集的监控指标数据按时间序列存储,通过配置报警规则,把触发的报警发送到 alertmanager;
  4. Alertmanager 通过配置报警接收方,发送报警到邮件,微信或者钉钉等;
  5. Prometheus 自带的 web ui 界面提供 PromQL 查询语言,可查询监控数据;
  6. Grafana 可接入 prometheus 数据源,把监控数据以图形化形式展示出。

18.1.6 prometheus 如何更好的监控 k8s?

对于 Kubernetes 而言,我们可以把当中所有的资源分为几类:

  1. 基础设施层(Node):集群节点,为整个集群和应用提供运行时资源
  2. 容器基础设施(Container):为应用提供运行时环境
  3. 用户应用(Pod):Pod 中会包含一组容器,它们一起工作,并且对外提供一个(或者一组)功 能
  4. 内部服务负载均衡(Service):在集群内,通过 Service 在集群暴露应用功能,集群内应用和应用之间访问时提供内部的负载均衡
  5. 外部访问入口(Ingress):通过 Ingress 提供集群外的访问入口,从而可以使外部客户端能够访问到部署在 Kubernetes 集群内的服务

因此,在不考虑 Kubernetes 自身组件的情况下,如果要构建一个完整的监控体系,我们应该考虑,以下 5 个方面:
1) 集群节点状态监控:从集群中各节点的 kubelet 服务获取节点的基本运行状态;
2) 集群节点资源用量监控:通过 Daemonset 的形式在集群中各个节点部署 Node Exporter 采集节点的资源使用情况;
3) 节点中运行的容器监控:通过各个节点中 kubelet 内置的 cAdvisor 中获取个节点中所有容器的运行状态和资源使用情况;
4) 从黑盒监控的角度在集群中部署 Blackbox Exporter 探针服务,检测 Service 和 Ingress 的可用性;
5) 如果在集群中部署的应用程序本身内置了对 Prometheus 的监控支持,那么我们还应该找到相应的 Pod 实例,并从该 Pod 实例中获取其内部运行状态的监控指标。


18.1.7 prometheus和zabbix对比

(Prometheus 与 Zabbix 的对比表,原文为截图,略)


18.2 安装采集节点资源指标组件 node-exporter

node-exporter 是什么?
采集机器(物理机、虚拟机、云主机等)的监控指标数据,能够采集到的指标包括 CPU, 内存,磁盘,网络,文件数等信息

  1. 解压镜像到每个节点

(课件附件:node-exporter.tar.zip)
```shell
docker load -i node-exporter.tar.gz
```

2. 编写yaml文件
```shell
kubectl create ns monitor-sa

vim node-export.yaml
```
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor-sa
  labels:
    name: node-exporter
spec:
  selector:
    matchLabels:
     name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v0.16.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        - '"^/(sys|proc|dev|host|etc)($|/)"'
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /
```

```shell
kubectl apply -f node-export.yaml

kubectl get pods -n monitor-sa
```

通过 node-exporter 采集数据,node-export 默认的监听端口是 9100,可以看到当前主机获取到的所有监控数据

curl http://主机 ip:9100/metrics

显示 192.168.1.63 主机 cpu 的使用情况

curl http://192.168.1.63:9100/metrics | grep node_cpu_seconds

#HELP:解释当前指标的含义,上面表示在每种模式下 node 节点的 cpu 花费的时间,以 s 为单位
#TYPE:说明当前指标的数据类型,上面是 counter 类型
node_cpu_seconds_total{cpu="0",mode="idle"}:cpu0 上 idle 进程占用 CPU 的总时间,CPU 占用时间是一个只增不减的度量指标,从类型中也可以看出 node_cpu 的数据类型是 counter(计数器)。counter 计数器:只是采集递增的指标

curl http://192.168.1.63:9100/metrics | grep node_load

node_load1 该指标反映了当前主机在最近一分钟以内的负载情况,系统的负载情况会随系统资源的使用而变化,因此 node_load1 反映的是当前状态,数据可能增加也可能减少,从注释中可以看出当前指标类型为 gauge(标准尺寸)
gauge 标准尺寸:统计的指标可增加可减少
node-exporter 官方网站:https://prometheus.io/docs/guides/node-exporter/
node-exporter 的 github 地址:https://github.com/prometheus/node_exporter
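结合 counter 和 gauge 两种指标类型,对应的 PromQL 用法大致如下(示意查询,仅供参考):
```
# counter(只增不减):直接看累计值意义不大,一般用 rate() 计算增长速率
rate(node_cpu_seconds_total{mode="idle"}[5m])

# gauge(可增可减):直接查询当前值即可
node_load1
```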


18.3 在 k8s 集群中安装 Prometheus server 服务

18.3.1 创建sa账号

#在 k8s 集群的控制节点操作,创建一个 sa 账号
kubectl create serviceaccount monitor -n monitor-sa

#把 sa 账号 monitor 通过 clusterrolebing 绑定到 clusterrole 上
kubectl create clusterrolebinding monitor-clusterrolebinding -n monitor-sa --clusterrole=cluster-admin --serviceaccount=monitor-sa:monitor

#注意:如果执行上面的授权命令报错,可以改用下面这条(--user 的格式是 system:serviceaccount:<名称空间>:<sa 名字>):
kubectl create clusterrolebinding monitor-clusterrolebinding-1 -n monitor-sa --clusterrole=cluster-admin --user=system:serviceaccount:monitor-sa:monitor
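授权完成后,可以用 kubectl auth can-i 粗略验证一下这个 sa 是否已经具备权限(示例命令,输出 yes 即说明绑定生效):

```shell
# 以 monitor-sa 名称空间下的 monitor 这个 serviceaccount 的身份检查权限
kubectl auth can-i list pods --all-namespaces --as=system:serviceaccount:monitor-sa:monitor
kubectl auth can-i get nodes --as=system:serviceaccount:monitor-sa:monitor
```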

18.3.2 创建数据目录

mkdir /data -p

chmod 777 /data/

18.3.3 安装 prometheus 服务

以下步骤均在 k8s 集群的控制节点操作:

  1. 创建一个 configmap 存储卷,用来存放 prometheus 配置信息

Prometheus 配置参考: https://prometheus.io/docs/prometheus/latest/configuration/configuration/
Prometheus 基于 k8s 服务发现参考: https://github.com/prometheus/prometheus/blob/release-2.31/documentation/examples/prometheus-kubernetes.yml

vim prometheus-cfg.yaml
---
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus-config
  namespace: monitor-sa
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s       
      scrape_timeout: 10s        
      evaluation_interval: 1m     
    scrape_configs:
    - job_name: 'kubernetes-node'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-node-cadvisor'
      kubernetes_sd_configs:
      - role:  node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-apiserver'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
注意:下面是同一份 prometheus.yml 配置的带注释版本,只用于解释各字段含义,不需要重复 apply:
    global:
      scrape_interval: 15s              #采集目标主机监控数据的时间间隔
      scrape_timeout: 10s               #数据采集超时时间,默认 10s
      evaluation_interval: 1m           #触发告警检测的时间,默认是 1m
      #例如我们写了“超过 80%”的告警,却收到多条告警,而真实超过 80% 的只有一个时间点,这就是 evaluation_interval 的影响:假如指标 5m 才被拉取一次,而检测按 evaluation_interval 每 1m 做一次,那么在值被更新前会一直用旧值反复判断,于是同一个指标 1m 一次、被重复告警了 4 次
    scrape_configs:         #配置数据源,称为 target,每个 target 用 job_name 命名,又分为静态配置和服务发现
    - job_name: 'kubernetes-node'
      kubernetes_sd_configs:         #使用的是 k8s 的服务发现
      - role: node                # 使用 node 角色,它使用默认的 kubelet 提供的 http 端口来发现集群中每个 node 节点。
      relabel_configs:              #重新标记标签
      - source_labels: [__address__]     #配置的原始标签,匹配地址
        regex: '(.*):10250'          #匹配带有 10250 端口的 url: ip:10250
        replacement: '${1}:9100'      # 把匹配到的 ip:10250 的 ip 保留替换成${1}
        target_label: __address__    #新生成的 url 是${1}获取到的 ip:9100
        action: replace              #替换 
      - action: labelmap             #匹配到下面正则表达式的标签会被保留
        regex: __meta_kubernetes_node_label_(.+)      
    - job_name: 'kubernetes-node-cadvisor'      #抓取 cAdvisor 数据,是获取 kubelet 上/metrics/cadvisor 接口数据来获取容器的资源使用情况
      kubernetes_sd_configs:
      - role:  node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      # ca_file 和 bearer_token 是 prometheus 这个容器访问 apiserver 时使用的认证信息
      relabel_configs:
      - action: labelmap        #把匹配到的标签保留
        regex: __meta_kubernetes_node_label_(.+)     #保留匹配到的具有__meta_kubernetes_node_label 的标签
      - target_label: __address__            #获取到的地址:__address__="192.168.1.63:10250"
        replacement: kubernetes.default.svc:443      #把获取到的地址替换成新的地址 kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)         #把原始标签中__meta_kubernetes_node_name 值匹配到
        target_label: __metrics_path__         #获取__metrics_path__对应的值
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
        #把 metrics 替换成新的值 api/v1/nodes/xuegod63/proxy/metrics/cadvisor ${1}是__meta_kubernetes_node_name 获取到的值, url 就是https://kubernetes.default.svc:443/api/v1/nodes/xuegod63/proxy/metrics/cadvisor
    - job_name: 'kubernetes-apiserver'
      kubernetes_sd_configs:
      - role: endpoints     #使用 k8s 中的 endpoint 服务发现,采集 apiserver 6443 端口获取到的数据
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      #endpoint 这个对象的名称空间,endpoint 对象的服务名字,endpoint 的端口名称
        action: keep
        regex: default;kubernetes;https
        #正则匹配到的默认空间下的 service 名字是 kubernetes,协议是 https 的 endpoint 类型保留下来
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape] 
        action: keep
        regex: true  # prometheus.io/scrape 注解为 true 的 pod 或者 service 才会作为监控目标。
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace        # prometheus.io/scheme 默认 http,如果为了安全设置了 https,此处需要改为 https
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace     # prometheus.io/path,默认为/metrics
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace        # prometheus.io/port , 端口
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
kubectl apply -f prometheus-cfg.yaml
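上面的 kubernetes-service-endpoints 这个 job 依赖 service 上的 prometheus.io/* 注解做服务发现。下面给出一个最小示例(其中的 my-app、端口等都是假设,按自己的应用调整),说明业务 Service 要加哪些注解才会被自动抓取:

```shell
# 给业务 Service 增加 prometheus.io/* 注解的示例(名称、端口均为假设)
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: default
  annotations:
    prometheus.io/scrape: "true"     # 对应 relabel 里 keep 的条件
    prometheus.io/port: "8080"       # 指标端口
    prometheus.io/path: "/metrics"   # 指标路径,默认 /metrics
spec:
  selector:
    app: my-app
  ports:
  - port: 8080
    targetPort: 8080
EOF
```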
  2. 部署 prometheus-server

prometheus_2.33.5.tar.zip

vim prometheus-deploy.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitor-sa
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: server
    #matchExpressions:
    #- {key: app, operator: In, values: [prometheus]}
    #- {key: component, operator: In, values: [server]}
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'
    spec:
      nodeName: k8s-slave1
      serviceAccountName: monitor
      containers:
      - name: prometheus
        image: prom/prometheus:v2.33.5
        imagePullPolicy: IfNotPresent
        command:
          - prometheus
          - --config.file=/etc/prometheus/prometheus.yml
          - --storage.tsdb.path=/prometheus      #旧数据存储目录
          - --storage.tsdb.retention=720h      #何时删除旧数据,默认为 15 天。
          - --web.enable-lifecycle       #开启热加载
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/prometheus
          name: prometheus-config
        - mountPath: /prometheus/
          name: prometheus-storage-volume
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config
        - name: prometheus-storage-volume
          hostPath:
           path: /data
           type: Directory

注意:在上面的 prometheus-deploy.yaml 文件里有个 nodeName 字段,它用来指定这个 prometheus 的 pod 调度到哪个节点上。我们这里写的是 nodeName: k8s-slave1,也就是让 pod 调度到 k8s-slave1 节点上,因为我们在 k8s-slave1 节点创建了数据目录 /data。所以记住:你在 k8s 集群的哪个节点创建了 /data,就让 pod 调度到哪个节点(nodeName 写你自己环境中的节点名)。

kubectl apply -f prometheus-deploy.yaml

kubectl get pods -n monitor-sa
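--web.enable-lifecycle 开启后,以后修改 ConfigMap 里的 prometheus.yml 不必删除重建 Pod,可以通过 HTTP 接口热加载配置。下面是一个示例(Pod IP 用 kubectl get pods -o wide 查到的为准,这里的 IP 只是示意):

```shell
# 查出 prometheus pod 的 IP(示例)
kubectl get pods -n monitor-sa -o wide | grep prometheus-server

# 向 /-/reload 发送 POST 请求触发热加载(IP 换成上一步查到的 Pod IP)
curl -X POST http://10.244.1.10:9090/-/reload
```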
  3. 创建 prometheus-service

vim prometheus-svc.yaml

apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitor-sa
  labels:
    app: prometheus
spec:
  type: NodePort
  ports:
  - port: 9090
    targetPort: 9090
    protocol: TCP
  selector:
    app: prometheus
    component: server
kubectl apply -f prometheus-svc.yaml

kubectl get svc -n monitor-sa

通过上面可以看到 service 在宿主机上映射的 NodePort 端口是 30009(以自己环境实际生成的端口为准),这样我们访问 k8s 集群任意节点的 ip:30009,就可以访问到 prometheus 的 web ui 界面了

4. 访问测试

(浏览器访问 ip:30009,出现 prometheus 的查询页面即说明部署成功,截图略)
18.3.4 安装和配置可视化 UI 界面 Grafana

1. 安装 Grafana 需要的镜像。镜像包 grafana_8.4.5.tar.gz 在课件里,把它上传到 k8s 的各个控制节点和各个工作节点,然后在各个节点手动解压:

grafana_8.4.5.tar.zip
```shell
docker load -i grafana_8.4.5.tar.gz

mkdir /var/lib/grafana/ -p

chmod 777 /var/lib/grafana/

```

vim grafana.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-grafana
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: grafana
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      nodeName: k8s-slave1
      containers:
      - name: grafana
        image: grafana/grafana:8.4.5
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: ca-certificates
          readOnly: true
        - mountPath: /var
          name: grafana-storage
        - mountPath: /var/lib/grafana/
          name: lib
        env:
        - name: INFLUXDB_HOST
          value: monitoring-influxdb
        - name: GF_SERVER_HTTP_PORT
          value: "3000"
          # The following env variables are required to make Grafana accessible via
          # the kubernetes api-server proxy. On production clusters, we recommend
          # removing these env variables, setup auth for grafana, and expose the grafana
          # service using a LoadBalancer or a public IP.
        - name: GF_AUTH_BASIC_ENABLED
          value: "false"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          value: Admin
        - name: GF_SERVER_ROOT_URL
          # If you're only using the API Server proxy, set this value instead:
          # value: /api/v1/namespaces/kube-system/services/monitoring-grafana/proxy
          value: /
      volumes:
      - name: ca-certificates
        hostPath:
          path: /etc/ssl/certs
      - name: grafana-storage
        emptyDir: {}
      - name: lib
        hostPath:
         path: /var/lib/grafana/
         type: DirectoryOrCreate
---
apiVersion: v1
kind: Service
metadata:
  labels:
    # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
    # If you are NOT using this as an addon, you should comment out this line.
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-grafana
  name: monitoring-grafana
  namespace: kube-system
spec:
  # In a production setup, we recommend accessing Grafana through an external Loadbalancer
  # or through a public IP.
  # type: LoadBalancer
  # You could also use NodePort to expose the service at a randomly-generated port
  # type: NodePort
  ports:
  - port: 80
    targetPort: 3000
  selector:
    k8s-app: grafana
  type: NodePort
kubectl apply -f grafana.yaml

kubectl get pods -n kube-system| grep monitor

kubectl get svc -n kube-system | grep grafana
  2. 登录访问

(浏览器访问 grafana service 映射出来的 NodePort 端口,出现 Grafana 登录/主页界面,截图略)

  3. 配置 grafana

选择 Add your first data source
出现数据源配置页面(截图略),Name 填:Prometheus
HTTP 处的 URL 填写如下:
http://prometheus.monitor-sa.svc:9090
配置好的整体页面(截图略)

点击右下角 Save & Test,出现 “Data source is working” 的提示,说明 prometheus 数据源已经成功地被 grafana 接入了(截图略)
导入监控模板,可在如下链接搜索
https://grafana.com/dashboards?dataSource=prometheus&search=kubernetes
上面 Save & Test 测试没问题之后,就可以返回 Grafana 主页面
(在 Grafana 主页面进入 Dashboard 的导入页面,上传本地 json 文件,截图略)
node_exporter.json
选择一个本地的 json 文件,我们选择的是上面让大家下载的 node_exporter.json 这个文件,选择之后会出现导入预览页面(截图略)。
注:Name 后面的名字是 node_exporter.json 里定义的;数据源那一栏需要选择前面创建的 Prometheus 数据源,然后再点击 Import,就可以看到 node-exporter 的监控大盘了(截图略)。
导入 docker_rev1.json 监控模板,步骤和上面导入 node_exporter.json 的步骤一样,导入之后即可看到容器相关的监控大盘(截图略)。
docker_rev1.json


18.4 kube-state-metrics 组件解读

18.4.1 什么是 kube-state-metrics?

kube-state-metrics 通过监听 API Server,生成有关资源对象的状态指标,比如 Deployment、Node、Pod。需要注意的是 kube-state-metrics 只是简单地提供一个 metrics 数据,并不会存储这些指标数据,所以我们可以用 Prometheus 来抓取这些数据然后存储。它主要关注的是业务相关的一些元数据,比如 Deployment、Pod、副本状态等:调度了多少个 replicas?现在可用的有几个?多少个 Pod 是 running/stopped/terminated 状态?Pod 重启了多少次?有多少 job 在运行中?


18.4.2 安装和配置 kube-state-metrics

  1. 创建 sa,并对 sa 授权。在 k8s 的控制节点生成一个 kube-state-metrics-rbac.yaml 文件:
    vim kube-state-metrics-rbac.yaml
    
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources: ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources: ["daemonsets", "deployments", "replicasets"]
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources: ["statefulsets"]
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources: ["cronjobs", "jobs"]
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
```

kubectl apply -f kube-state-metrics-rbac.yaml
    
  2. 安装 kube-state-metrics 组件

docker load -i kube-state-metrics_1_9_0.tar.gz

vim kube-state-metrics-deploy.yaml

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: quay.io/coreos/kube-state-metrics:v1.9.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
```

kubectl apply -f kube-state-metrics-deploy.yaml

kubectl get pods -n kube-system -l app=kube-state-metrics
  3. 创建 service

vim kube-state-metrics-svc.yaml

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app: kube-state-metrics
spec:
  ports:
  - name: kube-state-metrics
    port: 8080
    protocol: TCP
  selector:
    app: kube-state-metrics

kubectl apply -f kube-state-metrics-svc.yaml

kubectl get svc -n kube-system | grep kube-state-metrics
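service 创建好之后,可以先手动拉一次指标,确认 kube-state-metrics 正常工作(下面的 ClusterIP 以上一步 kubectl get svc 查到的为准,这里只是示意):

```shell
# 用上一步查到的 kube-state-metrics 的 ClusterIP 替换 10.98.xxx.xxx
curl -s http://10.98.xxx.xxx:8080/metrics | grep -E '^kube_(pod_status_phase|deployment_status_replicas)' | head
```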
    
  4. 在 grafana 中导入 json

Kubernetes Cluster (Prometheus)-1577674936972.json
导入 Kubernetes Cluster (Prometheus)-1577674936972.json 之后,出现 Kubernetes 集群监控大盘(截图略)。
Kubernetes cluster monitoring (via Prometheus) (k8s 1.16)-1577691996738.json
在 grafana web 界面导入 Kubernetes cluster monitoring (via Prometheus) (k8s 1.16)-1577691996738.json,出现对应的集群监控大盘(截图略)。


18.5 安装和配置 Alertmanager-发送报警到 qq 邮箱

  1. 创建configmap

vim alertmanager-cm.yaml

kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: monitor-sa
data:
  alertmanager.yml: |-
    global:
      resolve_timeout: 1m
      smtp_smarthost: 'smtp.163.com:25'
      smtp_from: '1814553****@163.com'
      smtp_auth_username: '1814553****'
      smtp_auth_password: 'LAYXLXRZGFUBWOMZ'
      smtp_require_tls: false
    route:
      group_by: [alertname]
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 10m
      receiver: default-receiver
    receivers:
    - name: 'default-receiver'
      email_configs:
      - to: '1639373787@qq.com'
        send_resolved: true

注意:下面是同一份 alertmanager.yml 配置的带注释版本,只用于解释各字段含义,不需要重复 apply:

    global:
      resolve_timeout: 1m
      smtp_smarthost: 'smtp.163.com:25'         #用于发送邮件的邮箱的 SMTP 服务器地址+端口
      smtp_from: '1814553****@163.com'          #指定从哪个邮箱发送报警
      smtp_auth_username: '1814553****'         #发送邮箱的认证用户,不是邮箱名
      smtp_auth_password: 'LAYXLXRZGFUBWOMZ'    #发送邮箱的授权码,而不是登录密码
      smtp_require_tls: false
    route:                          #用于配置告警分发策略
      group_by: [alertname]         #采用哪个标签来作为分组依据
      group_wait: 10s               #组告警等待时间,告警产生后等待 10s,如果有同组告警则一起发出
      group_interval: 10s           #两组告警的间隔时间
      repeat_interval: 10m          #重复告警的间隔时间,可以减少相同邮件的发送频率
      receiver: default-receiver    #设置默认接收人
    receivers:
    - name: 'default-receiver'
      email_configs:
      - to: '1639373787@qq.com'     #to 后面指定发送到哪个邮箱,不要跟 smtp_from 的邮箱重复
        send_resolved: true

kubectl apply -f alertmanager-cm.yaml
    
  2. 创建第二个配置文件(带告警规则的 prometheus 配置)

    vim prometheus-alertmanager-cfg.yaml
    
    kind: ConfigMap
    apiVersion: v1
    metadata:
    labels:
     app: prometheus
    name: prometheus-config
    namespace: monitor-sa
    data:
    prometheus.yml: |
     rule_files:
     - /etc/prometheus/rules.yml
     alerting:
       alertmanagers:
       - static_configs:
         - targets: ["localhost:9093"]
     global:
       scrape_interval: 15s
       scrape_timeout: 10s
       evaluation_interval: 1m
     scrape_configs:
     - job_name: 'kubernetes-node'
       kubernetes_sd_configs:
       - role: node
       relabel_configs:
       - source_labels: [__address__]
         regex: '(.*):10250'
         replacement: '${1}:9100'
         target_label: __address__
         action: replace
       - action: labelmap
         regex: __meta_kubernetes_node_label_(.+)
     - job_name: 'kubernetes-node-cadvisor'
       kubernetes_sd_configs:
       - role:  node
       scheme: https
       tls_config:
         ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
       bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
       relabel_configs:
       - action: labelmap
         regex: __meta_kubernetes_node_label_(.+)
       - target_label: __address__
         replacement: kubernetes.default.svc:443
       - source_labels: [__meta_kubernetes_node_name]
         regex: (.+)
         target_label: __metrics_path__
         replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
     - job_name: 'kubernetes-apiserver'
       kubernetes_sd_configs:
       - role: endpoints
       scheme: https
       tls_config:
         ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
       bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
       relabel_configs:
       - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
         action: keep
         regex: default;kubernetes;https
     - job_name: 'kubernetes-service-endpoints'
       kubernetes_sd_configs:
       - role: endpoints
       relabel_configs:
       - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
         action: keep
         regex: true
       - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
         action: replace
         target_label: __scheme__
         regex: (https?)
       - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
         action: replace
         target_label: __metrics_path__
         regex: (.+)
       - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
         action: replace
         target_label: __address__
         regex: ([^:]+)(?::\d+)?;(\d+)
         replacement: $1:$2
       - action: labelmap
         regex: __meta_kubernetes_service_label_(.+)
       - source_labels: [__meta_kubernetes_namespace]
         action: replace
         target_label: kubernetes_namespace
       - source_labels: [__meta_kubernetes_service_name]
         action: replace
         target_label: kubernetes_name 
     - job_name: kubernetes-pods
       kubernetes_sd_configs:
       - role: pod
       relabel_configs:
       - action: keep
         regex: true
         source_labels:
         - __meta_kubernetes_pod_annotation_prometheus_io_scrape
       - action: replace
         regex: (.+)
         source_labels:
         - __meta_kubernetes_pod_annotation_prometheus_io_path
         target_label: __metrics_path__
       - action: replace
         regex: ([^:]+)(?::\d+)?;(\d+)
         replacement: $1:$2
         source_labels:
         - __address__
         - __meta_kubernetes_pod_annotation_prometheus_io_port
         target_label: __address__
       - action: labelmap
         regex: __meta_kubernetes_pod_label_(.+)
       - action: replace
         source_labels:
         - __meta_kubernetes_namespace
         target_label: kubernetes_namespace
       - action: replace
         source_labels:
         - __meta_kubernetes_pod_name
         target_label: kubernetes_pod_name
     - job_name: 'kubernetes-schedule'
       scrape_interval: 5s
       static_configs:
       - targets: ['192.168.1.63:10251','192.168.1.64:10251']       # 如果是多主就写2个ip和端口
     - job_name: 'kubernetes-controller-manager'
       scrape_interval: 5s
       static_configs:
       - targets: ['192.168.1.63:10252','192.168.1.64:10252']        # controller-manager 的指标端口是 10252,多主就把每个控制节点都写上
     - job_name: 'kubernetes-kube-proxy'
       scrape_interval: 5s
       static_configs: 
       - targets: ['192.168.1.65:10249','192.168.1.66:10249']        # 有哪些节点就写哪些节点,可以用kubectl查看
     - job_name: 'kubernetes-etcd'
       scheme: https
       tls_config:
         ca_file: /var/run/secrets/kubernetes.io/k8s-certs/etcd/ca.crt
         cert_file: /var/run/secrets/kubernetes.io/k8s-certs/etcd/server.crt
         key_file: /var/run/secrets/kubernetes.io/k8s-certs/etcd/server.key
       scrape_interval: 5s
       static_configs:
       - targets: ['192.168.1.63:2379','192.168.1.64:2379']
    rules.yml: |
     groups:
     - name: example
       rules:
       - alert: kube-proxy的cpu使用率大于80%
         expr: rate(process_cpu_seconds_total{job=~"kubernetes-kube-proxy"}[1m]) * 100 > 80
         for: 2s
         labels:
           severity: warnning
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}组件的cpu使用率超过80%"
       - alert:  kube-proxy的cpu使用率大于90%
         expr: rate(process_cpu_seconds_total{job=~"kubernetes-kube-proxy"}[1m]) * 100 > 90
         for: 2s
         labels:
           severity: critical
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}组件的cpu使用率超过90%"
       - alert: scheduler的cpu使用率大于80%
         expr: rate(process_cpu_seconds_total{job=~"kubernetes-schedule"}[1m]) * 100 > 80
         for: 2s
         labels:
           severity: warnning
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}组件的cpu使用率超过80%"
       - alert:  scheduler的cpu使用率大于90%
         expr: rate(process_cpu_seconds_total{job=~"kubernetes-schedule"}[1m]) * 100 > 90
         for: 2s
         labels:
           severity: critical
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}组件的cpu使用率超过90%"
       - alert: controller-manager的cpu使用率大于80%
         expr: rate(process_cpu_seconds_total{job=~"kubernetes-controller-manager"}[1m]) * 100 > 80
         for: 2s
         labels:
           severity: warnning
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}组件的cpu使用率超过80%"
       - alert:  controller-manager的cpu使用率大于90%
         expr: rate(process_cpu_seconds_total{job=~"kubernetes-controller-manager"}[1m]) * 100 > 0
         for: 2s
         labels:
           severity: critical
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}组件的cpu使用率超过90%"
       - alert: apiserver的cpu使用率大于80%
         expr: rate(process_cpu_seconds_total{job=~"kubernetes-apiserver"}[1m]) * 100 > 80
         for: 2s
         labels:
           severity: warnning
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}组件的cpu使用率超过80%"
       - alert:  apiserver的cpu使用率大于90%
         expr: rate(process_cpu_seconds_total{job=~"kubernetes-apiserver"}[1m]) * 100 > 90
         for: 2s
         labels:
           severity: critical
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}组件的cpu使用率超过90%"
       - alert: etcd的cpu使用率大于80%
         expr: rate(process_cpu_seconds_total{job=~"kubernetes-etcd"}[1m]) * 100 > 80
         for: 2s
         labels:
           severity: warnning
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}组件的cpu使用率超过80%"
       - alert:  etcd的cpu使用率大于90%
         expr: rate(process_cpu_seconds_total{job=~"kubernetes-etcd"}[1m]) * 100 > 90
         for: 2s
         labels:
           severity: critical
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}组件的cpu使用率超过90%"
       - alert: kube-state-metrics的cpu使用率大于80%
         expr: rate(process_cpu_seconds_total{k8s_app=~"kube-state-metrics"}[1m]) * 100 > 80
         for: 2s
         labels:
           severity: warnning
         annotations:
           description: "{{$labels.instance}}的{{$labels.k8s_app}}组件的cpu使用率超过80%"
           value: "{{ $value }}%"
           threshold: "80%"      
       - alert: kube-state-metrics的cpu使用率大于90%
         expr: rate(process_cpu_seconds_total{k8s_app=~"kube-state-metrics"}[1m]) * 100 > 0
         for: 2s
         labels:
           severity: critical
         annotations:
           description: "{{$labels.instance}}的{{$labels.k8s_app}}组件的cpu使用率超过90%"
           value: "{{ $value }}%"
           threshold: "90%"      
       - alert: coredns的cpu使用率大于80%
         expr: rate(process_cpu_seconds_total{k8s_app=~"kube-dns"}[1m]) * 100 > 80
         for: 2s
         labels:
           severity: warnning
         annotations:
           description: "{{$labels.instance}}的{{$labels.k8s_app}}组件的cpu使用率超过80%"
           value: "{{ $value }}%"
           threshold: "80%"      
       - alert: coredns的cpu使用率大于90%
         expr: rate(process_cpu_seconds_total{k8s_app=~"kube-dns"}[1m]) * 100 > 90
         for: 2s
         labels:
           severity: critical
         annotations:
           description: "{{$labels.instance}}的{{$labels.k8s_app}}组件的cpu使用率超过90%"
           value: "{{ $value }}%"
           threshold: "90%"      
       - alert: kube-proxy打开句柄数>600
         expr: process_open_fds{job=~"kubernetes-kube-proxy"}  > 600
         for: 2s
         labels:
           severity: warnning
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}打开句柄数>600"
           value: "{{ $value }}"
       - alert: kube-proxy打开句柄数>1000
         expr: process_open_fds{job=~"kubernetes-kube-proxy"}  > 1000
         for: 2s
         labels:
           severity: critical
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}打开句柄数>1000"
           value: "{{ $value }}"
       - alert: kubernetes-schedule打开句柄数>600
         expr: process_open_fds{job=~"kubernetes-schedule"}  > 600
         for: 2s
         labels:
           severity: warnning
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}打开句柄数>600"
           value: "{{ $value }}"
       - alert: kubernetes-schedule打开句柄数>1000
         expr: process_open_fds{job=~"kubernetes-schedule"}  > 1000
         for: 2s
         labels:
           severity: critical
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}打开句柄数>1000"
           value: "{{ $value }}"
       - alert: kubernetes-controller-manager打开句柄数>600
         expr: process_open_fds{job=~"kubernetes-controller-manager"}  > 600
         for: 2s
         labels:
           severity: warnning
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}打开句柄数>600"
           value: "{{ $value }}"
       - alert: kubernetes-controller-manager打开句柄数>1000
         expr: process_open_fds{job=~"kubernetes-controller-manager"}  > 1000
         for: 2s
         labels:
           severity: critical
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}打开句柄数>1000"
           value: "{{ $value }}"
       - alert: kubernetes-apiserver打开句柄数>600
         expr: process_open_fds{job=~"kubernetes-apiserver"}  > 600
         for: 2s
         labels:
           severity: warnning
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}打开句柄数>600"
           value: "{{ $value }}"
       - alert: kubernetes-apiserver打开句柄数>1000
         expr: process_open_fds{job=~"kubernetes-apiserver"}  > 1000
         for: 2s
         labels:
           severity: critical
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}打开句柄数>1000"
           value: "{{ $value }}"
       - alert: kubernetes-etcd打开句柄数>600
         expr: process_open_fds{job=~"kubernetes-etcd"}  > 600
         for: 2s
         labels:
           severity: warnning
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}打开句柄数>600"
           value: "{{ $value }}"
       - alert: kubernetes-etcd打开句柄数>1000
         expr: process_open_fds{job=~"kubernetes-etcd"}  > 1000
         for: 2s
         labels:
           severity: critical
         annotations:
           description: "{{$labels.instance}}的{{$labels.job}}打开句柄数>1000"
           value: "{{ $value }}"
       - alert: coredns
         expr: process_open_fds{k8s_app=~"kube-dns"}  > 600
         for: 2s
         labels:
           severity: warnning 
         annotations:
           description: "插件{{$labels.k8s_app}}({{$labels.instance}}): 打开句柄数超过600"
           value: "{{ $value }}"
       - alert: coredns
         expr: process_open_fds{k8s_app=~"kube-dns"}  > 1000
         for: 2s
         labels:
           severity: critical
         annotations:
           description: "插件{{$labels.k8s_app}}({{$labels.instance}}): 打开句柄数超过1000"
           value: "{{ $value }}"
       - alert: kube-proxy
         expr: process_virtual_memory_bytes{job=~"kubernetes-kube-proxy"}  > 2000000000
         for: 2s
         labels:
           severity: warnning
         annotations:
           description: "组件{{$labels.job}}({{$labels.instance}}): 使用虚拟内存超过2G"
           value: "{{ $value }}"
       - alert: scheduler
         expr: process_virtual_memory_bytes{job=~"kubernetes-schedule"}  > 2000000000
         for: 2s
         labels:
           severity: warnning
         annotations:
           description: "组件{{$labels.job}}({{$labels.instance}}): 使用虚拟内存超过2G"
           value: "{{ $value }}"
       - alert: kubernetes-controller-manager
         expr: process_virtual_memory_bytes{job=~"kubernetes-controller-manager"}  > 2000000000
         for: 2s
         labels:
           severity: warnning
         annotations:
           description: "组件{{$labels.job}}({{$labels.instance}}): 使用虚拟内存超过2G"
           value: "{{ $value }}"
       - alert: kubernetes-apiserver
         expr: process_virtual_memory_bytes{job=~"kubernetes-apiserver"}  > 2000000000
         for: 2s
         labels:
           severity: warnning
         annotations:
           description: "组件{{$labels.job}}({{$labels.instance}}): 使用虚拟内存超过2G"
           value: "{{ $value }}"
       - alert: kubernetes-etcd
         expr: process_virtual_memory_bytes{job=~"kubernetes-etcd"}  > 2000000000
         for: 2s
         labels:
           severity: warnning
         annotations:
           description: "组件{{$labels.job}}({{$labels.instance}}): 使用虚拟内存超过2G"
           value: "{{ $value }}"
       - alert: kube-dns
         expr: process_virtual_memory_bytes{k8s_app=~"kube-dns"}  > 2000000000
         for: 2s
         labels:
           severity: warnning
         annotations:
           description: "插件{{$labels.k8s_app}}({{$labels.instance}}): 使用虚拟内存超过2G"
           value: "{{ $value }}"
       - alert: HttpRequestsAvg
          expr: sum(rate(rest_client_requests_total{job=~"kubernetes-kube-proxy|kubernetes-kubelet|kubernetes-schedule|kubernetes-controller-manager|kubernetes-apiserver"}[1m]))  > 1000
         for: 2s
         labels:
           team: admin
         annotations:
           description: "组件{{$labels.job}}({{$labels.instance}}): TPS超过1000"
           value: "{{ $value }}"
           threshold: "1000"   
       - alert: Pod_restarts
         expr: kube_pod_container_status_restarts_total{namespace=~"kube-system|default|monitor-sa"} > 0
         for: 2s
         labels:
           severity: warnning
         annotations:
           description: "在{{$labels.namespace}}名称空间下发现{{$labels.pod}}这个pod下的容器{{$labels.container}}被重启,这个监控指标是由{{$labels.instance}}采集的"
           value: "{{ $value }}"
           threshold: "0"
       - alert: Pod_waiting
         expr: kube_pod_container_status_waiting_reason{namespace=~"kube-system|default"} == 1
         for: 2s
         labels:
           team: admin
         annotations:
           description: "空间{{$labels.namespace}}({{$labels.instance}}): 发现{{$labels.pod}}下的{{$labels.container}}启动异常等待中"
           value: "{{ $value }}"
           threshold: "1"   
       - alert: Pod_terminated
         expr: kube_pod_container_status_terminated_reason{namespace=~"kube-system|default|monitor-sa"} == 1
         for: 2s
         labels:
           team: admin
         annotations:
           description: "空间{{$labels.namespace}}({{$labels.instance}}): 发现{{$labels.pod}}下的{{$labels.container}}被删除"
           value: "{{ $value }}"
           threshold: "1"
       - alert: Etcd_leader
         expr: etcd_server_has_leader{job="kubernetes-etcd"} == 0
         for: 2s
         labels:
           team: admin
         annotations:
           description: "组件{{$labels.job}}({{$labels.instance}}): 当前没有leader"
           value: "{{ $value }}"
           threshold: "0"
       - alert: Etcd_leader_changes
         expr: rate(etcd_server_leader_changes_seen_total{job="kubernetes-etcd"}[1m]) > 0
         for: 2s
         labels:
           team: admin
         annotations:
           description: "组件{{$labels.job}}({{$labels.instance}}): 当前leader已发生改变"
           value: "{{ $value }}"
           threshold: "0"
       - alert: Etcd_failed
         expr: rate(etcd_server_proposals_failed_total{job="kubernetes-etcd"}[1m]) > 0
         for: 2s
         labels:
           team: admin
         annotations:
           description: "组件{{$labels.job}}({{$labels.instance}}): 服务失败"
           value: "{{ $value }}"
           threshold: "0"
       - alert: Etcd_db_total_size
         expr: etcd_debugging_mvcc_db_total_size_in_bytes{job="kubernetes-etcd"} > 10000000000
         for: 2s
         labels:
           team: admin
         annotations:
           description: "组件{{$labels.job}}({{$labels.instance}}):db空间超过10G"
           value: "{{ $value }}"
           threshold: "10G"
       - alert: Endpoint_ready
         expr: kube_endpoint_address_not_ready{namespace=~"kube-system|default"} == 1
         for: 2s
         labels:
           team: admin
         annotations:
           description: "空间{{$labels.namespace}}({{$labels.instance}}): 发现{{$labels.endpoint}}不可用"
           value: "{{ $value }}"
           threshold: "1"
     - name: 物理节点状态-监控告警
       rules:
       - alert: 物理节点cpu使用率
         expr: 100-avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by(instance)*100 > 90
         for: 2s
         labels:
            severity: critical
         annotations:
           summary: "{{ $labels.instance }}cpu使用率过高"
           description: "{{ $labels.instance }}的cpu使用率超过90%,当前使用率[{{ $value }}],需要排查处理" 
       - alert: 物理节点内存使用率
         expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 90
         for: 2s
         labels:
           severity: critical
         annotations:
           summary: "{{ $labels.instance }}内存使用率过高"
           description: "{{ $labels.instance }}的内存使用率超过90%,当前使用率[{{ $value }}],需要排查处理"
       - alert: InstanceDown
         expr: up == 0
         for: 2s
         labels:
           severity: critical
         annotations:   
           summary: "{{ $labels.instance }}: 服务器宕机"
           description: "{{ $labels.instance }}: 服务器延时超过2分钟"
       - alert: 物理节点磁盘的IO性能
         expr: 100-(avg(irate(node_disk_io_time_seconds_total[1m])) by(instance)* 100) < 60
         for: 2s
         labels:
           severity: critical
         annotations:
           summary: "{{$labels.mountpoint}} 流入磁盘IO使用率过高!"
           description: "{{$labels.mountpoint }} 流入磁盘IO大于60%(目前使用:{{$value}})"
       - alert: 入网流量带宽
         expr: ((sum(rate (node_network_receive_bytes_total{device!~'tap.*|veth.*|br.*|docker.*|virbr*|lo*'}[5m])) by (instance)) / 100) > 102400
         for: 2s
         labels:
           severity: critical
         annotations:
           summary: "{{$labels.mountpoint}} 流入网络带宽过高!"
           description: "{{$labels.mountpoint }}流入网络带宽持续5分钟高于100M. RX带宽使用率{{$value}}"
       - alert: 出网流量带宽
         expr: ((sum(rate (node_network_transmit_bytes_total{device!~'tap.*|veth.*|br.*|docker.*|virbr*|lo*'}[5m])) by (instance)) / 100) > 102400
         for: 2s
         labels:
           severity: critical
         annotations:
           summary: "{{$labels.mountpoint}} 流出网络带宽过高!"
           description: "{{$labels.mountpoint }}流出网络带宽持续5分钟高于100M. RX带宽使用率{{$value}}"
       - alert: TCP会话
         expr: node_netstat_Tcp_CurrEstab > 1000
         for: 2s
         labels:
           severity: critical
         annotations:
           summary: "{{$labels.mountpoint}} TCP_ESTABLISHED过高!"
           description: "{{$labels.mountpoint }} TCP_ESTABLISHED大于1000%(目前使用:{{$value}}%)"
       - alert: 磁盘容量
         expr: 100-(node_filesystem_free_bytes{fstype=~"ext4|xfs"}/node_filesystem_size_bytes {fstype=~"ext4|xfs"}*100) > 80
         for: 2s
         labels:
           severity: critical
         annotations:
           summary: "{{$labels.mountpoint}} 磁盘分区使用率过高!"
           description: "{{$labels.mountpoint }} 磁盘分区使用大于80%(目前使用:{{$value}}%)"
    
    kubectl delete -f prometheus-cfg.yaml  # 删除之前的配置文件,如果没有可以忽略
    
    kubectl apply -f prometheus-alertmanager-cfg.yaml
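apply 前后,如果本机装有 promtool(Prometheus 二进制发行包里自带的工具),可以先把 rules.yml 提取出来做一次语法检查,避免规则写错导致加载失败(仅为示例,promtool 需自行准备):

```shell
# 从 ConfigMap 中取出 rules.yml 并用 promtool 校验(promtool 随 prometheus 二进制包发布)
kubectl get cm prometheus-config -n monitor-sa -o jsonpath='{.data.rules\.yml}' > /tmp/rules.yml
promtool check rules /tmp/rules.yml
```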
    
  3. 安装 prometheus 和 alertmanager

alertmanager.tar.zip

docker load -i alertmanager.tar.gz                       # 上传到各个节点执行

vim prometheus-alertmanager-deploy.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitor-sa
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: server
    #matchExpressions:
    #- {key: app, operator: In, values: [prometheus]}
    #- {key: component, operator: In, values: [server]}
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'
    spec:
      nodeName: k8s-slave1
      serviceAccountName: monitor
      containers:
      - name: prometheus
        image: prom/prometheus:v2.2.1
        imagePullPolicy: IfNotPresent
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=24h"
        - "--web.enable-lifecycle"
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/prometheus
          name: prometheus-config
        - mountPath: /prometheus/
          name: prometheus-storage-volume
        - name: k8s-certs
          mountPath: /var/run/secrets/kubernetes.io/k8s-certs/etcd/
      - name: alertmanager
        image: prom/alertmanager:v0.14.0
        imagePullPolicy: IfNotPresent
        args:
        - "--config.file=/etc/alertmanager/alertmanager.yml"
        - "--log.level=debug"
        ports:
        - containerPort: 9093
          protocol: TCP
          name: alertmanager
        volumeMounts:
        - name: alertmanager-config
          mountPath: /etc/alertmanager
        - name: alertmanager-storage
          mountPath: /alertmanager
        - name: localtime
          mountPath: /etc/localtime
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config
        - name: prometheus-storage-volume
          hostPath:
           path: /data
           type: Directory
        - name: k8s-certs
          secret:
           secretName: etcd-certs
        - name: alertmanager-config
          configMap:
            name: alertmanager
        - name: alertmanager-storage
          hostPath:
           path: /data/alertmanager
           type: DirectoryOrCreate
        - name: localtime
          hostPath:
           path: /usr/share/zoneinfo/Asia/Shanghai

注意:配置文件里指定了 nodeName: k8s-slave1,这个位置要写你自己环境中创建了 /data 目录的那个 node 节点的名字

 kubectl -n monitor-sa create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/server.key --from-file=/etc/kubernetes/pki/etcd/server.crt --from-file=/etc/kubernetes/pki/etcd/ca.crt
kubectl delete -f prometheus-deploy.yaml                 # 删除之前创建的pod,没有可以忽略

kubectl apply -f prometheus-alertmanager-deploy.yaml

kubectl get pods -n monitor-sa | grep prometheus
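Pod 正常 Running 之后,可以顺手在 alertmanager 容器里用 amtool 校验一下挂载进去的 alertmanager.yml(amtool 一般随官方 alertmanager 镜像打包,如镜像里没有可跳过这一步;Pod 名以实际为准):

```shell
# Pod 名换成上一步 kubectl get pods 查到的名字
kubectl exec -n monitor-sa prometheus-server-xxxxxxxxxx-xxxxx -c alertmanager -- \
  amtool check-config /etc/alertmanager/alertmanager.yml
```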
  4. 创建 svc

vim alertmanager-svc.yaml

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    name: prometheus
    kubernetes.io/cluster-service: 'true'
  name: alertmanager
  namespace: monitor-sa
spec:
  ports:
  - name: alertmanager
    nodePort: 30066
    port: 9093
    protocol: TCP
    targetPort: 9093
  selector:
    app: prometheus
  sessionAffinity: None
  type: NodePort
```

kubectl apply -f alertmanager-svc.yaml

kubectl get svc -n monitor-sa


5. 访问svc暴露的端口

访问 prometheus 的 web 界面,点击 Status -> Targets,可以看到各个采集目标(截图略)。从中可以发现 kubernetes-controller-manager 和 kubernetes-schedule 这两个 job 都显示连接不上对应的端口(截图略),可按如下方法处理:
vim /etc/kubernetes/manifests/kube-scheduler.yaml

修改如下内容:
把 --bind-address=127.0.0.1 变成 --bind-address=192.168.1.63
把 httpGet: 字段下的 host 由 127.0.0.1 变成 192.168.1.63
把 --port=0 这一行删除
#注意:
192.168.1.63 是 k8s 的控制节点 xuegod63 节点 ip

vim /etc/kubernetes/manifests/kube-controller-manager.yaml

把 --bind-address=127.0.0.1 变成 --bind-address=192.168.1.63
把 httpGet: 字段下的 host 由 127.0.0.1 变成 192.168.1.63
把 --port=0 这一行删除
修改之后在 k8s 各个节点执行

systemctl restart kubelet

kubectl get cs

此时 scheduler 和 controller-manager 的状态应恢复为 Healthy(输出略)

ss -antulp | grep :10257


ss -antulp | grep :10259

可以看到相应的端口已经被物理机监听了
点击 Status -> Targets,可以看到 kubernetes-controller-manager 和 kubernetes-schedule 这两个 job 已经能正常抓取了(截图略)
kubernetes-kube-proxy 这个 job 仍显示连接不上(截图略),这是因为 kube-proxy 的指标端口 10249 默认只监听在 127.0.0.1 上,需要改成监听在物理节点地址上。按如下方法修改(线上建议在安装 k8s 的时候就改好,这样风险小一些):

kubectl edit configmap kube-proxy -n kube-system

把 metricsBindAddress 这段修改成 metricsBindAddress: 0.0.0.0:10249

 kubectl get pods -n kube-system | grep kube-proxy |awk '{print $1}' | xargs kubectl delete pods -n kube-system

  ss -antulp |grep :10249

点击 Alerts,可以看到已经加载的各组告警规则(截图略)。
把 kubernetes-etcd 这组告警展开,可以看到处于 FIRING 状态的告警(截图略)。
FIRING 表示 prometheus 已经将告警发给 alertmanager,在 Alertmanager 中可以看到有一个 alert。
登录到 alertmanager 的 web 界面(任一节点 ip:30066),可以看到这条 alert(截图略)。
这样在 qq 邮箱里就可以收到报警邮件了(截图略)。
修改 prometheus 任何一个配置文件之后,可通过 kubectl apply 使配置生效,执行顺序如下:

kubectl delete -f alertmanager-cm.yaml

kubectl apply -f alertmanager-cm.yaml

kubectl delete -f prometheus-alertmanager-cfg.yaml

kubectl apply -f prometheus-alertmanager-cfg.yaml 

kubectl delete -f prometheus-alertmanager-deploy.yaml

kubectl apply -f prometheus-alertmanager-deploy.yaml

18.6 发送报警到微信

  1. 注册企业微信

登陆网址:
https://work.weixin.qq.com/
找到应用管理,创建应用
应用名字 wechat
创建成功之后,可以在应用详情页看到 AgentId 和 Secret(截图略):
AgentId:*003
Secret:**Oj6dD4Jg9qaMu1TTaDzVTCrXHcjlFs

  2. AgentId 和 Secret 用自己企业微信的
vim alertmanager-cm.yaml

kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: monitor-sa
data:
  alertmanager.yml: |-
    global:
      resolve_timeout: 1m
      smtp_smarthost: 'smtp.163.com:25'
      smtp_from: '1814553****@163.com'
      smtp_auth_username: '1814553****'
      smtp_auth_password: 'LAYXLXRZGFUBWOMZ'
      smtp_require_tls: false
    route:
      group_by: [alertname]
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 10m
      receiver: prometheus
    receivers:
    - name: 'prometheus'
      wechat_configs:
      - corp_id: "wwa82df90a693*****"
        to_user: '@all'
        agent_id: 1000005
        api_secret: "xPte8Jw6g1PC_Hkla8Nkshr-3m44gBqPvDWZ*******"
    
参数说明:
api_secret:自建应用的 Secret(企业微信后台 “企业应用” -> “自建应用” -> “Secret”)
corp_id:企业信息里的 CorpID(“我的企业” -> 页面底部的 “CorpID”)
agent_id:自建应用的 AgentId(“企业应用” -> “自建应用” -> “AgentId”)
to_user: '@all' 表示发送报警给该应用可见范围内的所有人
注:本人自建的应用名是 wechat;receivers 里的 name(本例为 prometheus)要和 route 里的 receiver 保持一致。
修改 prometheus 任何一个配置文件之后,可通过 kubectl apply 使配置生效,执行顺序如下:

kubectl delete -f alertmanager-cm.yaml

kubectl apply -f alertmanager-cm.yaml

kubectl delete -f prometheus-alertmanager-cfg.yaml

kubectl apply -f prometheus-alertmanager-cfg.yaml

kubectl delete -f prometheus-alertmanager-deploy.yaml

kubectl apply -f prometheus-alertmanager-deploy.yaml


3. 配置自定义告警模板
```shell
vim template_wechat.tmpl
{{ define "wechat.default.message" }}
{{ range .Alerts }}
========start==========
告警程序:node_exporter
告警名称:{{ .Labels.alertname }}
故障主机: {{ .Labels.instance }}
告警主题: {{ .Annotations.summary }}
告警信息: {{ .Annotations.description }}
========end==========
{{ end }}
{{ end }}
```
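上面这个模板默认并不会自动生效。一种让它生效的思路(仅为示例做法,字段写法以自己使用的 alertmanager 版本文档为准)是:把模板文件也放进 alertmanager 的 ConfigMap,在 alertmanager.yml 里用 templates 引用,并在 wechat_configs 里通过 message 调用该模板,然后重建 Pod:

```shell
# 示例:让自定义微信模板生效的大致步骤(均为示意,按自己环境调整)
# 1) 在 alertmanager-cm.yaml 的 data 里增加一项 template_wechat.tmpl,内容就是上面的模板
# 2) 在 alertmanager.yml 中与 route 同级的位置增加:
#      templates:
#      - '/etc/alertmanager/template_wechat.tmpl'
# 3) 在 wechat_configs 的条目里增加:
#      message: '{{ template "wechat.default.message" . }}'
kubectl apply -f alertmanager-cm.yaml
# 删除旧 Pod 让 Deployment 重建,以加载新配置
kubectl delete pod -n monitor-sa -l app=prometheus
```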

18.7 发送报警到钉钉

  1. 创建钉钉机器人

打开电脑版钉钉,创建一个群,创建自定义机器人,按如下步骤创建
https://ding-doc.dingtalk.com/doc#/serverapi2/qf2nxq
https://developers.dingtalk.com/document/app/custom-robot-access
创建的机器人如下:
群设置—>智能群助手—>添加机器人—>自定义—>添加
机器人名称:test
接收群组:钉钉报警测试
安全设置:
自定义关键词:cluster1
上面配置好之后点击完成即可,这样就创建了一个名为 test 的报警机器人。创建机器人成功之后,按如下方法查看 webhook:
点击智能群助手,可以看到刚才创建的 test 这个机器人,点击 test,就会进入 test 机器人的设置界面
出现如下内容:
机器人名称:test
接受群组:钉钉报警测试
消息推送:开启
webhook:
https://oapi.dingtalk.com/robot/send?access_token=******
注意:access_token=用自己的钉钉的 token
安全设置:
自定义关键词:cluster1

  2. 安装钉钉的 webhook 插件,在 k8s 的控制节点操作

prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.zip

tar zxvf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz

cd prometheus-webhook-dingtalk-0.3.0.linux-amd64

对原来的 alertmanager-cm.yaml 文件做备份

cp alertmanager-cm.yaml alertmanager-cm.yaml.bak

重新生成一个新的 alertmanager-cm.yaml 文件

vim alertmanager-cm.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: monitor-sa
data:
  alertmanager.yml: |-
    global:
      resolve_timeout: 1m
      smtp_smarthost: 'smtp.163.com:25'
      smtp_from: '1814553****@163.com'
      smtp_auth_username: '1814553****'
      smtp_auth_password: 'LAYXLXRZGFUBWOMZ'
      smtp_require_tls: false
    route:
      group_by: [alertname]
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 10m
      receiver: cluster1
    receivers:
    - name: 'cluster1'
      webhook_configs:
      - url: 'http://192.168.1.63:8060/dingtalk/cluster1/send'
        send_resolved: true
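alertmanager 里配置的 http://192.168.1.63:8060/dingtalk/cluster1/send 这个地址,要求刚才解压出来的 prometheus-webhook-dingtalk 已经在 192.168.1.63 上运行。下面是一个启动示例(0.3.0 版本通过 --ding.profile 传入机器人 webhook,具体参数以 ./prometheus-webhook-dingtalk --help 为准,token 换成自己机器人的):

```shell
cd prometheus-webhook-dingtalk-0.3.0.linux-amd64

# cluster1 要和 alertmanager 里 url 路径中的名字保持一致,access_token 换成自己的
nohup ./prometheus-webhook-dingtalk --web.listen-address="0.0.0.0:8060" \
  --ding.profile="cluster1=https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxx" \
  > dingtalk.log 2>&1 &
```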

修改 prometheus 任何一个配置文件之后,可通过 kubectl apply 使配置生效,执行顺序如下:

kubectl delete -f alertmanager-cm.yaml

kubectl apply -f alertmanager-cm.yaml

kubectl delete -f prometheus-alertmanager-cfg.yaml

kubectl apply -f prometheus-alertmanager-cfg.yaml 

kubectl delete -f prometheus-alertmanager-deploy.yaml

kubectl apply -f prometheus-alertmanager-deploy.yaml

18.8 Prometheus PromQL 语法

PromQL(Prometheus Query Language)是 Prometheus 自己开发的表达式语言,语言表现力很丰富,内置函数也很多。使用它可以对时序数据进行筛选和聚合。

18.8.1 数据类型

PromQL 表达式计算出来的值有以下几种类型:
瞬时向量 (Instant vector): 一组时序,每个时序只有一个采样值
区间向量 (Range vector): 一组时序,每个时序包含一段时间内的多个采样值
标量数据 (Scalar): 一个浮点数
字符串 (String): 一个字符串,暂时未用
1、瞬时向量选择器
瞬时向量选择器用来选择一组时序在某个采样点的采样值。最简单的情况就是指定一个度量指标,选择出所有属于该度量指标的时序的当前采样值。比如下面的表达式:
apiserver_request_total
(查询结果截图略)
可以通过在后面添加用大括号包围起来的一组标签键值对来对时序进行过滤。比如下面的表达式筛选出了 job 为 kubernetes-apiserver,并且 resource 为 pods 的时序:
apiserver_request_total{job="kubernetes-apiserver",resource="pods"}
匹配标签值时可以是等于,也可以使用正则表达式。总共有下面几种匹配操作符:
=:完全相等
!=: 不相等
=~: 正则表达式匹配
!~: 正则表达式不匹配
下面的表达式筛选出了 container 是 kube-scheduler 或 kube-proxy 或 kube-apiserver 的时序数据
container_processes{container=~"kube-scheduler|kube-proxy|kube-apiserver"}

2、区间向量选择器

区间向量选择器类似于瞬时向量选择器,不同的是它选择的是过去一段时间的采样值。可以通过在瞬时向量选择器后面添加包含在 [] 里的时长来得到区间向量选择器。比如下面的表达式选出了所有度量指标为 apiserver_request_total 且 resource 是 pod 的时序在过去 1 分钟的采样值。
apiserver_request_total{job="kubernetes-apiserver",resource="pods"}[1m]
(查询结果截图略)
区间向量查询不支持 Graph 视图,需要切换到 Console 才能看到采集的数据。说明:时长的单位可以是下面几种之一:
s:seconds
m:minutes
h:hours
d:days
w:weeks
y:years
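顺带一提,这些 PromQL 都可以不经过 web 界面,直接调用 Prometheus 的 HTTP API 来执行,便于写脚本做巡检(下面的地址用自己 prometheus 的 NodePort 或 ClusterIP,表达式也只是示例):

```shell
# 即时查询:最近 1 分钟 apiserver 请求的增长速率(PromQL 需要 URL 编码,这里用 --data-urlencode 处理)
curl -s -G 'http://192.168.1.63:30009/api/v1/query' \
  --data-urlencode 'query=rate(apiserver_request_total{job="kubernetes-apiserver"}[1m])' | head -c 500
```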

3、偏移向量选择器

前面介绍的选择器默认都是以当前时间为基准时间,偏移修饰器用来调整基准时间,使其往前偏移一段时间。偏移修饰器紧跟在选择器后面,使用 offset 来指定要偏移的量。比如下面的表达式选择度量名称为 apiserver_request_total 的所有时序在 5 分钟前的采样值。
apiserver_request_total{job="kubernetes-apiserver",resource="pods"} offset 5m
下面的表达式选择 apiserver_request_total 度量指标在 1 周前的这个时间点过去 5 分钟的采样值。
apiserver_request_total{job="kubernetes-apiserver",resource="pods"}[5m] offset 1w

4、聚合操作符

PromQL 的聚合操作符用来将向量里的元素聚合得更少。总共有下面这些聚合操作符:
sum:求和
min:最小值
max:最大值
avg:平均值
stddev:标准差
stdvar:方差
count:元素个数
count_values:等于某值的元素个数
bottomk:最小的 k 个元素
topk:最大的 k 个元素
quantile:分位数
如:
计算 xuegod63 节点所有容器总计内存:
sum(container_memory_usage_bytes{instance=~"xuegod63"})/1024/1024/1024
计算 xuegod63 节点最近 1m 所有容器 cpu 使用率:
sum(rate(container_cpu_usage_seconds_total{instance=~"xuegod63"}[1m])) / sum(machine_cpu_cores{instance=~"xuegod63"}) * 100
计算最近 1m 所有容器 cpu 使用率,并按 id 分组:
sum(rate(container_cpu_usage_seconds_total{id!="/"}[1m])) by (id)
#按 id 分组,会把每个容器的 id 打印出来
(查询结果截图略)
5、函数
Prometheus 内置了一些函数来辅助计算,下面介绍一些典型的。
abs():绝对值
sqrt():平方根
exp():指数计算
ln():自然对数
ceil():向上取整
floor():向下取整
round():四舍五入取整
delta():计算区间向量里每一个时序第一个和最后一个的差值
sort():排序


18.9 prometheus 监控外部 k8s 集群

Prometheus 如果部署在 K8S 集群内采集 k8s 是很方便的,用官方给的 Yaml 就可以;但本节讨论的是把 Prometheus 部署在集群外、以二进制方式运行,并采集一个或多个 K8S 集群的场景。
Prometheus 作为监控 K8S 的解决方案,都是在 K8S 集群内部署,这样可以直接调用集群内的 cert 及各种监控 url,但是增加了集群的资源开销。因此在资源有限的情况下,倾向于 K8S 集群外独立部署 Prometheus+Grafana。

  1. 构造 apiserver proxy url

k8s 集群中 nodes、pods、services 都有自己的私有 IP,但是无法从集群外访问,但 K8S 提供以下几种方式来访问:
通过 public IPs 访问 service
通过 proxy 访问 node、pod、service
通过集群内的 node 或 pod 间接访问
kubectl cluster-info 可以查看 kube-system 命名空间里各服务的 proxy url(输出略):
其默认的构造规则为:
# 访问 node
https://${apiserver_address}/api/v1/nodes/node_name[:port_name]/proxy/metrics
# 访问 service
https://${apiserver_address}/api/v1/namespaces/service_namespace/services/http:service_name[:port_name]/proxy/metrics
# 访问 pod
https://${apiserver_address}/api/v1/namespaces/pod_namespace/pods/http:pod_name[:port_name]/proxy/metrics
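构造出 proxy url 之后,在集群外就可以带上 token 直接用 curl 验证(apiserver 地址、节点名、token 文件路径都是示例,按自己环境替换;token 的获取见下面的 apiserver 授权部分):

```shell
# 读取事先保存好的 serviceaccount token
TOKEN=$(cat /k8s_token)

# 通过 apiserver proxy 拉取某个节点 kubelet 的 cadvisor 指标(-k 跳过证书校验,仅测试用)
curl -sk -H "Authorization: Bearer ${TOKEN}" \
  "https://192.168.1.63:6443/api/v1/nodes/xuegod63/proxy/metrics/cadvisor" | head
```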

  2. apiserver 授权

要访问 K8S apiserver 需要先进行授权,而集群内部 Prometheus 可以使用集群内默认配置进行访问,而集群外访问需要使用 token+客户端 cert 进行认证,因此需要先进行 RBAC 授权。
#在 k8s 集群的控制节点操作,创建一个 sa 账号

kubectl create serviceaccount monitor -n monitor-sa
#输出 serviceaccount/monitor created(如果前面 18.3 已经创建过,会提示已存在,忽略即可)

把 sa 账号 monitor 通过 clusterrolebing 绑定到 clusterrole 上

kubectl create clusterrolebinding monitor-clusterrolebinding -n monitor-sa --clusterrole=cluster-admin --serviceaccount=monitor-sa:monitor

虽然创建了 serviceaccount,但访问 apiserver 并不是直接使用 serviceaccount,而是通过 token。因此我们需要获取 serviceaccount:monitor 对应的 token,而此 token 在 secret 里是经过 base64 编码的,必须解码后才能使用。

kubectl get sa monitor -n monitor-sa -o yaml
# echo "SGLDDLFKGJLGIEMTV…" |base64 -d

以 Pod 方式运行在集群内时不需要额外声明这些认证信息,但在集群外运行就需要声明 token、证书等,并显式指定 apiserver 地址,即通过 apiserver proxy 的方式采集。以 cadvisor 采集为例,Job 配置为:

- job_name: cadvisor
  scrape_interval: 30s
  scrape_timeout: 10s
  scheme: https
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /k8s_token
  # the token decoded in the authorization step above, stored in the file /k8s_token
  # Kubernetes service discovery
  kubernetes_sd_configs:
  - role: node
    api_server: https://xx:6443
    bearer_token_file: /k8s_token
    tls_config:
      insecure_skip_verify: true
      # skip certificate verification, otherwise the scrape fails

Note: tls_config and bearer_token_file must be present both at the job level and inside kubernetes_sd_configs, otherwise authentication fails.
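Because the node role discovers targets by their private node addresses, a common addition (a hedged sketch, not part of the job above) is a relabel block that rewrites every target so the scrape actually goes through the apiserver proxy URL constructed earlier:

  relabel_configs:
  - target_label: __address__
    replacement: xx:6443        # placeholder: the apiserver host:port, matching api_server above
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor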


十九、Deploying an intelligent log-collection platform on Kubernetes: collecting Pod application logs

Elasticsearch: at the time of writing, the newest image on the official Docker Hub is 7.12.1:
https://registry.hub.docker.com/_/elasticsearch?tab=tags&page=1&ordering=last_updated
Kibana: at the time of writing, the newest image on the official Docker Hub is 7.12.1:
https://registry.hub.docker.com/_/kibana?tab=tags&page=1&ordering=last_updated
Fluentd: at the time of writing, the newest image on the official Docker Hub is 1.9.2:
https://registry.hub.docker.com/_/fluentd?tab=tags&page=1&ordering=last_updated

  1. Why collect logs?

Whether during development or during testing, log information becomes crucial as soon as something goes wrong in a running project. Logs are one of the main tools for locating a problem, much like a detective reconstructing a case from the clues left at the scene.
Log levels:
Level 1: DEBUG
Debugging information, and also the most detailed log output
Level 2: INFO
Confirms that things are working as expected
Level 3: WARNING
Indicates that something unexpected happened which does not (yet) affect operation
Level 4: ERROR
A more serious problem: some functionality of the software can no longer be carried out

  2. Which logs should be collected when applications run on traditional physical servers?

System logs: system applications create their log files under /var/log. System logs mainly hold the output of built-in programs and the kernel, such as btmp.
Application logs: logs produced by third-party applications installed afterwards, such as Tomcat and Apache.

  3. Which logs should be collected in a Kubernetes cluster?

Logs of the Kubernetes components themselves: apiserver, scheduler, kubelet
Logs of the applications deployed in the Kubernetes cluster
When many services and applications run on a Kubernetes cluster, a log-collection system helps you quickly sort and analyze the large amount of log data produced by the Pods. The most popular log-collection solution for Kubernetes is the Elasticsearch, Fluentd and Kibana (EFK) stack, which is also the officially recommended one.
Applications in a Kubernetes cluster usually emit logs in one of the following ways:
write the logs to stdout/stderr, as officially recommended by Docker
write the logs to a specific directory inside the container
have the application send its logs directly to the log-collection system
A common collection pipeline:
Filebeat (log shipper) + Logstash (data-processing engine) + Elasticsearch (storage, full-text search, distributed search engine) + Kibana (visualization, dashboards, search)


19.1 A deep dive into the distributed search engine Elasticsearch

19.1.1 Elasticsearch overview

Elasticsearch is a real-time, distributed and scalable search engine that supports full-text and structured search as well as log analytics. It is most often used to index and search large volumes of log data, but it can also be used to search many other kinds of documents.
Elasticsearch provides three main capabilities: search, analytics and data storage.
Elasticsearch is usually deployed together with Kibana, a powerful data-visualization dashboard for Elasticsearch. Kibana lets you browse the Elasticsearch log data through a web interface and build custom queries to quickly retrieve the log entries stored in Elasticsearch.


19.1.2 Elasticsearch features

1. A distributed document store
2. A distributed search and analytics engine
3. Scales horizontally to hundreds or even thousands of nodes and handles petabytes of data


19.1.3 Elasticsearch basic concepts

1. Cluster: a cluster contains multiple nodes; which cluster a node belongs to is decided by a configuration setting (the cluster name), and the nodes can be spread across machines. One node is the master, chosen by election. Seen from the outside, an Elasticsearch cluster is logically a single whole: talking to any one node is equivalent to talking to the whole cluster.
2. Node: a single member of the cluster. Nodes started with the same cluster name automatically form a cluster, and one Elasticsearch instance is one node. A node can hold many shards, but a primary shard and its replica shard can never live on the same node.
3. Document: the smallest unit of data in Elasticsearch, a JSON document made up of several fields, each field being one data attribute.
4. Index: a collection of documents with a similar structure. An index traditionally had 5 primary shards by default, each with one replica shard, i.e. 5 primary and 5 replica shards (since Elasticsearch 7.x the default is 1 primary shard). A complete index can be split into shards that live on different nodes, giving distributed storage and distributed search; the number of shards can only be chosen when the index is created and cannot be changed afterwards.
5. Type: an index has one or more types; a type is a logical grouping of data inside an index, and documents of the same type should share the same fields. Elasticsearch 6 already restricted every index to a single type and announced that types would be removed; Elasticsearch 7 uses the default _doc type, and types are scheduled to disappear completely in 8.x.
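A hedged illustration of these concepts against a running cluster (the address and index name are placeholders; the shard count is set explicitly because 7.x defaults to 1 primary shard):

```shell
# create an index with 5 primary shards, each with 1 replica
curl -X PUT "http://localhost:9200/app-log" -H 'Content-Type: application/json' -d '
{ "settings": { "number_of_shards": 5, "number_of_replicas": 1 } }'
# index one document; _doc is the default type in 7.x
curl -X POST "http://localhost:9200/app-log/_doc" -H 'Content-Type: application/json' -d '
{ "level": "ERROR", "message": "connection refused" }'
```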


19.2 How the efficient log collector Fluentd works

19.2.1 Fluentd overview

Fluentd is a popular open-source data collector designed for processing data streams, using JSON as its data format. We install Fluentd on the Kubernetes cluster nodes, where it reads the container log files, filters and transforms the log data, and then forwards it to the Elasticsearch cluster, where it is indexed and stored.
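A minimal sketch of such a pipeline, assuming Docker's json-file log driver (so each line is JSON), the default container log path, and the elasticsearch Service created later in 19.4; the elasticsearch output needs the fluent-plugin-elasticsearch plugin, which the Kubernetes-oriented Fluentd images ship with:

```shell
tee fluent.conf << 'EOF'
# tail the container log files written on the node
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  <parse>
    @type json
  </parse>
</source>
# forward everything to Elasticsearch
<match kubernetes.**>
  @type elasticsearch
  host elasticsearch.kube-logging.svc.cluster.local
  port 9200
  logstash_format true
</match>
EOF
```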


19.2.2 Comparing common log-collection components

  1. Logstash

Logstash is an open-source data-collection engine with real-time pipelining. It can dynamically unify data from different sources and normalize the output into the destinations of your choice, and its filter stage can parse and analyze logs.
Strengths:
Logstash's main advantage is its flexibility: it has a large plugin ecosystem and plenty of resources online, so it can handle almost any use case.
Weaknesses:
Logstash's weak points are performance and resource consumption (the default heap size is 1 GB). It also has no built-in buffering, so the typical workaround is to put Redis or Kafka in front of it as a central buffer.

  2. Filebeat

Filebeat is a lightweight log shipper that was created to make up for Logstash's shortcomings: as a lightweight shipper it can push logs to a central Logstash. Since version 5.x, Elasticsearch itself has had parsing capabilities (much like Logstash filters) through ingest pipelines, which means Filebeat can also push data directly to Elasticsearch and let Elasticsearch do both the parsing and the storing.
A buffer is usually not required, because Filebeat, like Logstash, remembers the offset it last read. If buffering is needed (for example, to avoid filling up the log server's file system), Redis or Kafka can be used, since Filebeat can talk to both.

  3. Fluentd

Fluentd is a system for collecting, processing and forwarding logs. Through its rich plugin ecosystem it can collect logs from all kinds of systems and applications, convert them into a user-defined format and forward them to the log store of your choice. Fluentd is often compared with Logstash; in the ELK acronym, the L is that agent. Fluentd became popular alongside Docker and Elasticsearch, and it uses noticeably fewer resources than Logstash.


19.3 Kibana: exploring and visualizing the data

Kibana is a web-based graphical interface for searching, analyzing and visualizing the log data stored in Elasticsearch indices. It offers many features: the "Visualize" menu turns query results into charts, and the "Dev Tools" menu lets users talk to Elasticsearch directly from the browser by sending RESTful requests to create, read, update and delete data. It is easy to use: with the browser-based UI you can quickly build dashboards that display Elasticsearch query results in real time. Setting up Kibana is simple as well; without any coding or extra infrastructure, Kibana can be installed and pointed at an Elasticsearch index within minutes.
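The requests Dev Tools sends are plain Elasticsearch REST calls; the equivalent from the command line (the logstash-* index pattern is an assumption based on Fluentd's logstash_format output, and the log field name depends on how the logs were shipped) looks like:

```shell
curl -X GET "http://localhost:9200/logstash-*/_search?pretty" -H 'Content-Type: application/json' -d '
{ "query": { "match": { "log": "error" } }, "size": 5 }'
```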


19.4 Installing the EFK components in Kubernetes

We first configure and start a scalable Elasticsearch cluster, then create a Kibana application in the Kubernetes cluster, and finally run Fluentd as a DaemonSet so that it runs one Pod on every Kubernetes worker node.

  1. Create the namespace

```shell
kubectl create ns kube-logging

kubectl get ns kube-logging
```

  2. Install Elasticsearch

```shell
vim  elasticsearch_svc.yaml
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch
  namespace: kube-logging
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  clusterIP: None
  ports:
  - port: 9200
    name: rest
  - port: 9300
    name: inter-node
 kubectl apply -f elasticsearch_svc.yaml

 kubectl get svc -n kube-logging
```

  3. Install the NFS provisioner