1. Date: 2022/01/15
- [x] **1.Cilium Install Guide - Requirements**
```properties
This guide will walk you through the quick default installation. It will automatically detect and use the best configuration possible for the Kubernetes distribution you are using. All state is stored using Kubernetes CRDs. This is the best installation method for most use cases. For large environments (> 500 nodes) or if you want to run specific datapath modes, refer to the Advanced Installation guide.

System Requirements Before installing Cilium, please ensure that your system meets the minimum requirements below. Most modern Linux distributions already do.

Summary
1.When running Cilium using the container image cilium/cilium, the host system must meet these requirements: Linux kernel >= 4.9.17
We are using:
[root@dev1 ~]# uname -r
5.15.8-1.el7.elrepo.x86_64

2.When running Cilium as a native process on your host (i.e. not running the cilium/cilium container image) these additional requirements must be met:
We run Cilium in containers, in a Kubernetes environment, so the requirements below can be skipped.

3.systemd issue: Systemd 245 and above (systemctl --version) overrides the rp_filter setting of Cilium network interfaces. This introduces connectivity issues (see GitHub issue 10645 for details). To avoid that, configure rp_filter in systemd using the following commands:
echo 'net.ipv4.conf.lxc*.rp_filter = 0' > /etc/sysctl.d/99-override_cilium_rp_filter.conf
systemctl restart systemd-sysctl
[https://github.com/cilium/cilium/issues/10645]
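# Hedged verification sketch (not from the official guide): once Cilium has
# created its lxc* interfaces, every matching rp_filter entry should read 0.
[root@dev1 ~]# sysctl -a 2>/dev/null | grep -E 'net\.ipv4\.conf\.lxc.*\.rp_filter'
# each printed line should end with "= 0"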

4.Linux Kernel issue: Cilium leverages and builds on the kernel eBPF functionality as well as various subsystems which integrate with eBPF. Therefore, host systems are required to run Linux kernel version 4.9.17 or later to run a Cilium agent. More recent kernels may provide additional eBPF functionality that Cilium will automatically detect and use on agent start. In order for the eBPF feature to be enabled properly, the following kernel configuration options must be enabled. This is typically the case with distribution kernels. When an option can be built as a module or statically linked, either choice is valid.
[root@dev1 boot]# vi /boot/config-$(uname -r)   # normally these options are all satisfied
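# Hedged check: grep the running kernel's config for the base options listed
# on the Cilium system-requirements page (=y or =m are both valid where a
# module build is allowed):
[root@dev1 ~]# grep -E 'CONFIG_(BPF|BPF_SYSCALL|NET_CLS_BPF|BPF_JIT|NET_CLS_ACT|NET_SCH_INGRESS|CRYPTO_SHA1|CRYPTO_USER_API_HASH|CGROUPS|CGROUP_BPF)=' /boot/config-$(uname -r)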

5.Required Kernel Versions for Advanced Features Cilium requires Linux kernel 4.9.17 or higher; however, development on additional kernel features continues to progress in the Linux community. Some of Cilium's features are dependent on newer kernel versions and are thus enabled by upgrading to more recent kernel versions as detailed below.
Our kernel is 5.15, so we satisfy all of the advanced features!
https://docs.cilium.io/en/latest/operations/system_requirements/#required-kernel-versions-for-advanced-features

6.Key-Value store[https://docs.cilium.io/en/latest/operations/system_requirements/#key-value-store] Cilium optionally uses a distributed Key-Value store to manage, synchronize and distribute security identities across all cluster nodes. The following Key-Value stores are currently supported:

etcd >= 3.1.0
We are based on Kubernetes 1.20.5, where the etcd version is:
[root@dev1 ~]# kubectl -nkube-system exec -it etcd-dev1 -- etcdctl version
etcdctl version: 3.4.13
API version: 3.4
[root@dev1 ~]#
[Cilium can be used without a Key-Value store when CRD-based state management is used with Kubernetes.] This is the default for new Cilium installations. Larger clusters will perform better with Key-Value-store-backed identity management instead; see Quick Installation for more details. See Key-Value Store[https://docs.cilium.io/en/latest/cmdref/kvstore/#install-kvstore] for details on how to configure the cilium-agent to use a Key-Value store.
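# Hedged sketch (not used in this CRD-based install): per the kvstore
# reference linked above, cilium-agent can be pointed at etcd with flags like
# these; the etcd.config path here is only an example.
cilium-agent --kvstore etcd --kvstore-opt etcd.config=/var/lib/etcd-config/etcd.config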

7.clang+LLVM This requirement is only needed if you run cilium-agent natively. If you are using the Cilium container image cilium/cilium, clang+LLVM is included in the container image.
We use a container environment, so we do not need to consider this case.

8.iproute2 iproute2 is only needed if you run cilium-agent directly on the host machine. iproute2 is included in the cilium/cilium container image.
We also run in containers, so this can be ignored.

9.Firewall Rules [https://docs.cilium.io/en/latest/operations/system_requirements/#firewall-rules]
We normally have firewalld disabled, so this is not an issue for us. But note: if your Kubernetes environment runs on OpenStack (public cloud), you must account for the security group applied to the VMs. This is very important; see the sketch below.
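# Hedged OpenStack example; "k8s-nodes" is a hypothetical security group name.
# Cilium's default VXLAN tunnel needs 8472/UDP between nodes, and node health
# checks use 4240/TCP plus ICMP (see the firewall-rules link above).
openstack security group rule create k8s-nodes --protocol udp --dst-port 8472 --remote-group k8s-nodes
openstack security group rule create k8s-nodes --protocol tcp --dst-port 4240 --remote-group k8s-nodes
openstack security group rule create k8s-nodes --protocol icmp --remote-group k8s-nodes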

10.Mounted eBPF filesystem Some distributions mount the bpf filesystem automatically. Check if the bpf filesystem is mounted by running the command.

mount | grep /sys/fs/bpf

If present, it should output something like "none on /sys/fs/bpf type bpf"…

My environment has it mounted manually via a systemd unit:
[root@dev1 ~]# systemctl cat sys-fs-bpf.mount

# /etc/systemd/system/sys-fs-bpf.mount

[Unit]
Description=Cilium BPF mounts
Documentation=https://docs.cilium.io/
DefaultDependencies=no
Before=local-fs.target umount.target
After=swap.target

[Mount]
What=bpffs
Where=/sys/fs/bpf
Type=bpf
Options=rw,nosuid,nodev,noexec,relatime,mode=700

[Install]
WantedBy=multi-user.target
[root@dev1 ~]#

[If the eBPF filesystem is not mounted in the host filesystem, Cilium will automatically mount the filesystem.]
[Mounting this BPF filesystem allows the cilium-agent to persist eBPF resources across restarts of the agent so that the datapath can continue to operate while the agent is subsequently restarted or upgraded.]
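# Standard systemd usage to activate the unit above (hedged: the enablement
# steps were not captured in the original transcript):
[root@dev1 ~]# systemctl daemon-reload
[root@dev1 ~]# systemctl enable --now sys-fs-bpf.mount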

Optionally it is also possible to mount the eBPF filesystem before Cilium is deployed in the cluster; the following command must be run in the host mount namespace, and only once during the boot process of the machine.

[root@dev1 ~]# mount bpffs /sys/fs/bpf -t bpf

A portable way to achieve this with persistence is to add the following line to /etc/fstab and then run mount /sys/fs/bpf. This will cause the filesystem to be automatically mounted when the node boots.

bpffs /sys/fs/bpf bpf defaults 0 0

If you are using systemd to manage the kubelet, see the section Mounting BPFFS with systemd.

11.Privileges The following privileges are required to run Cilium. When running the standard Kubernetes DaemonSet, the privileges are automatically granted to Cilium.

Cilium interacts with the Linux kernel to install eBPF programs which will then perform networking tasks and implement security rules. In order to install eBPF programs system-wide, CAP_SYS_ADMIN privileges are required. These privileges must be granted to cilium-agent.

[The quickest way to meet the requirement is to run cilium-agent as root and/or as a privileged container.] For testing, use this approach; it saves effort.

Cilium requires access to the host networking namespace. For this purpose, the Cilium pod is scheduled to run in the host networking namespace directly.
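# Hedged check that the DaemonSet really carries these privileges (the field
# paths are standard Kubernetes; object names come from this cluster):
[root@dev1 ~]# kubectl -n kube-system get ds cilium -o jsonpath='{.spec.template.spec.hostNetwork}{"\n"}{.spec.template.spec.containers[0].securityContext}{"\n"}'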
```

- [x] **2.Install Cilium**
```properties
[root@dev1 ~]# cilium install 
ℹ️  using Cilium version "v1.11.0"
 Auto-detected cluster name: kubernetes
 Auto-detected IPAM mode: cluster-pool
 Found CA in secret cilium-ca
 Generating certificates for Hubble...
 Creating Service accounts...
 Creating Cluster roles...
 Creating ConfigMap for Cilium version 1.11.0...
 Creating Agent DaemonSet...
 Creating Operator Deployment...
⌛ Waiting for Cilium to be installed and ready...
♻️  Restarting unmanaged pods...
♻️  Restarted unmanaged pod kube-system/coredns-7f89b7bc75-8nl2r
♻️  Restarted unmanaged pod kube-system/coredns-7f89b7bc75-jcqzl
✅ Cilium was successfully installed! Run 'cilium status' to view installation health
[root@dev1 ~]# 

[root@dev1 ~]# cilium status 
    /¯¯\
 /¯¯\__/¯¯\    Cilium:         1 errors, 3 warnings
 \__/¯¯\__/    Operator:       1 errors, 1 warnings
 /¯¯\__/¯¯\    Hubble:         disabled
 \__/¯¯\__/    ClusterMesh:    disabled
    \__/

DaemonSet         cilium             Desired: 3, Unavailable: 3/3
Deployment        cilium-operator    Desired: 1, Unavailable: 1/1
Containers:       cilium             Pending: 3
                  cilium-operator    Pending: 1
Cluster Pods:     0/0 managed by Cilium
Image versions    cilium             quay.io/cilium/cilium:v1.11.0: 3
                  cilium-operator    quay.io/cilium/operator-generic:v1.11.0: 1
Errors:           cilium-operator    cilium-operator                    1 pods of Deployment cilium-operator are not ready
                  cilium             cilium                             3 pods of DaemonSet cilium are not ready
Warnings:         cilium             cilium-p6pd4                       pod is pending
                  cilium             cilium-pr8ls                       pod is pending
                  cilium             cilium-rdrkg                       pod is pending
                  cilium-operator    cilium-operator-655f978f6-rkn4f    pod is pending
[root@dev1 ~]# 

[root@dev1 ~]# kubectl get pods -o wide -A
NAMESPACE     NAME                              READY   STATUS              RESTARTS   AGE   IP             NODE     NOMINATED NODE   READINESS GATES
kube-system   cilium-operator-655f978f6-rkn4f   0/1     ContainerCreating   0          40s   192.168.2.33   dev3     <none>           <none>
kube-system   cilium-p6pd4                      0/1     Init:0/2            0          40s   192.168.2.33   dev3     <none>           <none>
kube-system   cilium-pr8ls                      0/1     Init:0/2            0          40s   192.168.2.31   dev1     <none>           <none>
kube-system   cilium-rdrkg                      0/1     Init:0/2            0          40s   192.168.2.32   dev2     <none>           <none>
kube-system   coredns-7f89b7bc75-9hkq6          0/1     Pending             0          40s   <none>         <none>   <none>           <none>
kube-system   coredns-7f89b7bc75-cpfsh          0/1     Pending             0          40s   <none>         <none>   <none>           <none>
kube-system   etcd-dev1                         1/1     Running             1          26d   192.168.2.31   dev1     <none>           <none>
kube-system   kube-apiserver-dev1               1/1     Running             1          26d   192.168.2.31   dev1     <none>           <none>
kube-system   kube-controller-manager-dev1      1/1     Running             1          26d   192.168.2.31   dev1     <none>           <none>
kube-system   kube-scheduler-dev1               1/1     Running             1          26d   192.168.2.31   dev1     <none>           <none>
[root@dev1 ~]# 
[Unfortunately, the installation failed here.] Reason: we skipped installing kube-proxy when creating the Kubernetes cluster (see https://github.com/projectcalico/calico/issues/3422). [The 10.96.0.1 error suggests an issue reaching the Kube API server using the kubernetes service that kube-proxy should be setting up.]
[kubeadm init --kubernetes-version=v1.20.5 --image-repository registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12 --skip-phases=addon/kube-proxy --ignore-preflight-errors=Swap]
level=info msg=" - !:etcd_node" subsys=labels-filter
level=info msg="Auto-disabling \"enable-bpf-clock-probe\" feature since KERNEL_HZ cannot be determined" error="Cannot probe CONFIG_HZ" subsys=daemon
level=info msg="Using autogenerated IPv4 allocation range" subsys=node v4Prefix=10.33.0.0/16
level=info msg="Initializing daemon" subsys=daemon
level=info msg="Establishing connection to apiserver" host="https://10.96.0.1:443" subsys=k8s
level=info msg="Establishing connection to apiserver" host="https://10.96.0.1:443" subsys=k8s
level=error msg="Unable to contact k8s api-server" error="Get \"https://10.96.0.1:443/api/v1/namespaces/kube-system\": dial tcp 10.96.0.1:443: i/o timeout" ipAddr="https://10.96.0.1:443" subsys=k8s
level=fatal msg="Unable to initialize Kubernetes subsystem" error="unable to create k8s client: unable to create k8s client: Get \"https://10.96.0.1:443/api/v1/namespaces/kube-system\": dial tcp 10.96.0.1:443: i/o timeout" subsys=daemon
[root@dev1 ~]#
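# Hedged alternative to the reset-and-redeploy below: Cilium 1.11 can replace
# kube-proxy entirely. With Helm that looks roughly like this; the
# kubeProxyReplacement/k8sServiceHost/k8sServicePort value names are from the
# Cilium 1.11 Helm chart, and port 6443 (the kubeadm default) is an
# assumption. The API server must be addressed directly, because the
# 10.96.0.1 ClusterIP is exactly what is unreachable without kube-proxy:
helm install cilium cilium/cilium --version 1.11.0 -n kube-system \
  --set kubeProxyReplacement=strict \
  --set k8sServiceHost=192.168.2.31 \
  --set k8sServicePort=6443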

So here we reset the cluster with kubeadm and redeploy, this time without skipping the kube-proxy addon:
[root@dev1 ~]# kubeadm init --kubernetes-version=v1.20.5 --image-repository registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12  --ignore-preflight-errors=Swap
[root@dev1 ~]# cilium install 
ℹ️  using Cilium version "v1.11.0"
 Auto-detected cluster name: kubernetes
 Auto-detected IPAM mode: cluster-pool
 Found CA in secret cilium-ca
 Generating certificates for Hubble...
 Creating Service accounts...
 Creating Cluster roles...
 Creating ConfigMap for Cilium version 1.11.0...
 Creating Agent DaemonSet...
 Creating Operator Deployment...
⌛ Waiting for Cilium to be installed and ready...
✅ Cilium was successfully installed! Run 'cilium status' to view installation health
[root@dev1 ~]# 
I also collected the cilium and cilium-operator logs, so we can compare them against the later Helm-based installation. The official docs say that cilium install intelligently chooses the optimal installation options for us:
cilium pod logs:https://raw.githubusercontent.com/BurlyLuo/train/main/Cilium/cilium.log
cilium-operator pods logs: https://raw.githubusercontent.com/BurlyLuo/train/main/Cilium/cilium-operator.log
[root@dev1 ~]# cilium version 
cilium-cli: v0.10.0 compiled with go1.17.4 on linux/amd64
cilium image (default): v1.11.0
cilium image (stable): unknown
cilium image (running): v1.11.0
Installation complete!
Congratulations! You have a fully functional Kubernetes cluster with Cilium. 🎉
[root@dev1 ~]# kubectl get pods -o wide -A 
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE   IP             NODE   NOMINATED NODE   READINESS GATES
cilium-test   client-6488dcf5d4-4hgc5            1/1     Running   0          14h   10.0.2.241     dev2   <none>           <none>
cilium-test   client2-6dd75b74c6-ghsdm           1/1     Running   0          14h   10.0.2.217     dev2   <none>           <none>
cilium-test   echo-other-node-697d5d69b7-9qp9x   1/1     Running   0          14h   10.0.1.193     dev3   <none>           <none>
cilium-test   echo-same-node-7967996674-dcnb9    1/1     Running   0          14h   10.0.2.209     dev2   <none>           <none>
kube-system   cilium-djnww                       1/1     Running   0          14h   192.168.2.31   dev1   <none>           <none>
kube-system   cilium-hjwtm                       1/1     Running   0          14h   192.168.2.33   dev3   <none>           <none>
kube-system   cilium-jrhnw                       1/1     Running   0          14h   192.168.2.32   dev2   <none>           <none>
kube-system   cilium-operator-655f978f6-hm4rx    1/1     Running   0          14h   192.168.2.33   dev3   <none>           <none>
kube-system   coredns-7f89b7bc75-hr2rn           1/1     Running   0          14h   10.0.2.48      dev2   <none>           <none>
kube-system   coredns-7f89b7bc75-zngjl           1/1     Running   0          14h   10.0.2.216     dev2   <none>           <none>
kube-system   etcd-dev1                          1/1     Running   0          14h   192.168.2.31   dev1   <none>           <none>
kube-system   kube-apiserver-dev1                1/1     Running   0          14h   192.168.2.31   dev1   <none>           <none>
kube-system   kube-controller-manager-dev1       1/1     Running   0          14h   192.168.2.31   dev1   <none>           <none>
kube-system   kube-proxy-j4nqc                   1/1     Running   0          14h   192.168.2.33   dev3   <none>           <none>
kube-system   kube-proxy-rrqmt                   1/1     Running   0          14h   192.168.2.31   dev1   <none>           <none>
kube-system   kube-proxy-vflss                   1/1     Running   0          14h   192.168.2.32   dev2   <none>           <none>
kube-system   kube-scheduler-dev1                1/1     Running   0          14h   192.168.2.31   dev1   <none>           <none>

3.After exec-ing into the Pod, we find it is actually not all that "intelligent":
[root@dev1 ~]# kubectl -nkube-system exec -it cilium-9pnf4 bash 
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
root@dev2:/home/cilium# cilium status
KVStore:                Ok   Disabled
Kubernetes:             Ok   1.20 (v1.20.5) [linux/amd64]
Kubernetes APIs:        ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1beta1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:   Disabled   
Host firewall:          Disabled
Cilium:                 Ok   1.11.0 (v1.11.0-27e0848)
NodeMonitor:            Listening for events on 128 CPUs with 64x4096 of shared memory
Cilium health daemon:   Ok   
IPAM:                   IPv4: 7/254 allocated from 10.0.2.0/24, 
BandwidthManager:       Disabled
Host Routing:           Legacy    # host routing here is still Legacy
Masquerading:           IPTables [IPv4: Enabled, IPv6: Disabled]   # SNAT here is still implemented with iptables
Controller Status:      45/45 healthy
Proxy Status:           OK, ip 10.0.2.3, 0 redirects active on ports 10000-20000
Hubble:                 Ok   Current/Max Flows: 2637/4095 (64.40%), Flows/s: 12.88   Metrics: Disabled
Encryption:             Disabled
Cluster health:         3/3 reachable   (2022-01-15T07:02:31Z)
root@dev2:/home/cilium#
```
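The status above confirms it: host routing is still Legacy and masquerading still goes through iptables. As a hedged sketch (value names are from the Cilium 1.11 Helm chart; this was not deployed here and assumes a Helm-managed install), on a kernel >= 5.10 the eBPF variants can be requested like this, since eBPF host routing requires the eBPF kube-proxy replacement and eBPF masquerading; cilium status should then report Host Routing: BPF and Masquerading: BPF:
```properties
helm upgrade --install cilium cilium/cilium --version 1.11.0 -n kube-system \
  --set kubeProxyReplacement=strict \
  --set bpf.masquerade=true
```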