External traffic policy
Services can be configured with externalTrafficPolicy Cluster (the default) or Local. With externalTrafficPolicy Cluster, the ECFE speaker on a given node announces the service external IP regardless of whether the node hosts a service endpoint. With externalTrafficPolicy Local, the ECFE speaker on a given node announces the service external IP only if a healthy service endpoint exists on that node.
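As a minimal sketch (hypothetical, not applied on this cluster), switching the ingress controller service to Local would look like the patch below; only nodes with a healthy endpoint would then announce and accept the external IP:

# Hypothetical change, for illustration only: switch the ingress-nginx service
# from Cluster (default) to Local.
kubectl patch svc ingress-nginx -n ingress-nginx \
  -p '{"spec":{"externalTrafficPolicy":"Local"}}'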

From https://adp.ericsson.se/marketplace/external-connectivity-frontend/documentation/development/dpi/application-developers-guide

Ingress controller service manifest.
eccd@control-plane-wfb88:~> kubectl get svc ingress-nginx -n ingress-nginx -o yaml
apiVersion: v1
kind: Service
metadata:
{…}
spec:
  clusterIP: 10.97.74.168
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    nodePort: 32151
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    nodePort: 31255
    port: 443
    protocol: TCP
    targetPort: 443
  selector:
    app: ingress-nginx
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 10.33.151.160

Let's use dc173prometheus1.cloud.k2.ericsson.se as an example.

Source: seroius01242 (10.210.150.87/28)
Destination: dc173prometheus1.cloud.k2.ericsson.se
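To generate the traffic traced below, a plain HTTP request from the source host is enough (assuming HTTP on port 80, matching the capture in step 7):

# Run on seroius01242 (10.210.150.87); the name resolves to the VIP 10.33.151.160.
curl -v http://dc173prometheus1.cloud.k2.ericsson.se/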

1. Look up the DNS record:
eccd@control-plane-wfb88:~> nslookup dc173prometheus1.cloud.k2.ericsson.se
Server:   10.96.0.10
Address:  10.96.0.10#53
Non-authoritative answer:
dc173prometheus1.cloud.k2.ericsson.se canonical name = dc173dashboard1.cloud.k2.ericsson.se.
Name:     dc173dashboard1.cloud.k2.ericsson.se
Address:  10.33.151.160
eccd@control-plane-wfb88:~>
2. Verify the FQDN is exposed via a Kubernetes ingress:
eccd@control-plane-wfb88:~> kubectl get ing -A
NAMESPACE    NAME                CLASS    HOSTS                                   ADDRESS                                           PORTS   AGE
monitoring   prometheus-server   <none>   dc173prometheus1.cloud.k2.ericsson.se   10.0.10.102,10.0.10.103,10.0.10.104,10.0.10.108   80      28d
Expose Prometheus via ingress if not already done:
cat > PM-ingress.yaml << EOF
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  labels:
    app: eric-pm-server
    component: server
  name: prometheus-server
  namespace: monitoring
spec:
  rules:
  - host: dc173prometheus1.cloud.k2.ericsson.se
    http:
      paths:
      - backend:
          serviceName: eric-pm-server
          servicePort: 9090
        path: /
EOF
Apply the ingress rule:
kubectl apply -f PM-ingress.yaml
3. Verify the external IP is exposed by the ingress-nginx service of type LoadBalancer:
eccd@control-plane-wfb88:~> kubectl get svc -A
NAMESPACE       NAME            TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)                      AGE
ingress-nginx   ingress-nginx   LoadBalancer   10.97.74.168   10.33.151.160   80:32151/TCP,443:31255/TCP   28d
At this stage, the DNS record is consistent with the Kubernetes ingress configuration.
4. Check the ECFE configmap:
eccd@control-plane-wfb88:~> kubectl get cm ecfe-ccdadm -n kube-system -o yaml
apiVersion: v1
data:
  config: |
    bgp-bfd-peers:
    - peer-address: 10.33.151.66
      peer-asn: 4200000000
      my-asn: 4259841731
      hold-time: 3s
      min-rx: 300ms
      min-tx: 300ms
      multiplier: 3
      my-address-pools:
      - ingress
      - oam-pool
    - peer-address: 10.33.151.67
      peer-asn: 4200000000
      my-asn: 4259841731
      hold-time: 3s
      min-rx: 300ms
      min-tx: 300ms
      multiplier: 3
      my-address-pools:
      - ingress
      - oam-pool
    - peer-address: 10.33.152.66
      peer-asn: 4200000000
      my-asn: 4259841731
      hold-time: 3s
      min-rx: 300ms
      min-tx: 300ms
      multiplier: 3
      my-address-pools:
      - traffic-pool
    - peer-address: 10.33.152.67
      peer-asn: 4200000000
      my-asn: 4259841731
      hold-time: 3s
      min-rx: 300ms
      min-tx: 300ms
      multiplier: 3
      my-address-pools:
      - traffic-pool
    - peer-address: 172.80.11.2
      peer-asn: 4200000000
      my-asn: 4259841731
      hold-time: 3s
      min-rx: 300ms
      min-tx: 300ms
      multiplier: 3
      my-address-pools:
      - traffic2-pool
    - peer-address: 172.80.11.3
      peer-asn: 4200000000
      my-asn: 4259841731
      hold-time: 3s
      min-rx: 300ms
      min-tx: 300ms
      multiplier: 3
      my-address-pools:
      - traffic2-pool
    address-pools:
    - name: ingress
      protocol: bgp
      addresses:
      - 10.33.151.160/32
    - name: oam-pool
      protocol: bgp
      addresses:
      - 10.33.151.161-10.33.151.167
      auto-assign: false
    - name: traffic-pool
      protocol: bgp
      addresses:
      - 10.33.152.16/28
      auto-assign: false
    - name: traffic2-pool
      protocol: bgp
      addresses:
      - 10.33.152.32/28
      auto-assign: false
kind: ConfigMap
The VIP 10.33.151.160/32 is configured in the ECFE ingress address pool. It is announced to BGP peers 10.33.151.66 and 10.33.151.67.
5. Verify the SLX configuration:
DC173-SLX-L1A# show run int ve
interface Ve 805
 vrf forwarding DC173_CCD1_om_vr
 ip anycast-address 10.33.151.65/27
 ip address 10.33.151.66/27
 no shutdown
!
DC173-SLX-L1B# show run int ve
interface Ve 805
 vrf forwarding DC173_CCD1_om_vr
 ip anycast-address 10.33.151.65/27
 ip address 10.33.151.67/27
 no shutdown
!
DC173-SLX-L1B#
DC173-SLX-L1A# show ip bgp vrf DC173_CCD1_om_vr
Total number of BGP Routes: 31
Status codes: s suppressed, d damped, h history, * valid, > best, i internal, S stale, x best-external
Origin codes: i - IGP, e - EGP, ? - incomplete
Network  Next Hop  RD  MED  LocPrf  Weight  Path
*>x 0.0.0.0/0 10.90.0.2 none 100 0 100 i
* 0.0.0.0/0 10.90.0.6 none 100 0 100 i
*i 0.0.0.0/0 21.1.1.1 none 100 0 100 i
*> 10.33.151.64/27 0.0.0.0 0 100 32768 ?
*i 10.33.151.64/27 21.1.1.1 0 100 0 ?
*> 10.33.151.96/27 0.0.0.0 0 100 32768 ?
*i 10.33.151.96/27 21.1.1.1 0 100 0 ?
*> 10.33.151.128/27 0.0.0.0 0 100 32768 ?
*i 10.33.151.128/27 21.1.1.1 0 100 0 ?
*>x 10.33.151.160/32 10.33.151.68 none 100 0 4259841731 i
* 10.33.151.160/32 10.33.151.69 none 100 0 4259841731 i
* 10.33.151.160/32 10.33.151.70 none 100 0 4259841731 i
* 10.33.151.160/32 10.33.151.71 none 100 0 4259841731 i
* 10.33.151.160/32 10.33.151.72 none 100 0 4259841731 i
* 10.33.151.160/32 10.33.151.73 none 100 0 4259841731 i
* 10.33.151.160/32 10.33.151.74 none 100 0 4259841731 i
* 10.33.151.160/32 10.33.151.75 none 100 0 4259841731 i
* 10.33.151.160/32 10.33.151.76 none 100 0 4259841731 i
* 10.33.151.160/32 10.33.151.77 none 100 0 4259841731 i
* 10.33.151.160/32 10.33.151.78 none 100 0 4259841731 i
*i 10.33.151.160/32 21.1.1.1 none 100 0 4259841731 i
*> 10.33.151.176/28 0.0.0.0 0 100 32768 ?
*i 10.33.151.176/28 21.1.1.1 0 100 0 ?
*> 10.90.0.0/30 0.0.0.0 0 100 32768 ?
*> 10.90.0.4/30 0.0.0.0 0 100 32768 ?
*>i 10.90.1.0/30 21.1.1.1 0 100 0 ?
*>i 10.90.1.4/30 21.1.1.1 0 100 0 ?
*> 21.1.1.0/31 0.0.0.0 0 100 32768 ?
*i 21.1.1.0/31 21.1.1.1 0 100 0 ?
*> 172.80.70.0/24 0.0.0.0 0 100 32768 ?
*i 172.80.70.0/24 21.1.1.1 0 100 0 ?
DC173-SLX-L1A#
For dst=10.33.151.160 there are several next hops, 10.33.151.68 through 10.33.151.78.
Traffic will be load balanced by ECMP across those 11 IPs.
<--------First load balancing by ECMP - Layer 4-------->
SLX supports ECMP for L3 forwarding using a modulo-N hash algorithm implemented in the forwarding ASICs.
Traffic is forwarded to a specific next hop derived by hashing the 5-tuple frame data (IPSA, IPDA, IPProto, L4 SP, L4 DP) and computing the index modulo N, where N = pathCount in the ECMP list of next hops.
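As a rough illustration of the modulo-N selection (not the SLX ASIC's actual hash function), picking one of the 11 next hops from a flow's 5-tuple could be sketched like this:

# Illustration only - a flow's 5-tuple is hashed and the result is taken
# modulo the number of ECMP paths (11 next hops in this table).
five_tuple="10.210.150.87 10.33.151.160 TCP 56200 80"
n_paths=11
hash=$(printf '%s' "$five_tuple" | cksum | cut -d' ' -f1)
echo "flow maps to ECMP next-hop index $((hash % n_paths)) of $n_paths"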
6. Verify who owns those BGP speaker IPs:
eccd@control-plane-wfb88:~> for i in `kubectl get no -o json | jq -r '.items[].status.addresses[] | select(.type=="InternalIP") | .address'` ; do { ssh $i -q hostname & ssh $i -q ip a show ccd_ecfe_om | grep "inet " | awk -F'[: ]+' '{ print $3 }'; } | paste -s; done
control-plane-4qjt9     10.33.151.77/27
control-plane-vxt5p     10.33.151.68/27 10.33.151.132/32
control-plane-wfb88     10.33.151.78/27
pool1-5db74894bb-tf54f  10.33.151.69/27
pool2-9976d6447-2gnmp   10.33.151.73/27
pool2-9976d6447-5bhnl   10.33.151.74/27
pool2-9976d6447-djhjx   10.33.151.75/27
pool2-9976d6447-ms9hg   10.33.151.71/27
pool2-9976d6447-nd4xl   10.33.151.70/27
pool2-9976d6447-r776t   10.33.151.72/27
pool2-9976d6447-xmbcz   10.33.151.76/27
7. Verify that traffic with src=10.210.150.87, dst=10.33.151.160 can be seen on one of the BGP speaker nodes. Each BGP speaker node has a 1/11 chance of receiving the ingress traffic.
eccd@control-plane-wfb88:~> sudo tcpdump -vvveni ccd_ecfe_om
Frame 1: 74 bytes on wire (592 bits), 74 bytes captured (592 bits)
Ethernet II, Src: ExtremeN_d6:8e:bc (00:04:96:d6:8e:bc), Dst: ea:a3:88:4e:f3:00 (ea:a3:88:4e:f3:00)
Internet Protocol Version 4, Src: 10.210.150.87, Dst: 10.33.151.160
Transmission Control Protocol, Src Port: 56200, Dst Port: 80, Seq: 0, Len: 0
Frame 4: 74 bytes on wire (592 bits), 74 bytes captured (592 bits)
Ethernet II, Src: ExtremeN_d6:88:76 (00:04:96:d6:88:76), Dst: ea:a3:88:4e:f3:00 (ea:a3:88:4e:f3:00)
Internet Protocol Version 4, Src: 10.210.150.87, Dst: 10.33.151.160
Transmission Control Protocol, Src Port: 56202, Dst Port: 80, Seq: 0, Len: 0
8. Verify the traffic is forwarded by the SLX.
00:04:96:d6:8e:bc belongs to SLX1.
00:04:96:d6:88:76 belongs to SLX2.
DC173-SLX-L1A# show interface | begin "Ve 805"
Ve 805 is up, line protocol is up
Address is 0004.96d6.8ebc, Current address is 0004.96d6.8ebc
DC173-SLX-L1B# show interface | begin "Ve 805"
Ve 805 is up, line protocol is up
Address is 0004.96d6.8876, Current address is 0004.96d6.8876
Interface index (ifindex) is 1207960357 (0x48000325)
iptables approach - very important concept: user-defined chains
Because of externalTrafficPolicy: Cluster, every BGP speaker node has the same iptables rules:
a. All traffic destined to 10.33.151.160/32 with destination port 443 enters the user-defined chain KUBE-FW-4E7KSV2ABIFJRAUZ.
control-plane-mjnp4:/home/eccd # iptables -t nat -S|grep 10.33.151.160
-A KUBE-SERVICES -d 10.33.151.160/32 -p tcp -m comment --comment "ingress-nginx/ingress-nginx:https loadbalancer IP" -m tcp --dport 443 -j KUBE-FW-4E7KSV2ABIFJRAUZ <- next step b
-A KUBE-SERVICES -d 10.33.151.160/32 -p tcp -m comment --comment "ingress-nginx/ingress-nginx:http loadbalancer IP" -m tcp --dport 80 -j KUBE-FW-REQ4FPVT7WYF4VLA
control-plane-mjnp4:/home/eccd #
b. All traffic entering the user-defined chain KUBE-FW-4E7KSV2ABIFJRAUZ is first sent to the user-defined chain KUBE-MARK-MASQ, then to KUBE-SVC-4E7KSV2ABIFJRAUZ, and finally, if nothing matched, to KUBE-MARK-DROP.
control-plane-mjnp4:/home/eccd # iptables -t nat -S|grep KUBE-FW-4E7KSV2ABIFJRAUZ
-N KUBE-FW-4E7KSV2ABIFJRAUZ
-A KUBE-FW-4E7KSV2ABIFJRAUZ -m comment --comment "ingress-nginx/ingress-nginx:https loadbalancer IP" -j KUBE-MARK-MASQ <- next step i
-A KUBE-FW-4E7KSV2ABIFJRAUZ -m comment --comment "ingress-nginx/ingress-nginx:https loadbalancer IP" -j KUBE-SVC-4E7KSV2ABIFJRAUZ <- next step ii
-A KUBE-FW-4E7KSV2ABIFJRAUZ -m comment --comment "ingress-nginx/ingress-nginx:https loadbalancer IP" -j KUBE-MARK-DROP <- next step iii
-A KUBE-SERVICES -d 10.33.151.160/32 -p tcp -m comment --comment "ingress-nginx/ingress-nginx:https loadbalancer IP" -m tcp --dport 443 -j KUBE-FW-4E7KSV2ABIFJRAUZ
control-plane-mjnp4:/home/eccd #
i. All traffic sent to chain KUBE-MARK-MASQ is marked with 0x4000/0x4000.
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
Chain KUBE-MARK-MASQ (252 references)
target prot opt source destination
MARK all -- anywhere anywhere MARK or 0x4000
ii. Traffic processed in chain KUBE-MARK-MASQ is then handled by the next rule, which jumps to chain KUBE-SVC-4E7KSV2ABIFJRAUZ.
control-plane-mjnp4:/home/eccd # iptables -t nat -S|grep KUBE-SVC-4E7KSV2ABIFJRAUZ | grep -v "j KUBE-SVC-4E7KSV2ABIFJRAUZ"
-N KUBE-SVC-4E7KSV2ABIFJRAUZ
-A KUBE-SVC-4E7KSV2ABIFJRAUZ -m comment --comment "ingress-nginx/ingress-nginx:https" -m statistic --mode random --probability 0.25000000000 -j KUBE-SEP-VNARWU5QWGSG2GUG <- 25% of the traffic is sent to chain KUBE-SEP-VNARWU5QWGSG2GUG (first nginx ingress controller pod)
-A KUBE-SVC-4E7KSV2ABIFJRAUZ -m comment --comment "ingress-nginx/ingress-nginx:https" -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-JRQUMAVVXJLRS3O3 <- 33.3% of the remaining traffic is sent to chain KUBE-SEP-JRQUMAVVXJLRS3O3 (second pod)
-A KUBE-SVC-4E7KSV2ABIFJRAUZ -m comment --comment "ingress-nginx/ingress-nginx:https" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-ILQWMWXVY6HNWZZT <- 50% of the remaining traffic is sent to chain KUBE-SEP-ILQWMWXVY6HNWZZT (third pod)
-A KUBE-SVC-4E7KSV2ABIFJRAUZ -m comment --comment "ingress-nginx/ingress-nginx:https" -j KUBE-SEP-IM3OJF4BZQHSU4FK <- whatever remains is sent to chain KUBE-SEP-IM3OJF4BZQHSU4FK (fourth pod); overall the traffic is evenly balanced across all four nginx ingress controller pods
control-plane-mjnp4:/home/eccd #
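The cascade of probabilities 1/4, 1/3, 1/2, 1 gives each of the four endpoints roughly 25% of new connections. A quick illustrative simulation of the rule order above (not real traffic):

# Each counter corresponds to one KUBE-SEP chain; expect ~25% each.
awk 'BEGIN {
  srand();
  n = 100000;
  for (i = 0; i < n; i++) {
    if      (rand() < 0.25)  c[1]++;   # KUBE-SEP-VNARWU5QWGSG2GUG
    else if (rand() < 1/3)   c[2]++;   # KUBE-SEP-JRQUMAVVXJLRS3O3
    else if (rand() < 0.5)   c[3]++;   # KUBE-SEP-ILQWMWXVY6HNWZZT
    else                     c[4]++;   # KUBE-SEP-IM3OJF4BZQHSU4FK
  }
  for (j = 1; j <= 4; j++) printf "endpoint %d: %.1f%%\n", j, 100*c[j]/n;
}'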
1) Traffic that entered KUBE-SEP-VNARWU5QWGSG2GUG is DNAT'ed to 192.168.103.142:443; the other three chains DNAT to the other nginx pods in the same way.
control-plane-mjnp4:/home/eccd # iptables -t nat -S|grep KUBE-SEP-VNARWU5QWGSG2GUG | grep -v "j KUBE-SEP-VNARWU5QWGSG2GUG"
-N KUBE-SEP-VNARWU5QWGSG2GUG
-A KUBE-SEP-VNARWU5QWGSG2GUG -s 192.168.103.142/32 -m comment --comment "ingress-nginx/ingress-nginx:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-VNARWU5QWGSG2GUG -p tcp -m comment --comment "ingress-nginx/ingress-nginx:https" -m tcp -j DNAT --to-destination 192.168.103.142:443
control-plane-mjnp4:/home/eccd #
control-plane-mjnp4:/home/eccd # iptables -t nat -S|grep KUBE-SEP-JRQUMAVVXJLRS3O3 | grep -v "j KUBE-SEP-JRQUMAVVXJLRS3O3"
-N KUBE-SEP-JRQUMAVVXJLRS3O3
-A KUBE-SEP-JRQUMAVVXJLRS3O3 -s 192.168.107.128/32 -m comment --comment "ingress-nginx/ingress-nginx:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-JRQUMAVVXJLRS3O3 -p tcp -m comment --comment "ingress-nginx/ingress-nginx:https" -m tcp -j DNAT --to-destination 192.168.107.128:443
control-plane-mjnp4:/home/eccd #
control-plane-mjnp4:/home/eccd # iptables -t nat -S|grep KUBE-SEP-ILQWMWXVY6HNWZZT | grep -v "j KUBE-SEP-ILQWMWXVY6HNWZZT"
-N KUBE-SEP-ILQWMWXVY6HNWZZT
-A KUBE-SEP-ILQWMWXVY6HNWZZT -s 192.168.176.202/32 -m comment --comment "ingress-nginx/ingress-nginx:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-ILQWMWXVY6HNWZZT -p tcp -m comment --comment "ingress-nginx/ingress-nginx:https" -m tcp -j DNAT --to-destination 192.168.176.202:443
control-plane-mjnp4:/home/eccd #
control-plane-mjnp4:/home/eccd # iptables -t nat -S|grep KUBE-SEP-IM3OJF4BZQHSU4FK | grep -v "j KUBE-SEP-IM3OJF4BZQHSU4FK"
-N KUBE-SEP-IM3OJF4BZQHSU4FK
-A KUBE-SEP-IM3OJF4BZQHSU4FK -s 192.168.95.125/32 -m comment --comment "ingress-nginx/ingress-nginx:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-IM3OJF4BZQHSU4FK -p tcp -m comment --comment "ingress-nginx/ingress-nginx:https" -m tcp -j DNAT --to-destination 192.168.95.125:443
control-plane-mjnp4:/home/eccd #
Question: who owns those IPs?
2) Traffic will be sent to one of the nginx ingress controller pods.
<--------Second load balancing by NGINX - Layer 7 -------->
nginx-ingress-controller will load balance the HTTP traffic, e.g. based on the URL.
eccd@control-plane-mjnp4:~> kubectl get po -n ingress-nginx -l app=ingress-nginx -o=custom-columns=NODE:.spec.nodeName,Name:.metadata.name,hostIP:.status.hostIP,podIP:.status.podIP
NODE                     Name                                        hostIP        podIP
pool1-868fc5689d-9m7m5   nginx-ingress-controller-5dff4dcd48-dfbnr   10.0.10.102   192.168.107.128
control-plane-gwnnh      nginx-ingress-controller-5dff4dcd48-m69kc   10.0.10.109   192.168.95.125
control-plane-cr2zh      nginx-ingress-controller-5dff4dcd48-rcgv8   10.0.10.110   192.168.176.202
pool2-6859c57999-2t8hx   nginx-ingress-controller-5dff4dcd48-t8v2g   10.0.10.108   192.168.103.142
eccd@control-plane-mjnp4:~>
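The same pod IPs should also appear as the endpoints of the ingress-nginx service, which answers the question above; a cross-check (assuming the endpoints object carries the default service name):

# The DNAT targets seen in the KUBE-SEP chains (192.168.x.x:443) are simply the
# endpoints of the ingress-nginx service, i.e. the controller pod IPs.
kubectl get endpoints ingress-nginx -n ingress-nginx -o wide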
3) Enter an NGINX ingress controller pod and check the HTTP load balancing rules - traffic will be sent to service monitoring/eric-pm-server:9090:
eccd@control-plane-mjnp4:~> kubectl exec nginx-ingress-controller-5dff4dcd48-dfbnr -n ingress-nginx -ti -- bash
bash-5.0$ cat nginx.conf
# Configuration checksum: 5612144526815823684
upstream upstream_balancer {
    ### Attention!!!
    #
    # We no longer create "upstream" section for every backend.
    # Backends are handled dynamically using Lua. If you would like to debug
    # and see what backends ingress-nginx has in its memory you can
    # install our kubectl plugin https://kubernetes.github.io/ingress-nginx/kubectl-plugin.
    # Once you have the plugin you can use "kubectl ingress-nginx backends" command to
    # inspect current backends.
    #
    ###
## start server dc173prometheus1.cloud.k2.ericsson.se
server {
    server_name dc173prometheus1.cloud.k2.ericsson.se ;
    listen 80 ;
    listen 442 proxy_protocol ssl http2 ;
    set $proxy_upstream_name "-";
    ssl_certificate_by_lua_block {
        certificate.call()
    }
    location / {
        set $namespace "monitoring";
        set $ingress_name "prometheus-server";
        set $service_name "eric-pm-server";
        set $service_port "9090";
        set $location_path "/";
        rewrite_by_lua_block {
            lua_ingress.rewrite({
                force_ssl_redirect = false,
                ssl_redirect = true,
                force_no_ssl_redirect = false,
                use_port_in_redirects = false,
            })
            balancer.rewrite()
            plugins.run()
        }
        # be careful with `access_by_lua_block` and `satisfy any` directives as satisfy any
        # will always succeed when there's `access_by_lua_block` that does not have any lua code doing `ngx.exit(ngx.DECLINED)`
        # other authentication method such as basic auth or external auth useless - all requests will be allowed.
        #access_by_lua_block {
        #}
        header_filter_by_lua_block {
            lua_ingress.header()
            plugins.run()
        }
        body_filter_by_lua_block {
        }
        log_by_lua_block {
            balancer.log()
            monitor.call()
            plugins.run()
        }
        port_in_redirect off;
        set $balancer_ewma_score -1;
        set $proxy_upstream_name "monitoring-eric-pm-server-9090";
        set $proxy_host $proxy_upstream_name;
        set $pass_access_scheme $scheme;
        set $pass_server_port $server_port;
        set $best_http_host $http_host;
        set $pass_port $pass_server_port;
        set $proxy_alternative_upstream_name "";
        client_max_body_size 1m;
        proxy_set_header Host $best_http_host;
        # Pass the extracted client certificate to the backend
        # Allow websocket connections
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header X-Request-ID $req_id;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header X-Forwarded-Host $best_http_host;
        proxy_set_header X-Forwarded-Port $pass_port;
        proxy_set_header X-Forwarded-Proto $pass_access_scheme;
        proxy_set_header X-Scheme $pass_access_scheme;
        # Pass the original X-Forwarded-For
        proxy_set_header X-Original-Forwarded-For $http_x_forwarded_for;
        # mitigate HTTPoxy Vulnerability
        # https://www.nginx.com/blog/mitigating-the-httpoxy-vulnerability-with-nginx/
        proxy_set_header Proxy "";
        # Custom headers to proxied server
        proxy_connect_timeout 5s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
        proxy_buffering off;
        proxy_buffer_size 4k;
        proxy_buffers 4 4k;
        proxy_max_temp_file_size 1024m;
        proxy_request_buffering on;
        proxy_http_version 1.1;
        proxy_cookie_domain off;
        proxy_cookie_path off;
        # In case of errors try the next upstream server before returning an error
        proxy_next_upstream error timeout;
        proxy_next_upstream_timeout 0;
        proxy_next_upstream_tries 3;
        proxy_pass http://upstream_balancer;
        proxy_redirect off;
    }
}
## end server dc173prometheus1.cloud.k2.ericsson.se
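Since backends are resolved dynamically in Lua, nginx.conf itself does not list them. If the ingress-nginx kubectl plugin mentioned in the config comment is installed, the live backend list can be inspected with:

# Requires the kubectl plugin referenced in the nginx.conf comment above.
kubectl ingress-nginx backends -n ingress-nginx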
4) Check the ingress backend definition, which points to service eric-pm-server on port 9090:
eccd@control-plane-wfb88:~> kubectl get ing prometheus-server -n monitoring -o yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
...
spec:
  rules:
  - host: dc173prometheus1.cloud.k2.ericsson.se
    http:
      paths:
      - backend:
          serviceName: eric-pm-server
          servicePort: 9090
        path: /
        pathType: ImplementationSpecific
status:
  loadBalancer:
    ingress:
    - ip: 10.0.10.102
    - ip: 10.0.10.103
    - ip: 10.0.10.104
    - ip: 10.0.10.108
5) Traffic is then forwarded to the service clusterIP 10.107.185.250.
eccd@control-plane-mjnp4:~> kubectl get svc eric-pm-server -n monitoring -o yaml
spec:
  clusterIP: 10.107.185.250
  ports:
  - name: http
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: eric-pm-server
    component: server
    release: pm-server
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
control-plane-mjnp4:/home/eccd # iptables -t nat -S | grep 10.107.185.250
-A KUBE-SERVICES ! -s 192.168.0.0/16 -d 10.107.185.250/32 -p tcp -m comment --comment "monitoring/eric-pm-server:http cluster IP" -m tcp --dport 9090 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.107.185.250/32 -p tcp -m comment --comment "monitoring/eric-pm-server:http cluster IP" -m tcp --dport 9090 -j KUBE-SVC-6K4R6JTKTVNBU5UP
control-plane-mjnp4:/home/eccd #
6) The first rule in chain KUBE-SERVICES sets the mark via KUBE-MARK-MASQ (only for traffic not sourced from the pod network, per the ! -s 192.168.0.0/16 match).
The second rule in chain KUBE-SERVICES forwards the traffic to chain KUBE-SVC-6K4R6JTKTVNBU5UP.
control-plane-mjnp4:/home/eccd # iptables -t nat -S | grep KUBE-SVC-6K4R6JTKTVNBU5UP | grep -v "j KUBE-SVC-6K4R6JTKTVNBU5UP"
-N KUBE-SVC-6K4R6JTKTVNBU5UP
-A KUBE-SVC-6K4R6JTKTVNBU5UP -m comment --comment "monitoring/eric-pm-server:http" -j KUBE-SEP-CNB2ICYJPHYJI3AY
control-plane-mjnp4:/home/eccd #
7) Traffic entering chain KUBE-SEP-CNB2ICYJPHYJI3AY is sent to 192.168.22.247, which is the pod IP.
control-plane-mjnp4:/home/eccd # iptables -t nat -S | grep KUBE-SEP-CNB2ICYJPHYJI3AY | grep -v "j KUBE-SEP-CNB2ICYJPHYJI3AY"
-N KUBE-SEP-CNB2ICYJPHYJI3AY
-A KUBE-SEP-CNB2ICYJPHYJI3AY -s 192.168.22.247/32 -m comment --comment "monitoring/eric-pm-server:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-CNB2ICYJPHYJI3AY -p tcp -m comment --comment "monitoring/eric-pm-server:http" -m tcp -j DNAT --to-destination 192.168.22.247:9090
control-plane-mjnp4:/home/eccd #
eccd@control-plane-mjnp4:~> kubectl get po -l app=eric-pm-server -A -o wide
NAMESPACE    NAME               READY   STATUS    RESTARTS   AGE     IP                NODE                     NOMINATED NODE   READINESS GATES
evnfm        eric-pm-server-0   2/2     Running   0          4h29m   192.168.188.101   pool2-6859c57999-mzv2d   <none>           <none>
monitoring   eric-pm-server-0   2/2     Running   0          2d1h    192.168.22.247    pool2-6859c57999-7hfs2   <none>           <none>
eccd@control-plane-mjnp4:~>
Since the server in question is in the monitoring namespace, the correct PM server is the second one.
<--------Third load balancing by K8s service - round-robin using iptables mode-------->
There is only one eric-pm-server pod, so there is no round-robin load balancing here.
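For comparison, with more than one endpoint kube-proxy would generate a probability cascade like the one seen earlier for ingress-nginx. A purely hypothetical two-replica example (the KUBE-SEP chain hashes below are invented for illustration):

# Hypothetical iptables-save style output for two eric-pm-server endpoints -
# chain names are made up; the structure is what kube-proxy's iptables mode generates.
-A KUBE-SVC-6K4R6JTKTVNBU5UP -m comment --comment "monitoring/eric-pm-server:http" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-AAAAAAAAAAAAAAAA
-A KUBE-SVC-6K4R6JTKTVNBU5UP -m comment --comment "monitoring/eric-pm-server:http" -j KUBE-SEP-BBBBBBBBBBBBBBBB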
eccd@control-plane-mjnp4:~> kubectl get no -o wide | grep pool2-6859c57999-7hfs2
pool2-6859c57999-7hfs2   Ready   worker   19d   v1.18.8   10.0.10.105   <none>   SUSE Linux Enterprise Server 15 SP1   4.12.14-197.51-default   docker://19.3.11
eccd@control-plane-mjnp4:~> ssh 10.0.10.105 -q
Last login: Mon Sep 21 16:27:26 2020 from 10.0.10.101
eccd@pool2-6859c57999-7hfs2:~> sudo su
pool2-6859c57999-7hfs2:/home/eccd # iptables -t nat -S | grep 192.168.22.247
-A KUBE-SEP-CNB2ICYJPHYJI3AY -s 192.168.22.247/32 -m comment --comment "monitoring/eric-pm-server:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-CNB2ICYJPHYJI3AY -p tcp -m comment --comment "monitoring/eric-pm-server:http" -m tcp -j DNAT --to-destination 192.168.22.247:9090
pool2-6859c57999-7hfs2:/home/eccd #
Traffic enters the correct pod; we will now look at the return path.
iii. Traffic is marked with 0x8000/0x8000 if not matched by any of the rules above.
control-plane-mjnp4:/home/eccd # iptables -t nat -S|grep KUBE-MARK-DROP | grep -v "j KUBE-MARK-DROP"
-N KUBE-MARK-DROP
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
control-plane-mjnp4:/home/eccd #

==============

Return path

The return path uses the default route, since there is no explicit or more specific route for dst=10.210.150.87.
1. Start point - return traffic with source IP 192.168.22.247/32 will be sent to the KUBE-MARK-MASQ chain:
pool2-6859c57999-7hfs2:/home/eccd # iptables -t nat -S | grep 192.168.22.247
-A KUBE-SEP-CNB2ICYJPHYJI3AY -s 192.168.22.247/32 -m comment --comment "monitoring/eric-pm-server:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-CNB2ICYJPHYJI3AY -p tcp -m comment --comment "monitoring/eric-pm-server:http" -m tcp -j DNAT --to-destination 192.168.22.247:9090
pool2-6859c57999-7hfs2:/home/eccd #
2. All return traffic is marked with 0x4000.
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000

Chain KUBE-MARK-MASQ (242 references)
target     prot opt source               destination         
MARK       all  --  anywhere             anywhere             MARK or 0x4000
3. Traffic carrying the KUBE-MARK-MASQ mark returns to chain KUBE-SEP-CNB2ICYJPHYJI3AY and the next rule is checked - there are no more rules in chain KUBE-SEP-CNB2ICYJPHYJI3AY.
pool2-6859c57999-7hfs2:/home/eccd # iptables -t nat -S | grep KUBE-SEP-CNB2ICYJPHYJI3AY 
-N KUBE-SEP-CNB2ICYJPHYJI3AY
-A KUBE-SEP-CNB2ICYJPHYJI3AY -s 192.168.22.247/32 -m comment --comment "monitoring/eric-pm-server:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-CNB2ICYJPHYJI3AY -p tcp -m comment --comment "monitoring/eric-pm-server:http" -m tcp -j DNAT --to-destination 192.168.22.247:9090
-A KUBE-SVC-6K4R6JTKTVNBU5UP -m comment --comment "monitoring/eric-pm-server:http" -j KUBE-SEP-CNB2ICYJPHYJI3AY
pool2-6859c57999-7hfs2:/home/eccd #
4. KUBE-MARK-MASQ adds a Netfilter mark to packets originating from the eric-pm-server service that are destined outside the cluster's network. Packets with this mark will be altered in a POSTROUTING rule to use source network address translation (SNAT), with the node's IP address as their source IP address.

From <https://www.stackrox.com/post/2020/01/kubernetes-networking-demystified/> 
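The POSTROUTING leg can be checked directly on the node; the exact rules vary by kube-proxy version, but the chain that masquerades packets carrying the 0x4000 mark is visible with:

# Show the kube-proxy chain that SNATs (masquerades) packets marked by KUBE-MARK-MASQ,
# and where it hooks into POSTROUTING.
iptables -t nat -S KUBE-POSTROUTING
iptables -t nat -S POSTROUTING | grep KUBE-POSTROUTING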

5. The destination IP does not match any explicit routing entry, so the default route is used:
pool2-6859c57999-7hfs2:/home/eccd # ip r | grep 10.210.150.87
pool2-6859c57999-7hfs2:/home/eccd # 
pool2-6859c57999-7hfs2:/home/eccd # ip r | grep default
default via 10.33.152.65 dev ecfe_traf1 proto static metric 804 
pool2-6859c57999-7hfs2:/home/eccd # 
pool2-6859c57999-7hfs2:/home/eccd # ip a show ecfe_traf1
12: ecfe_traf1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 36:1f:d1:91:3a:16 brd ff:ff:ff:ff:ff:ff
    inet 10.33.152.74/27 brd 10.33.152.95 scope global noprefixroute ecfe_traf1
       valid_lft forever preferred_lft forever
pool2-6859c57999-7hfs2:/home/eccd # 


At this stage, IP-SA=10.33.152.74 and IP-DA=10.210.150.87.
Since the packet ingressed via DC173_CCD1_om_vr, the egress traffic must go back through the same VRF.
In CNIS 1.0 the default route belongs to the application (traffic) network, whereas this egress traffic should be forwarded via the ccd_ecfe_om interface.

Ingress and egress traffic are therefore routed via different VRFs, and the TCP session won't be established.
Add a route (temporary workaround, not persistent):
for i in `kubectl get no -o json | jq -r '.items[].status.addresses[] | select(.type=="InternalIP") | .address'` ; do ssh $i -q sudo ip route add <web-browser-ip>/32 via <ccd_ecfe-SLX-om_vr-ve_anycast> ; done
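Before and after applying the workaround, the chosen egress path for the client IP can be confirmed on a node with ip route get (using the real client address in place of the placeholder):

# Shows which route/interface the node would use to reach the client; after the
# workaround it should go out via ccd_ecfe_om instead of ecfe_traf1.
ip route get 10.210.150.87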

The egress traffic path is:
Prometheus pod (src-IP = Prometheus pod) -> service clusterIP (iptables + conntrack) -> nginx ingress controller pod (src-IP rewritten to the ingress pod IP) -> host iptables on the node hosting the nginx controller pod (src-IP rewritten to the ingress node IP) -> host OS routing table lookup -> ECFE gateway (anycast GW IP on the SLX) -> 10.210.150.87