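Since ingress-nginx moved to Lua for upstream balancing, it relies entirely on the endpoint to drop a failed pod IP, which is then removed from the nginx upstream. When a node fails, it is marked NotReady about 40s later, the nodelifecycle controller sets the Ready condition of its pods to False, and only then does the endpoint controller remove them from the ep. That window is fairly long. The IPs in the ep may also not be k8s pod IPs at all, but manually added IPs of services deployed directly on hosts, in which case the k8s mechanism cannot be fully relied on. So with max_fails and the active check configuration gone from the Lua implementation, business traffic is affected.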
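
To make concrete what is lost, here is a minimal sketch of passive checking in the spirit of nginx's `max_fails` / `fail_timeout` server parameters. It is not ingress-nginx's Lua code: the class and method names (`PassiveBalancer`, `pick`, `report_failure`) and the addresses are hypothetical, and the semantics are simplified from what nginx actually does.

```python
import time
from dataclasses import dataclass


@dataclass
class Peer:
    addr: str
    fails: int = 0        # failures counted in the current window
    checked: float = 0.0  # time of the most recent failure


class PassiveBalancer:
    """Round-robin with simplified max_fails / fail_timeout behaviour."""

    def __init__(self, addrs, max_fails=1, fail_timeout=10.0, clock=time.monotonic):
        self.peers = [Peer(a) for a in addrs]
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self._next = 0

    def _available(self, peer):
        if peer.fails < self.max_fails:
            return True
        # Once fail_timeout has elapsed, the peer gets another chance.
        if self.clock() - peer.checked >= self.fail_timeout:
            peer.fails = 0
            return True
        return False

    def pick(self):
        """Return the next usable address, or None if every peer is marked down."""
        for _ in range(len(self.peers)):
            peer = self.peers[self._next]
            self._next = (self._next + 1) % len(self.peers)
            if self._available(peer):
                return peer.addr
        return None

    def report_failure(self, addr):
        """Record a failed attempt (e.g. connect error or timeout) against addr."""
        now = self.clock()
        for peer in self.peers:
            if peer.addr == addr:
                # Failures older than fail_timeout no longer count.
                if now - peer.checked >= self.fail_timeout:
                    peer.fails = 0
                peer.fails += 1
                peer.checked = now
```

With this kind of bookkeeping, a dead address is skipped as soon as requests to it start failing, without waiting for the endpoint to be updated. That is the behaviour that matters most for the manually added host IPs, which k8s never removes on its own.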

    Below are the related issue discussions on ingress-nginx, along with the question of whether proxy-next-upstream can solve this:

    delete upstream healthcheck annotation:
    https://github.com/kubernetes/ingress-nginx/pull/3207
    proxy-next-upstream is not working:
    https://github.com/kubernetes/ingress-nginx/issues/4567
    Kubernetes ingress-nginx layer-4 TCP proxy retries a nonexistent address endlessly, up to millions of times:
    https://www.lijiaocn.com/%E9%97%AE%E9%A2%98/2019/09/17/ingress-nginx-l4-proxy.html
    Custom NGINX upstream checks:
    https://github.com/kubernetes/ingress-nginx/blob/b46523a1f47ddf34caec3d87574b17dd9c6ea764/docs/user-guide/nginx-configuration/annotations.md#custom-nginx-upstream-checks
    nginx active and passive health checks:
    https://www.cnblogs.com/linyouyi/p/11502282.html