k8s Pitfall Notes: CoreDNS


Created: 2019-12-24 20:20



Symptoms

While deploying the development environment recently, I found a pod whose READY column stayed at 0/1 for about 7 minutes:


NAME                                       READY   STATUS    RESTARTS   AGE   IP            NODE                                                NOMINATED NODE   READINESS GATES
ks-sso-server-deployment-896964cb6-9xdnb   0/1     Running   0          6m36s   10.244.1.30   ecs.ali-bj-vpc.other.172.25.116.185.vpc-dev-k8s02   <none>           <none>

Troubleshooting

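kubectl describe shows the health-check endpoint failing with "connection refused". The service inside the pod had most likely not finished starting, which is why the readiness probe kept failing: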

kubectl describe pod/ks-sso-server-deployment-896964cb6-9xdnb

Events:
  Type     Reason     Age               From                                                        Message
  ----     ------     ----              ----                                                        -------
  Normal   Scheduled  <unknown>         default-scheduler                                           Successfully assigned default/ks-sso-server-deployment-896964cb6-9xdnb to ecs.ali-bj-vpc.other.172.25.116.185.vpc-dev-k8s02
  Normal   Pulling    78s               kubelet, ecs.ali-bj-vpc.other.172.25.116.185.vpc-dev-k8s02  Pulling image "busybox"
  Normal   Pulled     69s               kubelet, ecs.ali-bj-vpc.other.172.25.116.185.vpc-dev-k8s02  Successfully pulled image "busybox"
  Normal   Created    69s               kubelet, ecs.ali-bj-vpc.other.172.25.116.185.vpc-dev-k8s02  Created container proj-init
  Normal   Started    69s               kubelet, ecs.ali-bj-vpc.other.172.25.116.185.vpc-dev-k8s02  Started container proj-init
  Normal   Pulled     68s               kubelet, ecs.ali-bj-vpc.other.172.25.116.185.vpc-dev-k8s02  Container image "harbor.x.xxx.com/library/ks-sso-server:1.6.0.0-SNAPSHOT" already present on machine
  Normal   Created    68s               kubelet, ecs.ali-bj-vpc.other.172.25.116.185.vpc-dev-k8s02  Created container ks-sso-server
  Normal   Started    68s               kubelet, ecs.ali-bj-vpc.other.172.25.116.185.vpc-dev-k8s02  Started container ks-sso-server
  Warning  Unhealthy  3s (x8 over 38s)  kubelet, ecs.ali-bj-vpc.other.172.25.116.185.vpc-dev-k8s02  Readiness probe failed: Get http://10.244.1.30:9999/actuator/health: dial tcp 10.244.1.30:9999: connect: connection refused
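"connection refused" is a specific kind of failure. The TCP handshake to 10.244.1.30:9999 was actively rejected, which usually means no process was listening on that port yet. A timeout would mean the packets went unanswered. The minimal Python sketch below shows that distinction. It assumes a plain TCP connect is a fair stand-in for the first step of the kubelet's HTTP GET probe (the real probe also checks the HTTP status code). check_port is a hypothetical helper:

```python
import socket


def check_port(host: str, port: int, timeout: float = 1.0) -> str:
    """Classify a TCP connect attempt, roughly the first step of an HTTP readiness probe.

    "open"    -> handshake succeeded, something is listening
    "refused" -> the peer answered with a reset, usually nothing listening on that port
    "timeout" -> no answer within `timeout` seconds (e.g. dropped packets)
    "error: <reason>" -> any other socket error, such as an unreachable network
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Run against 10.244.1.30:9999 during the 7-minute window, this would be expected to return "refused", matching the probe message. Once the application binds its port, it should switch to "open".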

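From past experience, the likely cause was that the application was stuck waiting for its database connection during startup. Connecting to the database from the k8s node worked fine. Inside the pod, however, even pinging an external domain failed because the name could not be resolved: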

bash-4.2$ ping -c 1 www.baidu.com
ping: www.baidu.com: Name or service not known

Next, check the DNS-related configuration:

# run inside the pod
bash-4.2$ cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

# run on the node
root@pts/1 $ kubectl get svc -n kube-system -o wide | grep 10.96.0.10
kube-dns              ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   33d   k8s-app=kube-dns

# find the pods behind the Service
root@pts/1 $ kubectl get pods -n kube-system -o wide -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE   IP            NODE                                                NOMINATED NODE   READINESS GATES
coredns-5f8cbd7dcb-7d4tq   1/1     Running   0          26h   10.244.1.28   ecs.ali-bj-vpc.other.172.25.116.185.vpc-dev-k8s02   <none>           <none>
coredns-5f8cbd7dcb-sq4kf   1/1     Running   0          33d   10.244.2.2    ecs.ali-bj-vpc.other.172.25.116.184.vpc-dev-k8s03   <none>           <none>

# resolve the domain against each CoreDNS pod IP directly
root@pts/1 $ dig @10.244.1.28 www.baidu.com

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-9.P2.el7 <<>> @10.244.1.28 www.baidu.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42941
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.baidu.com.			IN	A

;; ANSWER SECTION:
www.baidu.com.		30	IN	CNAME	www.a.shifen.com.
www.a.shifen.com.	30	IN	A	220.181.38.149
www.a.shifen.com.	30	IN	A	220.181.38.150

;; Query time: 2 msec
;; SERVER: 10.244.1.28#53(10.244.1.28)
;; WHEN: Tue Dec 24 20:48:52 CST 2019
;; MSG SIZE  rcvd: 149

ecs.ali-bj-vpc.other.172.25.116.186.vpc-dev-k8s01 [~] 2019-12-24 20:48:52
root@pts/1 $ dig @10.244.2.2 www.baidu.com

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-9.P2.el7 <<>> @10.244.2.2 www.baidu.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3294
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.baidu.com.			IN	A

;; ANSWER SECTION:
www.baidu.com.		22	IN	CNAME	www.a.shifen.com.
www.a.shifen.com.	22	IN	A	220.181.38.150
www.a.shifen.com.	22	IN	A	220.181.38.149

;; Query time: 2 msec
;; SERVER: 10.244.2.2#53(10.244.2.2)
;; WHEN: Tue Dec 24 20:49:15 CST 2019
;; MSG SIZE  rcvd: 149
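dig @<pod-ip> bypasses both the Service IP and the pod's search list. It sends a single UDP datagram straight to one CoreDNS pod. The minimal Python sketch below shows what such a query looks like on the wire: an RFC 1035 header with the recursion-desired flag plus one question. build_query and query_a are hypothetical helpers. query_a only reports the response code and answer count and does not decode the records. The encoded sizes, 31 bytes for www.baidu.com and 57 bytes for www.baidu.com.default.svc.cluster.local, match the (31) and (57) payload lengths in the tcpdump capture later in this post.

```python
import random
import socket
import struct
from typing import Optional


def build_query(name: str, qtype: int = 1, qid: Optional[int] = None) -> bytes:
    """Encode a standard DNS query (RFC 1035 section 4.1) for `name`; type A by default."""
    if qid is None:
        qid = random.randint(0, 0xFFFF)
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in name.rstrip(".").split("."):
        if not label:
            continue
        if len(label) > 63:
            raise ValueError(f"label too long: {label!r}")
        qname += bytes([len(label)]) + label.encode("ascii")
    qname += b"\x00"  # root label terminates the name
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS = IN


def query_a(server: str, name: str, timeout: float = 2.0) -> tuple:
    """Send one A query over UDP to server:53; return (rcode, answer_count)."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # a lost reply surfaces as socket.timeout
        sock.sendto(packet, (server, 53))
        reply, _ = sock.recvfrom(4096)
    qid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    if qid != struct.unpack("!H", packet[:2])[0]:
        raise ValueError("reply ID does not match query ID")
    return flags & 0x000F, ancount  # rcode 0 = NOERROR, 3 = NXDOMAIN
```

From a node, query_a("10.244.1.28", "www.baidu.com") should return something like (0, 3) while that CoreDNS pod is healthy, in line with the NOERROR and three answers in the dig output above.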

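Resolving directly against either CoreDNS pod works fine. Traffic to a Service goes through iptables forwarding rules (kube-proxy in iptables mode), so could the iptables rules be the problem?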

The forwarding path looks roughly like this:

request
  --> OUTPUT
  --> KUBE-SERVICES
  --> KUBE-SVC-TCOU7JCQXEZGVUNU (random)
        |-- 50% --> KUBE-SEP-MNEVT5LK3OWCRPXW --> DNAT to 10.244.1.28:53 --> routing
        |-- 50% --> KUBE-SEP-TCIZBYBD3WWXNWF5 --> DNAT to 10.244.2.2:53  --> routing

# only the relevant rules are shown
root@pts/0 $ iptables -t nat -vnL
Chain PREROUTING (policy ACCEPT 17 packets, 2178 bytes)
 pkts bytes target     prot opt in     out     source               destination
 393K   59M KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
 138K   40M DOCKER     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 8 packets, 1243 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 16 packets, 2436 bytes)
 pkts bytes target     prot opt in     out     source               destination
1138K   81M KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
    0     0 DOCKER     all  --  *      *       0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL
    
Chain KUBE-SERVICES (2 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 KUBE-MARK-MASQ  udp  --  *      *      !10.244.0.0/16        10.96.0.10           /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
    0     0 KUBE-SVC-TCOU7JCQXEZGVUNU  udp  --  *      *       0.0.0.0/0            10.96.0.10           /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
    0     0 KUBE-MARK-MASQ  tcp  --  *      *      !10.244.0.0/16        10.96.0.10           /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
    0     0 KUBE-SVC-ERIFXISQEP7F7OF4  tcp  --  *      *       0.0.0.0/0            10.96.0.10           /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
    
Chain KUBE-SVC-TCOU7JCQXEZGVUNU (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 KUBE-SEP-MNEVT5LK3OWCRPXW  all  --  *      *       0.0.0.0/0            0.0.0.0/0            statistic mode random probability 0.50000000000
    0     0 KUBE-SEP-TCIZBYBD3WWXNWF5  all  --  *      *       0.0.0.0/0            0.0.0.0/0
    
Chain KUBE-SEP-MNEVT5LK3OWCRPXW (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 KUBE-MARK-MASQ  all  --  *      *       10.244.1.28          0.0.0.0/0
    0     0 DNAT       udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp to:10.244.1.28:53

Chain KUBE-SEP-TCIZBYBD3WWXNWF5 (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 KUBE-MARK-MASQ  all  --  *      *       10.244.2.2           0.0.0.0/0
    0     0 DNAT       udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp to:10.244.2.2:53
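The "statistic mode random probability 0.50000000000" rule is followed by an unconditional rule. That pairing is how kube-proxy's iptables mode spreads traffic evenly. With n endpoints, rule i (counting from 0) matches with probability 1/(n - i), and the last rule catches whatever is left, so each endpoint receives 1/n of new connections. The small Python sketch below computes that chain and checks the resulting shares with a simulation. rule_probabilities and simulate are illustrative names, not kube-proxy code:

```python
import random
from collections import Counter


def rule_probabilities(n: int) -> list:
    """Per-rule match probability for n endpoints; the last rule is unconditional (1.0)."""
    return [1.0 / (n - i) for i in range(n)]


def simulate(endpoints: list, trials: int, seed: int = 0) -> Counter:
    """Walk the rules top-down for each packet, as iptables does, and count hits."""
    rng = random.Random(seed)
    probs = rule_probabilities(len(endpoints))
    hits: Counter = Counter()
    for _ in range(trials):
        for endpoint, p in zip(endpoints, probs):
            if rng.random() < p:  # random() is in [0, 1), so p == 1.0 always matches
                hits[endpoint] += 1
                break
    return hits
```

For the two CoreDNS pods here, rule_probabilities(2) gives [0.5, 1.0], which is exactly the pair of rules in KUBE-SVC-TCOU7JCQXEZGVUNU.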

These rules look fine as well. Frustrating. Time to bring out the packet-capture tool of choice, tcpdump:

# run inside the pod
bash-4.2$ ping -c 1 www.baidu.com
ping: www.baidu.com: Name or service not known

# run on the node
tcpdump -i cni0  -p udp port 53  and host 10.244.2.55 -vv -a
21:30:57.993070 IP (tos 0x0, ttl 64, id 14287, offset 0, flags [DF], proto UDP (17), length 85)
    10.244.2.55.51209 > 10.96.0.10.domain: [bad udp cksum 0x17e7 -> 0x89b4!] 606+ A? www.baidu.com.default.svc.cluster.local. (57)
21:30:57.993102 IP (tos 0x0, ttl 63, id 14287, offset 0, flags [DF], proto UDP (17), length 85)
    10.244.2.55.51209 > 10.244.2.2.domain: [bad udp cksum 0x1a73 -> 0x8728!] 606+ A? www.baidu.com.default.svc.cluster.local. (57)
######
    5-second pause here
######
21:31:02.998159 IP (tos 0x0, ttl 64, id 18413, offset 0, flags [DF], proto UDP (17), length 85)
    10.244.2.55.51209 > 10.96.0.10.domain: [bad udp cksum 0x17e7 -> 0x89b4!] 606+ A? www.baidu.com.default.svc.cluster.local. (57)
21:31:02.998189 IP (tos 0x0, ttl 63, id 18413, offset 0, flags [DF], proto UDP (17), length 85)
    10.244.2.55.51209 > 10.244.2.2.domain: [bad udp cksum 0x1a73 -> 0x8728!] 606+ A? www.baidu.com.default.svc.cluster.local. (57)
######
    5-second pause here
######
21:31:08.002208 IP (tos 0x0, ttl 64, id 22190, offset 0, flags [DF], proto UDP (17), length 59)
    10.244.2.55.48623 > 10.96.0.10.domain: [bad udp cksum 0x17cd -> 0xcdb6!] 41007+ A? www.baidu.com. (31)
21:31:08.005166 IP (tos 0x0, ttl 62, id 57825, offset 0, flags [DF], proto UDP (17), length 166)
    10.96.0.10.domain > 10.244.2.55.48623: [udp sum ok] 41007 q: A? www.baidu.com. 3/0/0 www.baidu.com. CNAME www.a.shifen.com., www.a.shifen.com. A 220.181.38.150, www.a.shifen.com. A 220.181.38.149 (138)
21:31:08.012680 IP (tos 0x0, ttl 64, id 22196, offset 0, flags [DF], proto UDP (17), length 73)
    10.244.2.55.35870 > 10.96.0.10.domain: [bad udp cksum 0x17db -> 0xa4f7!] 12939+ PTR? 150.38.181.220.in-addr.arpa. (45)
21:31:08.015056 IP (tos 0x0, ttl 62, id 57832, offset 0, flags [DF], proto UDP (17), length 171)
    10.96.0.10.domain > 10.244.2.55.35870: [udp sum ok] 12939 NXDomain q: PTR? 150.38.181.220.in-addr.arpa. 0/1/0 ns: 38.181.220.IN-ADDR.ARPA. SOA idc-ns1.bjtelecom.net. wang_ye.bjxywh.com. 1201938454 10800 3600 604800 38400 (143)
    
# in a healthy pod, the exchange should look like this instead
23:21:04.525768 IP (tos 0x0, ttl 64, id 10315, offset 0, flags [DF], proto UDP (17), length 85)
    10.244.0.2.58318 > 10.96.0.10.domain: [bad udp cksum 0x15b2 -> 0x4c30!] 9810+ A? www.baidu.com.default.svc.cluster.local. (57)
23:21:04.526224 IP (tos 0x0, ttl 62, id 16446, offset 0, flags [DF], proto UDP (17), length 178)
    10.96.0.10.domain > 10.244.0.2.58318: [udp sum ok] 9810 NXDomain*- q: A? www.baidu.com.default.svc.cluster.local. 0/1/0 ns: cluster.local. [30s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1577200182 7200 1800 86400 30 (150)
23:21:04.526312 IP (tos 0x0, ttl 64, id 10316, offset 0, flags [DF], proto UDP (17), length 77)
    10.244.0.2.52627 > 10.96.0.10.domain: [bad udp cksum 0x15aa -> 0xb683!] 3326+ A? www.baidu.com.svc.cluster.local. (49)
23:21:04.526617 IP (tos 0x0, ttl 62, id 16447, offset 0, flags [DF], proto UDP (17), length 170)
    10.96.0.10.domain > 10.244.0.2.52627: [udp sum ok] 3326 NXDomain*- q: A? www.baidu.com.svc.cluster.local. 0/1/0 ns: cluster.local. [30s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1577200182 7200 1800 86400 30 (142)
23:21:04.526665 IP (tos 0x0, ttl 64, id 10317, offset 0, flags [DF], proto UDP (17), length 73)
    10.244.0.2.35606 > 10.96.0.10.domain: [bad udp cksum 0x15a6 -> 0xe2fb!] 40161+ A? www.baidu.com.cluster.local. (45)
23:21:04.527091 IP (tos 0x0, ttl 62, id 12183, offset 0, flags [DF], proto UDP (17), length 166)
    10.96.0.10.domain > 10.244.0.2.35606: [udp sum ok] 40161 NXDomain*- q: A? www.baidu.com.cluster.local. 0/1/0 ns: cluster.local. [30s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1577200182 7200 1800 86400 30 (138)
23:21:04.527127 IP (tos 0x0, ttl 64, id 10318, offset 0, flags [DF], proto UDP (17), length 59)
    10.244.0.2.47739 > 10.96.0.10.domain: [bad udp cksum 0x1598 -> 0x2b05!] 18570+ A? www.baidu.com. (31)
23:21:04.529487 IP (tos 0x0, ttl 62, id 16448, offset 0, flags [DF], proto UDP (17), length 166)
    10.96.0.10.domain > 10.244.0.2.47739: [udp sum ok] 18570 q: A? www.baidu.com. 3/0/0 www.baidu.com. [30s] CNAME www.a.shifen.com., www.a.shifen.com. [30s] A 220.181.38.150, www.a.shifen.com. [30s] A 220.181.38.149 (138)
23:21:04.537253 IP (tos 0x0, ttl 64, id 10325, offset 0, flags [DF], proto UDP (17), length 73)
    10.244.0.2.35284 > 10.96.0.10.domain: [bad udp cksum 0x15a6 -> 0x7998!] 25193+ PTR? 150.38.181.220.in-addr.arpa. (45)
23:21:04.539122 IP (tos 0x0, ttl 62, id 12187, offset 0, flags [DF], proto UDP (17), length 171)
    10.96.0.10.domain > 10.244.0.2.35284: [udp sum ok] 25193 NXDomain q: PTR? 150.38.181.220.in-addr.arpa. 0/1/0 ns: 38.181.220.IN-ADDR.ARPA. [30s] SOA idc-ns1.bjtelecom.net. wang_ye.bjxywh.com. 1201938454 10800 3600 604800 38400 (143)

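# querying it this way returns a result immediately
bash-4.2$ ping www.baidu.com.   <-- note the trailing ".": the name is treated as absolute (fully qualified), so no search domains are appended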
PING www.a.shifen.com (220.181.38.149) 56(84) bytes of data.
64 bytes from 220.181.38.149 (220.181.38.149): icmp_seq=1 ttl=51 time=6.97 ms
64 bytes from 220.181.38.149 (220.181.38.149): icmp_seq=2 ttl=51 time=6.93 ms
64 bytes from 220.181.38.149 (220.181.38.149): icmp_seq=3 ttl=51 time=7.08 ms
^C
--- www.a.shifen.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 6.930/6.996/7.082/0.093 ms

Putting it all together, there were two separate problems:

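1. Packet loss on the network. In the first capture above, the queries DNATed to the CoreDNS pod at 10.244.2.2 never got a reply. After a lot of investigation this was finally fixed by a reboot. It may be related to Alibaba Cloud's recent vulnerability patching. A real pain.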

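2. Wasted queries. /etc/resolv.conf contains the default search domains "search default.svc.cluster.local svc.cluster.local cluster.local" together with "options ndots:5". www.baidu.com has fewer than 5 dots, so the resolver first appends each search domain in turn and queries those names. Only after all of them fail does it query the name you actually wanted. The healthy-pod capture above shows this: three NXDOMAIN answers come before the real lookup. The fix is as follows: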

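In deploy.yml, add the following to the pod template. Note that dnsConfig is a pod-level field under spec.template.spec, alongside containers. It does not go inside an individual container: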
spec:
  template:
    spec:
      dnsConfig:        # pod-level field, a sibling of "containers"
        options:
        - name: ndots
          value: "1"
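If you would rather not edit the manifest, the same setting can be applied to a running Deployment with kubectl patch. The minimal Python sketch below builds the patch body. The Deployment name ks-sso-server-deployment is an assumption inferred from the pod name above, and ndots_patch and the file name ndots_patch.py are hypothetical:

```python
import json


def ndots_patch(ndots: int) -> str:
    """Build a patch body that sets the ndots resolver option on the pod template.

    dnsConfig sits in spec.template.spec next to "containers";
    option values are strings in the Kubernetes API, hence str(ndots).
    """
    patch = {
        "spec": {
            "template": {
                "spec": {
                    "dnsConfig": {
                        "options": [{"name": "ndots", "value": str(ndots)}]
                    }
                }
            }
        }
    }
    return json.dumps(patch)


if __name__ == "__main__":
    print(ndots_patch(1))
```

Usage: kubectl patch deployment ks-sso-server-deployment -p "$(python3 ndots_patch.py)". Changing the pod template rolls out new pods. Their /etc/resolv.conf is generated at creation, which is when the new option takes effect.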
    
    
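ndots is the number of dots a name must contain before the resolver tries it as an absolute name first. If a name has fewer dots than this, the resolver (not Kubernetes itself) treats it as relative and tries the search domains first. Kubernetes sets ndots to 5 by default because it assumes internal names are at most 5 labels long (for example <service>.<namespace>.svc.cluster.local). Lookups for internal names are therefore resolved by the cluster DNS through the search list first, rather than giving DNS requests for internal names a chance to go out to external resolvers. From that point of view, ndots:5 is a reasonable default.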
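The lookup order described above can be sketched in Python. This is a simplified model of a glibc-style stub resolver. It assumes every candidate gets a definite answer such as NXDOMAIN and ignores the timeout case from the capture. A name with a trailing dot is queried only as-is. A name with at least ndots dots is tried as-is first, then with each search suffix. Any other name gets the search suffixes first and the bare name last. candidate_names is a hypothetical helper:

```python
def candidate_names(name: str, search: list, ndots: int) -> list:
    """Return the fully qualified names a glibc-style resolver would query, in order."""
    if name.endswith("."):
        return [name]  # absolute name: the search list is never used
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as-is first
    return expanded + [absolute]  # too few dots: search domains first


SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
```

With ndots 5, candidate_names("www.baidu.com", SEARCH, 5) yields the same four names, in the same order, as the healthy-pod capture. With ndots 1, www.baidu.com. comes first. The trade-off is that a short in-cluster name such as kube-dns.kube-system (1 dot) is now also tried as an absolute name first, which costs one wasted query before the search list is used.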

With that, the problem was finally solved. That packet-loss issue was a real pain...

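Please credit the source when reposting. You are welcome to check the sources cited in this article and to point out anything that is wrong or unclear, either in the comments below or by email to jaytp@qq.com.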