Recently, I was trying to diagnose why the kube-apiserver was not connecting to an etcd cluster configured via a path override. I exec'ed into the pod, installed tcpdump, ran it, and saw no attempt at all to establish a connection with that server. That led me to spend a bunch of time chasing a red herring, thinking there was a problem with the path override code.
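For reference, that on-the-fly approach looked roughly like this (the pod and namespace names are placeholders, and the package manager depends on the base image):

# exec into the running kube-apiserver pod (names are illustrative)
kubectl exec -it -n kube-system kube-apiserver-master-0 -- /bin/sh

# inside the container: install tcpdump and watch for traffic to the etcd host
apt-get update && apt-get install -y tcpdump
tcpdump host etcdEvents.etcd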

It turned out, however, that there was no problem with the path override code. Instead, the kube-apiserver (in what I still consider to be a bug) only attempts to make that connection when it first starts up and never tries again.

To catch that single attempt, I needed tcpdump running far earlier than I could possibly exec into the running pod. The solution I found was to add a sidecar container that runs tcpdump from the moment the pod starts.

apiVersion: v1
kind: Pod
metadata: { ... }
spec:
  containers:
    # Sidecar that starts capturing traffic to the etcd host as soon as the pod starts
    - name: tcpdump
      image: corfr/tcpdump
      command:
        - /usr/sbin/tcpdump
        - host
        - etcdEvents.etcd
    # The container under investigation
    - name: kube-apiserver
      image: gcr.io/google_containers/kube-apiserver:v1.9.1
      command: [ ... ]
      ...

This let me capture any traffic destined for that host and confirm that the apiserver was, indeed, trying to connect exactly once, and never again.
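Since tcpdump writes to stdout, the capture can be read back with kubectl logs, selecting the sidecar container by name (the pod name here is a placeholder):

# read the sidecar's capture output; -c selects the tcpdump container
kubectl logs -n kube-system kube-apiserver-master-0 -c tcpdump

# or follow it live
kubectl logs -f -n kube-system kube-apiserver-master-0 -c tcpdump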

It turned out the actual problem was that the installation tool I was using had changed the hostname of the etcd nodes. I had worked around that with a CNAME, but forgot to update the SSL certificate to match the new name.
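A quick way to confirm a mismatch like that is to inspect the certificate the etcd endpoint actually serves and compare its names against the CNAME; something along these lines (the hostname and port here are assumptions):

# dump the certificate presented by the etcd endpoint and check its SAN entries
echo | openssl s_client -connect etcdEvents.etcd:2379 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'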

Still, using a sidecar to load a diagnostic tool is a handy trick to keep in your arsenal.