Docker Service Troubleshooting

Overview

Sometimes the command ‘kubectl get node’ shows that one node is in the ‘NotReady’ state:

# kubectl get node
NAME           STATUS     ROLES    AGE   VERSION
10.X1.Y1.Z1    Ready      node     47d   v1.13.9
10.X2.Y2.Z2    NotReady   node     47d   v1.13.9
10.X3.Y3.Z3    Ready      master   47d   v1.13.9

This may indicate that Docker is not working properly on that node. This article describes how to detect and fix Docker issues.
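As a starting point, the affected nodes can be picked out of the `kubectl get node` output with a small filter. This is a minimal sketch: the function name `not_ready_nodes` is made up for illustration, and it assumes the default column layout shown above (name in the first column, status in the second).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get node` output on stdin and print
# the names of nodes whose STATUS column starts with "NotReady".
# The header line is skipped implicitly because "STATUS" does not match.
not_ready_nodes() {
  awk '$2 ~ /^NotReady/ { print $1 }'
}

# Usage (not executed here):
#   kubectl get node | not_ready_nodes
```

Each printed name is a node on which the Docker checks described in this article are worth running.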

Checking Docker Status - Working Scenario

  1. Check docker.service status (if systemd is used):

     # systemctl status docker.service
     ● docker.service - Docker Application Container Engine
       Loaded: loaded (/lib/systemd/system/docker.service; disabled; vendor preset: enabled)
       Drop-In: /etc/systemd/system/docker.service.d
                └─70-kublr-override.conf
       Active: active (running) since Mon 2019-11-11 12:05:12 UTC; 2 days ago
         Docs: https://docs.docker.com
        Main PID: 8609 (dockerd)
         Tasks: 21
        Memory: 1.0G
          CPU: 1h 35min 16.168s
        CGroup: /system.slice/docker.service
                └─8609 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --config-file=/etc/docker/kublr-daemon.json
     

    This output shows that docker.service is running.

  2. Check that the dockerd process is running:

     # ps -ef |grep dockerd
     root      1278     1  0 Oct07 ?     00:06:08 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
     

    Docker is running and it has process ID 1278.

  3. Check that the docker client can reach dockerd through the Unix domain socket /var/run/docker.sock:

     # docker version
      Client:
       Version:           18.09.7
       API version:       1.39
       Go version:        go1.10.4
       Git commit:        2d0083d
       Built:             Fri Aug 16 14:19:38 2019
       OS/Arch:           linux/amd64
       Experimental:      false
    
       Server:
        Engine:
         Version:          18.09.7
         API version:      1.39 (minimum version 1.12)
         Go version:       go1.10.4
         Git commit:       2d0083d
         Built:            Thu Aug 15 15:12:41 2019
         OS/Arch:          linux/amd64
         Experimental:     false
     
  4. Check that docker can run containers:

     # docker run hello-world
     Hello from Docker!
     This message shows that your installation appears to be working correctly.
    
     To generate this message, Docker took the following steps:
      1. The Docker client contacted the Docker daemon.
      2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
         (amd64)
      3. The Docker daemon created a new container from that image which runs the
         executable that produces the output you are currently reading.
      4. The Docker daemon streamed that output to the Docker client, which sent it
         to your terminal.
      ....
     

    If all of these commands succeed, the problem most likely lies outside Docker.
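The four checks above can also be chained into one function that stops at the first failing step. This is only a sketch: the name `check_docker` and the return codes 1 to 4 are chosen here for illustration, and it assumes a systemd host with `pgrep` available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the four checks in order and return the number
# of the first step that fails (0 means every check passed).
check_docker() {
  # Step 1: is the systemd unit active?
  systemctl is-active --quiet docker.service \
    || { echo "step 1: docker.service is not active" >&2; return 1; }
  # Step 2: is there a process named exactly "dockerd"?
  pgrep -x dockerd >/dev/null \
    || { echo "step 2: dockerd process not found" >&2; return 2; }
  # Step 3: can the client reach the daemon through its socket?
  docker version >/dev/null 2>&1 \
    || { echo "step 3: cannot reach the Docker daemon" >&2; return 3; }
  # Step 4: can a container actually be started?
  docker run --rm hello-world >/dev/null 2>&1 \
    || { echo "step 4: cannot run containers" >&2; return 4; }
  echo "all Docker checks passed"
}

# Usage (not executed here):
#   check_docker || echo "failed at step $?"
```

The `--rm` flag is added so that the test container does not accumulate on the node after each check.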

Checking Docker Status - Not Working or Working Incorrectly Scenario

  1. Docker is not installed:

     # systemctl status docker
     ● docker.service
        Loaded: not-found (Reason: No such file or directory)
        Active: inactive (dead)
     
  2. Docker is not started:

     # sudo systemctl status docker
     ● docker.service - Docker Application Container Engine
        Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
        Active: inactive (dead) since Thu 2019-11-14 19:58:48 +07; 20s ago
          Docs: https://docs.docker.com
      Main PID: 1371 (code=exited, status=0/SUCCESS)
    
     and:
    
     # ps -ef |grep dockerd
     #
     
  3. The docker client cannot reach dockerd through the domain socket /var/run/docker.sock:

     # docker version
     Client:
      Version:           18.09.7
      API version:       1.39
      Go version:        go1.10.4
      Git commit:        2d0083d
      Built:             Fri Aug 16 14:19:38 2019
      OS/Arch:           linux/amd64
      Experimental:      false
     Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
     
  4. Docker cannot communicate with containerd through the domain socket /run/containerd/containerd.sock:

     # docker run ubuntu echo hello
     Unable to find image 'ubuntu:latest' locally
     latest: Pulling from library/ubuntu
     7ddbc47eeb70: Pull complete 
     c1bbdc448b72: Pull complete 
     8c3b70e39044: Pull complete 
     45d437916d57: Pull complete 
     Digest: sha256:6e9f67fa63b0323e9a1e587fd71c561ba48a034504fb804fd26fd8800039835d
     Status: Downloaded newer image for ubuntu:latest
     hello
     

    … and the process hangs at this point. To stop it, send it the SIGKILL signal:

     # ps -ef |grep "docker run ubuntu echo hello"
     root     794  2712  0 20:59 pts/2    00:00:00 docker run ubuntu echo hello
     # kill -9 794
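
When this check is scripted, the hang can be bounded instead of being killed by hand. The sketch below assumes the `timeout` utility from GNU coreutils is installed. The wrapper name `run_with_timeout` and the 5-second grace period are illustrative choices, not values from this article.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command with a time limit in seconds.
# `timeout` sends SIGTERM when the limit expires and, because of -k 5,
# follows up with SIGKILL if the command is still alive 5 seconds later.
run_with_timeout() {
  local limit="$1"
  shift
  timeout -k 5 "$limit" "$@"
}

# Usage (not executed here):
#   run_with_timeout 60 docker run --rm ubuntu echo hello
# Exit status 124 means the limit expired; 137 means SIGKILL was needed.
```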
     

    ……
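
The first two scenarios above ("not installed" and "not started") can be told apart in a script from the unit properties that systemd reports, rather than by reading the `systemctl status` text. This is a sketch under assumptions: the function name `classify_docker_unit` is made up, and it relies only on the `LoadState` and `ActiveState` properties printed by `systemctl show` as `Key=Value` lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "Key=Value" lines (as printed by `systemctl show`)
# on stdin and print which scenario the docker.service unit is in.
classify_docker_unit() {
  awk -F= '
    $1 == "LoadState"   { load = $2 }
    $1 == "ActiveState" { active = $2 }
    END {
      if (load == "not-found")     print "not installed"
      else if (active != "active") print "not started"
      else                         print "running"
    }'
}

# Usage (not executed here):
#   systemctl show -p LoadState -p ActiveState docker.service | classify_docker_unit
```

A result of "running" only rules out scenarios 1 and 2; the socket and container checks are still needed to detect scenarios 3 and 4.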

Docker Troubleshooting

  • Restart Docker:

      systemctl restart docker
    
  • “Hard” restart of Docker:

      systemctl stop kublr-seeder kublr
      systemctl stop docker
    
  • If stopping Docker hangs:

      pkill -9 dockerd
      rm -rf /var/run/docker
      rm /var/run/docker.*
    
  • Start Kublr agent:

      systemctl start kublr
    
  • Docker logs:

    • For Ubuntu:

        tail -f /var/log/syslog |grep dockerd
      
    • For RHEL:

        tail -f /var/log/messages |grep dockerd
      
  • Docker debug logs (add the ‘--debug’ flag to dockerd):

      /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --config-file=/etc/docker/kublr-daemon.json --debug
    
  • Get a Docker stack trace:

      sudo curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine\?debug\=2
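
The "hard" restart steps above (stop the Kublr services, stop Docker, clean up if the stop hangs, start the Kublr agent) can be put into one script. This is a minimal sketch, not an official Kublr procedure. The `run` helper, the `DRY_RUN` variable, the function name `hard_restart_docker` and the 60-second default are all assumptions made for illustration, and `rm -f` is used instead of plain `rm` so that a missing file is not treated as an error. It must run as root and assumes GNU `timeout` is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: execute a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: follow the "hard" restart sequence from this article.
# $1 is how many seconds to wait for `systemctl stop docker` (assumed default: 60).
hard_restart_docker() {
  local stop_timeout="${1:-60}"
  run systemctl stop kublr-seeder kublr
  if ! run timeout "$stop_timeout" systemctl stop docker; then
    # The stop hung or failed: kill dockerd and remove its runtime files.
    run pkill -9 dockerd
    run rm -rf /var/run/docker
    run rm -f /var/run/docker.*
  fi
  run systemctl start kublr
}

# Usage (not executed here):
#   DRY_RUN=1 hard_restart_docker    # print the commands only
#   hard_restart_docker 30           # really restart, waiting up to 30 s
```

Running with `DRY_RUN=1` first shows the exact commands before anything destructive is executed on the node.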