After updating Docker packages to version 20, the SolusIO stack cannot start properly: endpoint create on GW Network failed

Applicable to:

  • SolusIO

Symptoms 

After updating Docker packages to version 20 on Ubuntu 20/CentOS 8 CR nodes, the SolusIO stack cannot start properly. Errors like the following can be found in /var/log/syslog (Ubuntu) or /var/log/messages (CentOS):

    Dec 10 13:48:35 server.solus.io dockerd[564826]: time="2020-12-10T13:48:35.350129897Z" level=error msg="fatal task error" error="container ingress-sbox: endpoint create on GW Network failed: failed to create endpoint gateway_ingress-sbox on network docker_gwbridge
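To confirm that a node shows this symptom, the system log can be searched for the error text. The snippet below is a minimal sketch, not an official SolusIO tool: the function name has_gw_endpoint_error is hypothetical, and it assumes the log is readable by the current user at one of the two paths mentioned above.

```shell
# Hypothetical helper: returns 0 if the given log file contains the error signature.
has_gw_endpoint_error() {
    grep -q 'endpoint create on GW Network failed' "$1"
}

# Check whichever system log exists on this distribution.
for log in /var/log/syslog /var/log/messages; do
    if [ -r "$log" ] && has_gw_endpoint_error "$log"; then
        echo "error signature found in $log"
    fi
done
```

If nothing is printed, the error text was not found in either log (or the logs were not readable), and the node may be affected by a different problem.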

Cause

The issue has been reported to the Docker developers as #41775.

Resolution

As a workaround, reinstall the Docker packages and re-create the swarm on the affected node:

How to update Docker packages on an Ubuntu 18/20 or Debian 10 CR node
  1. Connect to the affected SolusIO CR node via SSH.
  2. Stop the solus stack:

    # docker stack rm solus

  3. Remove the node from the swarm:

    # docker swarm leave --force

  4. Remove the docker-ce* and containerd.io packages:

    # apt purge docker-ce
    # apt purge docker-ce-cli
    # apt purge containerd.io

  5. Remove Docker-related rules from firewalld:

    # firewall-cmd --zone=trusted --remove-interface=docker_gwbridge
    # firewall-cmd --delete-zone=docker --permanent

  6. Create a backup of the Docker network files:

    # mkdir docker_files_bkp
    # cp -av /var/lib/docker/network/files/* docker_files_bkp/

    Then remove them:

    # rm -rf /var/lib/docker/network/files/

  7. Remove the docker0 and docker_gwbridge interfaces:

    # ip link del docker0
    # ip link del docker_gwbridge

  8. Install Docker in accordance with the Docker documentation:

    # apt-get install docker-ce docker-ce-cli containerd.io

  9. Initialize the swarm:

    # docker swarm init --advertise-addr 127.0.0.1 --listen-addr 127.0.0.1:2377

  10. Start the solus stack:

    # docker stack deploy --with-registry-auth -c /usr/local/solus/config/stack.yml solus

How to update Docker packages on a CentOS 8 CR node
  1. Connect to the affected SolusIO CR node via SSH.
  2. Stop the solus stack:

    # docker stack rm solus

  3. Remove the node from the swarm:

    # docker swarm leave --force

  4. Remove the docker-ce* and containerd.io packages:

    # yum remove docker-ce
    # yum remove docker-ce-cli
    # yum remove containerd.io

  5. Remove Docker-related rules from firewalld:

    # firewall-cmd --zone=trusted --remove-interface=docker_gwbridge
    # firewall-cmd --delete-zone=docker --permanent

  6. Create a backup of the Docker network files:

    # mkdir docker_files_bkp
    # cp -av /var/lib/docker/network/files/* docker_files_bkp/

    Then remove them:

    # rm -rf /var/lib/docker/network/files/

  7. Remove the docker0 and docker_gwbridge interfaces:

    # ip link del docker0
    # ip link del docker_gwbridge

  8. Install Docker in accordance with the Docker documentation:

    # yum install docker-ce docker-ce-cli containerd.io

  9. Initialize the swarm:

    # docker swarm init --advertise-addr 127.0.0.1 --listen-addr 127.0.0.1:2377

  10. Start the solus stack:

    # docker stack deploy --with-registry-auth -c /usr/local/solus/config/stack.yml solus

  11. Make the runtime firewalld zones permanent:

    # firewall-cmd --runtime-to-permanent
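After either procedure, it may be useful to check that the swarm is active and that the services of the solus stack have reached their desired replica counts. The following is a rough sketch rather than part of the official workaround: the helper name unready_services is hypothetical, and it assumes that docker stack services is asked to print one "NAME CURRENT/DESIRED" pair per line via its --format option.

```shell
# Hypothetical helper: reads "NAME CURRENT/DESIRED" lines on stdin and prints
# the names of services whose running replica count differs from the desired count.
unready_services() {
    awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

if command -v docker >/dev/null 2>&1; then
    # Prints the swarm state of this node; "active" means the node is part of a swarm.
    docker info --format '{{.Swarm.LocalNodeState}}'
    # Prints nothing once every service in the stack is fully replicated.
    docker stack services --format '{{.Name}} {{.Replicas}}' solus | unready_services
fi
```

Services can take a short while to start after docker stack deploy, so names listed right after the deployment do not necessarily indicate a failure. Re-running the check after a minute gives a more reliable picture.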
