Applicable to:
- SolusIO
Symptoms
After updating Docker packages to version 20 on Ubuntu 20/CentOS 8 CR nodes, the SolusIO stack cannot start properly. Errors like the following can be found in /var/log/syslog or /var/log/messages:
Dec 10 13:48:35 server.solus.io dockerd[564826]: time="2020-12-10T13:48:35.350129897Z" level=error msg="fatal task error" error="container ingress-sbox: endpoint create on GW Network failed: failed to create endpoint gateway_ingress-sbox on network docker_gwbridge
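To check whether a node is affected, the log can be searched for the error signature shown above. The following is a minimal sketch and not part of the original procedure: the function name find_gw_errors is hypothetical, and it assumes the log lives at one of the two paths mentioned above unless other paths are passed as arguments.

```shell
#!/bin/sh
# Hypothetical helper: count log lines that carry the error signature
# quoted in this article.
# Usage: find_gw_errors [LOGFILE ...]
find_gw_errors() {
    # With no arguments, fall back to the two log locations named above.
    if [ "$#" -eq 0 ]; then
        set -- /var/log/syslog /var/log/messages
    fi
    for log in "$@"; do
        # Only one of the two files normally exists on a given distribution.
        [ -r "$log" ] || continue
        count=$(grep -c 'endpoint create on GW Network failed' "$log")
        echo "$log: $count"
    done
}
```

Calling find_gw_errors as root prints one line per readable log file with the number of matching entries; a non-zero count suggests the node is hitting the issue described here.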
Cause
The issue has been reported to the Docker developers as #41775.
Resolution
Here are the workaround steps.
On Ubuntu 20:
- Connect to the affected SolusIO CR node via SSH
- Stop solus stack:
# docker stack rm solus
- Remove swarm:
# docker swarm leave --force
- Remove docker-ce* and containerd.io packages:
# apt purge docker-ce
# apt purge docker-ce-cli
# apt purge containerd.io
- Remove docker related rules from firewalld:
# firewall-cmd --zone=trusted --remove-interface=docker_gwbridge
# firewall-cmd --delete-zone=docker --permanent
- Create a backup of the network files, then remove them:
# mkdir docker_files_bkp
# cp -av /var/lib/docker/network/files/* docker_files_bkp/
# rm -rf /var/lib/docker/network/files/
- Remove docker0 and docker_gwbridge interfaces:
# ip link del docker0
# ip link del docker_gwbridge
- Install Docker in accordance with the Docker documentation:
# apt-get install docker-ce docker-ce-cli containerd.io
- Initialize swarm:
# docker swarm init --advertise-addr 127.0.0.1 --listen-addr 127.0.0.1:2377
- Start solus stack:
# docker stack deploy --with-registry-auth -c /usr/local/solus/config/stack.yml solus
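Once the stack has been redeployed, it may help to confirm that the swarm is active and that the services are being scheduled. The following is a minimal sketch and not part of the original procedure: the function name check_solus_stack is hypothetical, and the check itself does not depend on the distribution.

```shell
#!/bin/sh
# Hypothetical post-check: confirm the local swarm state is "active",
# then list the services of the "solus" stack.
check_solus_stack() {
    # LocalNodeState is "active" once `docker swarm init` has succeeded.
    state=$(docker info --format '{{.Swarm.LocalNodeState}}') || return 1
    if [ "$state" != "active" ]; then
        echo "swarm state is '$state', expected 'active'" >&2
        return 1
    fi
    # Prints one row per service, including its replica count.
    docker stack services solus
}
```

If the function returns a non-zero status, or the listed services never reach their desired replica count, the log file is the next place to look.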
On CentOS 8:
- Connect to the affected SolusIO CR node via SSH
- Stop solus stack:
# docker stack rm solus
- Remove swarm:
# docker swarm leave --force
- Remove docker-ce* and containerd.io packages:
# yum remove docker-ce
# yum remove docker-ce-cli
# yum remove containerd.io
- Remove docker related rules from firewalld:
# firewall-cmd --zone=trusted --remove-interface=docker_gwbridge
# firewall-cmd --delete-zone=docker --permanent
- Create a backup of the network files, then remove them:
# mkdir docker_files_bkp
# cp -av /var/lib/docker/network/files/* docker_files_bkp/
# rm -rf /var/lib/docker/network/files/
- Remove docker0 and docker_gwbridge interfaces:
# ip link del docker0
# ip link del docker_gwbridge
- Install Docker in accordance with the Docker documentation:
# yum install docker-ce docker-ce-cli containerd.io
- Initialize swarm:
# docker swarm init --advertise-addr 127.0.0.1 --listen-addr 127.0.0.1:2377
- Start solus stack:
# docker stack deploy --with-registry-auth -c /usr/local/solus/config/stack.yml solus
- Make the runtime firewalld zones permanent:
# firewall-cmd --runtime-to-permanent
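The Ubuntu and CentOS procedures differ only in the package manager commands. When the workaround has to be applied to several nodes, the right removal command can be chosen from the node's os-release file. The following is a minimal sketch under stated assumptions: the function name docker_pkg_remove_cmd is hypothetical, it only prints the command rather than running it, and it combines the three separate removals from the steps above into one invocation.

```shell
#!/bin/sh
# Hypothetical helper: print the package-removal command that matches
# the node's operating system. It does not execute anything.
# The optional argument (path to an os-release file) exists for testing.
docker_pkg_remove_cmd() {
    os_release=${1:-/etc/os-release}
    # os-release uses shell-compatible KEY=value lines, so it can be
    # sourced safely inside a subshell to read the ID field.
    os_id=$( . "$os_release" && printf '%s' "$ID" )
    case "$os_id" in
        ubuntu) echo "apt purge docker-ce docker-ce-cli containerd.io" ;;
        centos) echo "yum remove docker-ce docker-ce-cli containerd.io" ;;
        *) echo "unsupported OS: $os_id" >&2; return 1 ;;
    esac
}
```

Printing the command instead of running it keeps the destructive step under the operator's control, which seems prudent for a procedure that removes Docker from a production node.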
Comments
Thank you so much for helping me to the finale regarding this problem I have experienced and for sharing this useful article. You are great.
It works, but when you restart your server then the issue is back.
The issue that reappears after a restart has been identified as bug SIO-2857 and fixed in SolusIO version 1.1.16561.
If you experience this issue, update SolusIO to version 1.1.16561 or later.