Kubernetes backup and restore

February 17, 2024 / February 20, 2024

Trying out a backup and restore of a Kubernetes control plane node.

https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/

environment

Ubuntu 22.04

control * 1
worker * 2

Note: this is not a fully managed environment such as a cloud service, so here I back up and restore the files directly on the control node.

version

root@k8s-worker02:/home/ocarina# dpkg -l | grep kube
hi  kubeadm                               1.28.2-00                               amd64        Kubernetes Cluster Bootstrapping Tool
hi  kubectl                               1.28.2-00                               amd64        Kubernetes Command Line Tool
hi  kubelet                               1.28.2-00                               amd64        Kubernetes Node Agent
ii  kubernetes-cni                        1.2.0-00                                amd64        Kubernetes CNI

install etcdctl

apt install etcd-client

backup

_BKDIR=~/backup/kubernetes
mkdir -p ${_BKDIR}/etc
mkdir -p ${_BKDIR}/var/lib/kubelet
_TARGET=`uname -n`  # nodename of control-plane

back up kubernetes

With a managed cloud offering you cannot log in to the control plane, so perhaps you would have to make do with instance snapshots or something similar?

cp -rip /var/lib/kubelet ${_BKDIR}/var/lib/.
cp -rip /etc/kubernetes  ${_BKDIR}/etc/.
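
The two cp commands above can be wrapped in a small function so that the same directory list is used on every run. This is only a sketch: backup_tree is a name made up for illustration, and it assumes GNU cp as shipped with Ubuntu, where -a preserves mode, ownership and timestamps.

```shell
# Copy each source directory into the backup root, keeping its absolute path
# underneath it (e.g. /etc/kubernetes -> $root/etc/kubernetes).
# backup_tree is a hypothetical helper name, not part of any Kubernetes tool.
backup_tree() {
  root=$1
  shift
  for src in "$@"; do
    parent=$(dirname "$src")
    mkdir -p "$root$parent" || return 1
    cp -a "$src" "$root$parent/" || return 1
  done
}

# Example (run as root on the control node):
# backup_tree ~/backup/kubernetes /var/lib/kubelet /etc/kubernetes
```

Unlike cp -rip, this does not prompt before overwriting, so it is better suited to a fresh backup directory.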

get the certificate paths

_ETCD_CACERT=$(kubectl get pods/etcd-$_TARGET -n kube-system -o yaml | grep '\-\-trusted-ca-file'|cut -d "=" -f2)
echo $_ETCD_CACERT

_ETCD_CERT=$(kubectl get pods/etcd-$_TARGET -n kube-system -o yaml | grep '\-\-cert-file'|cut -d "=" -f2)
echo $_ETCD_CERT

_ETCD_KEY_FILE=$(kubectl get pods/etcd-$_TARGET -n kube-system -o yaml | grep '\-\-key-file'|cut -d "=" -f2)
echo $_ETCD_KEY_FILE
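
The three lookups above repeat the same grep and cut pipeline. As a sketch, the parsing can be pulled into one function that reads the pod YAML on stdin, which also makes it possible to try it without a cluster. etcd_flag_value is a hypothetical helper name; it assumes the flags appear as list items of the form "- --flag=value", as they do in the etcd static pod definition.

```shell
# Print the value of one etcd command-line flag from a pod manifest read on stdin.
# Usage: etcd_flag_value trusted-ca-file < manifest.yaml
etcd_flag_value() {
  # Match list items of the form "- --<flag>=<value>" and print only <value>.
  # Anchoring on "- --<flag>=" keeps --peer-cert-file from matching cert-file.
  sed -n "s/^[[:space:]]*-[[:space:]]*--$1=//p" | head -n 1
}

# Example:
# _ETCD_CACERT=$(kubectl get pods/etcd-$_TARGET -n kube-system -o yaml | etcd_flag_value trusted-ca-file)
# _ETCD_CERT=$(kubectl get pods/etcd-$_TARGET -n kube-system -o yaml | etcd_flag_value cert-file)
# _ETCD_KEY_FILE=$(kubectl get pods/etcd-$_TARGET -n kube-system -o yaml | etcd_flag_value key-file)
```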

snapshot etcd

time ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379   --cacert=$_ETCD_CACERT --cert=$_ETCD_CERT --key=$_ETCD_KEY_FILE  snapshot save ${_BKDIR}/etcd.db.`date +%Y%m%d`

2024-02-17 22:33:55.474824 I | clientv3: opened snapshot stream; downloading
2024-02-17 22:34:03.458115 I | clientv3: completed snapshot read; closing
Snapshot saved at /root/backup/kubernetes/etcd.db.20240217

real 0m25.509s
user 0m1.693s
sys 0m0.334s

root@k8s-cont01:~# ETCDCTL_API=3 etcdctl snapshot status --write-out=table $_BKDIR/etcd.db.20240217 
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| b986ade3 |   520916 |       1137 |     6.2 MB |
+----------+----------+------------+------------+
root@k8s-cont01:~#

restore

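If data-dir-location is the same folder as before, delete it and stop the etcd process before restoring the cluster. Otherwise, change the etcd configuration and restart the etcd process after the restore so that it uses the new data directory.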

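If the access URLs of the restored cluster have changed from those of the previous cluster,
the Kubernetes API server must be reconfigured accordingly.
In that case, restart the Kubernetes API server with the flag --etcd-servers=$NEW_ETCD_CLUSTER instead of --etcd-servers=$OLD_ETCD_CLUSTER.
Replace $NEW_ETCD_CLUSTER and $OLD_ETCD_CLUSTER with the respective IP addresses.
If a load balancer sits in front of the etcd cluster, you may need to update the load balancer instead.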
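
On a kubeadm cluster the API server runs as a static pod, so the flag lives in /etc/kubernetes/manifests/kube-apiserver.yaml and kubelet recreates the pod when that file changes. A minimal sketch of the flag swap, assuming that layout: set_etcd_servers is a hypothetical helper, and the URL in the usage line is a placeholder, not an address from this cluster.

```shell
# Rewrite the --etcd-servers flag in a kube-apiserver manifest.
# $1 = manifest path, $2 = new comma-separated etcd URL list.
# Assumes the URL list contains neither "|" nor "&" (both are special to sed here).
set_etcd_servers() {
  manifest=$1
  new_servers=$2
  tmpfile=$(mktemp) || return 1
  # Replace everything after "--etcd-servers=" on the matching line only.
  sed "s|\(--etcd-servers=\).*|\1$new_servers|" "$manifest" > "$tmpfile" || return 1
  # Write back into the existing file so it keeps its owner and mode.
  cat "$tmpfile" > "$manifest" && rm -f "$tmpfile"
}

# Example (placeholder URL):
# set_etcd_servers /etc/kubernetes/manifests/kube-apiserver.yaml https://10.0.0.10:2379
```

This was not needed in the single-node case below, because etcd comes back on the same address.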

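Since there is only one control node this time, I will do the following:

stop kubernetes
reset kubernetes
- delete /var/lib/etcd
- delete /var/lib/kubelet
- delete /etc/kubernetes
- flush iptables
confirm that it cannot start and that get nodes fails from the other nodes
restore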
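
Because the steps above delete the live data, it may be worth checking that the backup artifacts exist before starting. The following is a sketch only; backup_complete is a hypothetical helper, and it checks presence and non-emptiness, not whether the contents are valid.

```shell
# Return 0 only if every path given exists and is non-empty
# (a regular file with size > 0, or a directory with at least one entry).
backup_complete() {
  for p in "$@"; do
    if [ -d "$p" ]; then
      [ -n "$(ls -A "$p")" ] || { echo "empty directory: $p" >&2; return 1; }
    elif [ ! -s "$p" ]; then
      echo "missing or empty: $p" >&2
      return 1
    fi
  done
}

# Example:
# backup_complete "$_BKDIR/etc/kubernetes" "$_BKDIR/var/lib/kubelet" "$_BKDIR/etcd.db.20240217" \
#   && echo "backup looks complete"
```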

stop and reset kubernetes

systemctl stop kubelet

yes | kubeadm reset

iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X && ipvsadm -C

[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0217 22:36:17.995058 3026 preflight.go:56] [reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: [preflight] Running pre-flight checks
[reset] Deleted contents of the etcd data directory: /var/lib/etcd
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
-bash: ipvsadm: command not found
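
The last line of the output shows that ipvsadm is not installed on this node, so the ipvsadm -C at the end of the cleanup chain failed. One way to keep the cleanup quiet on such hosts is to skip commands that are not on PATH. run_if_present below is a hypothetical helper written for illustration.

```shell
# Run a command only when its executable exists; otherwise report and succeed.
run_if_present() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  else
    echo "skipping $1: not installed" >&2
  fi
}

# Example:
# iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
# run_if_present ipvsadm -C
```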

confirm that the data is gone

root@k8s-cont01:~# ls -la /var/lib/kubelet/
total 8
drwxr-xr-x  2 root root 4096 Feb 17 22:37 .
drwxr-xr-x 35 root root 4096 Feb 12 22:13 ..
root@k8s-cont01:~# ls -la /etc/kubernetes/
total 16
drwxr-xr-x  4 root root 4096 Feb 17 22:37 .
drwxr-xr-x 92 root root 4096 Feb 17 17:29 ..
drwxr-xr-x  2 root root 4096 Feb 17 22:37 manifests
drwxr-xr-x  2 root root 4096 Feb 17 22:37 pki
root@k8s-cont01:~# ls -la /etc/kubernetes/manifests/
total 8
drwxr-xr-x 2 root root 4096 Feb 17 22:37 .
drwxr-xr-x 4 root root 4096 Feb 17 22:37 ..
root@k8s-cont01:~# ls -la /etc/kubernetes/pki/
total 8
drwxr-xr-x 2 root root 4096 Feb 17 22:37 .
drwxr-xr-x 4 root root 4096 Feb 17 22:37 ..
root@k8s-cont01:~# ls -la /var/lib/etcd/
total 8
drwx------  2 root root 4096 Feb 17 22:36 .
drwxr-xr-x 35 root root 4096 Feb 12 22:13 ..
root@k8s-cont01:~# 
root@k8s-cont01:~# ps awxu|grep kube
root        4017  0.0  0.0   4024  2108 pts/0    S+   22:38   0:00 grep --color=auto kube

confirm that the worker nodes cannot connect

root@k8s-worker02:/home/ocarina# kubectl get nodes
The connection to the server 172.31.1.1:6443 was refused - did you specify the right host or port?
root@k8s-worker02:/home/ocarina#

root@k8s-worker03:~# kubectl get pods
The connection to the server 172.31.1.1:6443 was refused - did you specify the right host or port?
root@k8s-worker03:~#

check service status while stopped

Even while the control plane node was stopped, the service at
http://172.31.0.200:30288/
kept working without any problems.

restore

root@k8s-cont01:~# rsync --delete -a $_BKDIR/etc/kubernetes/ /etc/kubernetes/
root@k8s-cont01:~# 
root@k8s-cont01:~# rsync --delete -a $_BKDIR/var/lib/kubelet/ /var/lib/kubelet/

root@k8s-cont01:~# rmdir /var/lib/etcd/
root@k8s-cont01:~# time ETCDCTL_API=3 etcdctl --data-dir /var/lib/etcd snapshot restore $_BKDIR/etcd.db.20240217 
2024-02-17 22:44:41.203434 I | mvcc: restore compact to 520212
2024-02-17 22:44:42.798887 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
2024-02-17 22:44:44.056801 W | wal: sync duration of 1.256692792s, expected less than 1s

real    0m5.220s
user    0m0.069s
sys     0m0.082s
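
The rmdir and restore steps above can be combined into one guarded function. Since rmdir only removes empty directories, it doubles as a safety check against restoring over live data. This is a sketch under assumptions: restore_etcd is a hypothetical name, and the ETCDCTL variable is an override invented here so the function can be tried with a stand-in command.

```shell
# Restore an etcd snapshot into a data directory that must be absent or empty.
# $1 = snapshot file, $2 = target data directory.
# ETCDCTL (hypothetical override) selects the binary; it defaults to etcdctl.
restore_etcd() {
  snapshot=$1
  data_dir=$2
  [ -s "$snapshot" ] || { echo "snapshot missing or empty: $snapshot" >&2; return 1; }
  if [ -d "$data_dir" ]; then
    # rmdir fails on a non-empty directory, so existing data is never removed.
    rmdir "$data_dir" || { echo "refusing to restore: $data_dir is not empty" >&2; return 1; }
  fi
  ETCDCTL_API=3 ${ETCDCTL:-etcdctl} --data-dir "$data_dir" snapshot restore "$snapshot"
}

# Example:
# restore_etcd "$_BKDIR/etcd.db.20240217" /var/lib/etcd
```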

start it up...!

root@k8s-cont01:~# systemctl start kubelet
root@k8s-cont01:~# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Sat 2024-02-17 22:45:09 JST; 3s ago

root@k8s-cont01:~# kubectl get nodes -o wide
NAME           STATUS   ROLES           AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k8s-cont01     Ready    control-plane   5d      v1.28.2   172.31.1.1    <none>        Ubuntu 22.04.3 LTS   5.15.0-94-generic   containerd://1.6.28
k8s-worker02   Ready    <none>          4d23h   v1.28.2   172.31.1.2    <none>        Ubuntu 22.04.3 LTS   5.15.0-94-generic   containerd://1.6.28
k8s-worker03   Ready    <none>          4d23h   v1.28.2   172.31.1.3    <none>        Ubuntu 22.04.3 LTS   5.15.0-94-generic   containerd://1.6.28
root@k8s-cont01:~#

root@k8s-worker02:/home/ocarina# kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-5545dbfdfb-h7z6v   1/1     Running   0          2d21h
root@k8s-worker02:/home/ocarina# kubectl get nodes
NAME           STATUS   ROLES           AGE     VERSION
k8s-cont01     Ready    control-plane   5d      v1.28.2
k8s-worker02   Ready    <none>          4d23h   v1.28.2
k8s-worker03   Ready    <none>          4d23h   v1.28.2
root@k8s-worker02:/home/ocarina#
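
After kubelet starts, the API server may take a little while before it answers, so a scripted version of this check would probably want to poll rather than run kubectl once. A minimal sketch of such a loop follows; retry is a hypothetical helper, and the attempt count and delay in the example are arbitrary.

```shell
# Retry a command until it succeeds or the attempt limit is reached.
# $1 = max attempts, $2 = seconds between attempts, rest = command to run.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while ! "$@"; do
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example: poll for roughly two minutes until the API server answers.
# retry 24 5 kubectl get nodes
```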

STATUS came back as Ready without trouble. Backing up /var/lib/kubelet and /etc/kubernetes turned out to be important too.

Posted by ocarina