美文网首页
kubernetes集群管理系列讲座(六)etcd备份与恢复

kubernetes集群管理系列讲座(六)etcd备份与恢复

作者: 炼狱腾蛇Eric | 来源:发表于2020-06-22 14:55 被阅读0次

课程目标

灾备原理

怎样备份

怎样恢复

1. 灾备原理

etcd设计用于承受机器故障。etcd集群自动从临时故障(例如,机器重新启动)中恢复,并且对N个成员的集群最多可容忍(N-1)/2个永久故障。当一个成员永久性地失败时,无论是由于硬件故障还是磁盘损坏,它都将失去对集群的访问。如果集群永久失去超过(N-1)/2个成员,那么它将无法使用,永久地失去仲裁能力。一旦仲裁丢失,群集就无法达成共识,因此无法继续接受更新。

为了从灾难性故障中恢复,etcd v3提供了快照和恢复功能,以便在不丢失etcd v3 key数据的情况下重新创建集群。

也就是说,etcd的备份实际上是备份了某个key在某个时间的状态,也就是备份的内容是整个的keyspace

2. 备份

刚才说过了,备份实际上就是对于keyspace的snapshot。而恢复集群首先需要的是从etcd集群成员中对于keyspace的snapshot。快照可以是使用etcdctl snapshot save命令导出的文件,或者是从etcd数据文件夹中member/snap/db复制出来的db文件。我们可以使用下面的命令对etcd数据库打快照。

$ ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save snapshot.db

当然,我们这里使用的是带ssl证书的,应该使用下面的命令

ETCDCTL_API=3 etcdctl --endpoints https://10.0.13.126:2379 --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem" snapshot save /tmp/snapshot.db
{"level":"info","ts":1592547629.6369183,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/tmp/snapshot.db.part"}
{"level":"info","ts":"2020-06-19T06:20:29.643Z","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1592547629.6432784,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://10.0.13.126:2379"}
{"level":"info","ts":"2020-06-19T06:20:29.644Z","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1592547629.6491425,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://10.0.13.126:2379","size":"20 kB","took":0.012103202}
{"level":"info","ts":1592547629.6492367,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"/tmp/snapshot.db"}
Snapshot saved at /tmp/snapshot.db

这样我们就得到了snapshot文件,/tmp/snapshot.db

3. 恢复

要恢复集群,只需要一个快照“db”文件。使用etcdctl snapshot restore的群集还原将创建新的etcd数据目录;所有成员都应使用同一快照还原。还原将覆盖某些快照元数据(特别是成员ID和群集ID);该成员将丢失其以前的标识。此元数据覆盖可防止新成员无意中加入现有群集。因此,要从快照启动群集,还原必须启动新的逻辑群集。

注意

  • 快照完整性可以在还原时进行验证。如果使用etcdctl snapshot save拍摄快照,则它将具有由etcdctl snapshot restore检查的完整性哈希。
  • 如果快照是从数据目录复制的,则不存在完整性哈希,它只能使用--skip hash check进行还原。

步骤如下

  • 我们首先向etcd中放点数据
$ etcdctl put name jormun --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem"

$ etcdctl get name --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem"
name
jormun
  • 然后备份一下
$ ETCDCTL_API=3 etcdctl --endpoints https://10.0.13.126:2379 --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem" snapshot save /tmp/snap.db
{"level":"info","ts":1592548731.7236383,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/tmp/snap.db.part"}
{"level":"info","ts":"2020-06-19T06:38:51.727Z","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1592548731.7275999,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://10.0.13.126:2379"}
{"level":"info","ts":"2020-06-19T06:38:51.730Z","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1592548731.7342563,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://10.0.13.126:2379","size":"20 kB","took":0.010554193}
{"level":"info","ts":1592548731.7343948,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"/tmp/snap.db"}
Snapshot saved at /tmp/snap.db
  • 然后给name新的值
$ etcdctl put name jormun_new --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem"
OK

$ etcdctl get name --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem"
name
jormun_new
  • 现在我们停掉三台etcd
$ systemctl stop etcd
  • 然后在三个节点上分别恢复

  • 注意:还原将使用etcd的群集配置标志使用新的群集配置初始化新群集的新成员,但保留etcd keyspace的内容。从上一个示例继续,下面为三个集群成员创建新的etcd数据目录(infra0.etcd、infra1.etcd、infra2.etcd),且就在执行命令的目录,也就是当前目录pwd下创建文件夹。我是在/data目录下执行的

$ ETCDCTL_API=3 etcdctl snapshot restore /tmp/snap.db \
   --name infra0 \
   --initial-cluster infra0=https://10.0.11.36:2380,infra1=https://10.0.12.21:2380,infra2=https://10.0.13.126:2380 \
   --initial-cluster-token ea8cfe2bfe85b7e6c66fe190f9225838 \
   --initial-advertise-peer-urls https://10.0.11.36:2380 \
   --cacert="/etc/kubernetes/pki/etcd/ca.pem" \
   --cert="/etc/kubernetes/pki/etcd/members.pem" \
   --key="/etc/kubernetes/pki/etcd/members-key.pem"
   
{"level":"info","ts":1592805749.46045,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/tmp/snap.db","wal-dir":"infra0.etcd/member/wal","data-dir":"infra0.etcd","snap-dir":"infra0.etcd/member/snap"}
{"level":"info","ts":1592805749.467868,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"f36e4227f36fdc03","local-member-id":"0","added-peer-id":"466035bfe5ac7a64","added-peer-peer-urls":["https://10.0.13.126:2380"]}
{"level":"info","ts":1592805749.4684293,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"f36e4227f36fdc03","local-member-id":"0","added-peer-id":"784e0050552d81cd","added-peer-peer-urls":["https://10.0.12.21:2380"]}
{"level":"info","ts":1592805749.4684544,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"f36e4227f36fdc03","local-member-id":"0","added-peer-id":"f7d6895384ae86d2","added-peer-peer-urls":["https://10.0.11.36:2380"]}
{"level":"info","ts":1592805749.4759514,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/tmp/snap.db","wal-dir":"infra0.etcd/member/wal","data-dir":"infra0.etcd","snap-dir":"infra0.etcd/member/snap"}
  • 这个时候会在/data下面生成一个新的文件,叫infra0.etcd
$ ls /data/
etcd  infra0.etcd
  • 修改启动文件/etc/etcd/etcd.conf,修改数据文件位置
# DATA_DIR=/data/etcd
DATA_DIR=/data/infra0.etcd
  • 还原将使用etcd的群集配置标志使用新的群集配置初始化新群集的新成员,但保留etcd keyspace的内容。从上一个示例继续,需要为每个集群成员创建新的etcd数据目录(/data/infra0.etcd,/data/infra1.etcd,/data/infra2.etcd)。我们把刚才备份的数据文件放到每一台机器的/tmp下,叫snap.db。
  • on infra1
ETCDCTL_API=3 etcdctl snapshot restore /tmp/snap.db \
   --name infra1 \
   --initial-cluster infra0=https://10.0.11.36:2380,infra1=https://10.0.12.21:2380,infra2=https://10.0.13.126:2380 \
   --initial-cluster-token ea8cfe2bfe85b7e6c66fe190f9225838 \
   --initial-advertise-peer-urls https://10.0.12.21:2380 \
   --cacert="/etc/kubernetes/pki/etcd/ca.pem" \
   --cert="/etc/kubernetes/pki/etcd/members.pem" \
   --key="/etc/kubernetes/pki/etcd/members-key.pem"
  • on infra2
ETCDCTL_API=3 etcdctl snapshot restore /tmp/snap.db \
   --name infra2 \
   --initial-cluster infra0=https://10.0.11.36:2380,infra1=https://10.0.12.21:2380,infra2=https://10.0.13.126:2380 \
   --initial-cluster-token ea8cfe2bfe85b7e6c66fe190f9225838 \
   --initial-advertise-peer-urls https://10.0.13.126:2380 \
   --cacert="/etc/kubernetes/pki/etcd/ca.pem" \
   --cert="/etc/kubernetes/pki/etcd/members.pem" \
   --key="/etc/kubernetes/pki/etcd/members-key.pem"
  • 修改infra1和infra2的配置文件/etc/etcd/etcd.conf
DATA_DIR=/data/infra1.etcd
HOST_NAME=infra1
HOST_IP=10.0.12.21
CLUSTER=infra0=https://10.0.11.36:2380,infra1=https://10.0.12.21:2380,infra2=https://10.0.13.126:2380
CLUSTER_STATE=new
TOKEN=ea8cfe2bfe85b7e6c66fe190f9225838
DATA_DIR=/data/infra2.etcd
HOST_NAME=infra2
HOST_IP=10.0.13.126
CLUSTER=infra0=https://10.0.11.36:2380,infra1=https://10.0.12.21:2380,infra2=https://10.0.13.126:2380
CLUSTER_STATE=new
TOKEN=ea8cfe2bfe85b7e6c66fe190f9225838
  • 修改权限(我这里是按照前面的步骤做的,有可能权限不一样,请根据情况而定)
chown -R etcd:adm infra0.etcd
chown -R etcd:adm infra1.etcd
chown -R etcd:adm infra2.etcd
  • 每台机器上都启动etcd
systemctl start etcd
  • 此时在查数据库,数据已经回来了
etcdctl get name --cacert="/etc/kubernetes/pki/etcd/ca.pem" --cert="/etc/kubernetes/pki/etcd/members.pem" --key="/etc/kubernetes/pki/etcd/members-key.pem"
name
jormun

为了方便大家学习,请大家加我的微信,我会把大家加到微信群(微信群的二维码会经常变)和qq群821119334,问题答案云原生技术课堂,有问题可以一起讨论

  • 个人微信
    640.jpeg

  • 腾讯课堂
    640-20200506145837072.jpeg

  • 微信公众号
    640-20200506145842007.jpeg

  • 专题讲座

2020 CKA考试视频 真题讲解 https://www.bilibili.com/video/BV167411K7hp

2020 CKA考试指南 https://www.bilibili.com/video/BV1sa4y1479B/

2020年 5月CKA考试真题 https://mp.weixin.qq.com/s/W9V4cpYeBhodol6AYtbxIA

相关文章

网友评论

      本文标题:kubernetes集群管理系列讲座(六)etcd备份与恢复

      本文链接:https://www.haomeiwen.com/subject/dnixfktx.html