ETCD Part 1: Backup & Restore (En)

Albert Weng
Sep 27, 2023

ETCD is a crucial component in a Kubernetes cluster. It’s essential to understand how to back up and restore ETCD to ensure there’s a chance to recover in case of a cluster crash (so you don’t get in trouble with your boss).

This article is divided into three major sections:

  1. What is ETCD?
  2. ETCD backup
  3. ETCD restore

Let’s get started!

1. What is ETCD?

Kubernetes uses ETCD, a distributed key-value store, to hold all cluster data, including configuration, state, and metadata. Every component reads and writes that data through the API server, which is the only component that talks to ETCD directly.

In simple terms, ETCD stores both the “current” state and the “desired” state of the system. Commands like “kubectl get XXX” read that content through the API server, while “kubectl create XXX” writes new objects to it.
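To make this concrete, the sketch below shows where an object lives inside ETCD. It assumes the default "/registry" key prefix used by the API server and the layout used for namespaced core objects such as pods; the helper name registry_key and the pod name mypod are made up for illustration.

```shell
# registry_key RESOURCE NAMESPACE NAME: print the etcd key of a namespaced core object
# (assumes the API server's default /registry prefix)
registry_key() {
  printf '/registry/%s/%s/%s\n' "$1" "$2" "$3"
}

registry_key pods default mypod

# On the master, with the endpoint and certificate flags from step S2-3 added,
# the keys of all pods in the default namespace could be listed with:
#   ETCDCTL_API=3 etcdctl get /registry/pods/default --prefix --keys-only
```

The values behind these keys are stored in a binary encoding, so listing keys only is usually the most readable way to look around.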

ETCD members reach consensus using the Raft algorithm. A highly available cluster needs at least 3 members, and the member count should be odd.
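The preference for an odd member count follows from the majority rule in Raft: a write is committed only once more than half of the members acknowledge it. The sketch below works through that arithmetic; the function names quorum and fault_tolerance are illustrative and not part of ETCD.

```shell
# quorum N: smallest majority of an N-member cluster
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# fault_tolerance N: members that can fail while a majority is still available
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

Going from 3 to 4 members raises the quorum from 2 to 3 while still tolerating only one failure, so an extra even member adds no resilience.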

The following website gives a clear visual explanation of how the leader is elected (Leader Election), how data is replicated to other nodes while staying consistent (Log Replication), and which problems the Raft algorithm sets out to solve:

http://thesecretlivesofdata.com/raft/

2. ETCD backup

S2–1. Obtain etcdctl utility

[master]# ETCD_RELEASE=$(curl -s https://api.github.com/repos/etcd-io/etcd/releases/latest | grep tag_name | cut -d '"' -f 4)
[master]# echo $ETCD_RELEASE
v3.5.9

[master]# wget https://github.com/etcd-io/etcd/releases/download/${ETCD_RELEASE}/etcd-${ETCD_RELEASE}-linux-amd64.tar.gz
[master]# tar zxvf etcd-${ETCD_RELEASE}-linux-amd64.tar.gz
[master]# cd etcd-${ETCD_RELEASE}-linux-amd64
[master]# ls -al
[master]# cp etcdctl /usr/local/bin/
[master]# etcdctl version

S2–2. Acquiring Essential Information

In this step, gather the following values using any one of the three methods below:

  • etcd endpoint : --endpoints
  • CA certificate : --cacert
  • server certificate : --cert
  • server key : --key

[Method 1]
[master]# vim /etc/kubernetes/manifests/etcd.yaml

[Method 2]
[master]# kubectl get po -n kube-system
[master]# kubectl describe pod etcd-master-node -n kube-system

[Method 3]
[master]# grep listen /etc/kubernetes/manifests/etcd.yaml
[master]# grep file /etc/kubernetes/manifests/etcd.yaml
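As a variation on Method 3, the values can be pulled out one flag at a time. This is a minimal sketch that assumes a kubeadm-style manifest in which every etcd option sits on its own "- --flag=value" line; the helper name etcd_flag is made up for this example.

```shell
# etcd_flag NAME [MANIFEST]: print the value of the first "- --NAME=value" line
etcd_flag() {
  sed -n "s/^ *- --$1=//p" "${2:-/etc/kubernetes/manifests/etcd.yaml}" | head -n 1
}

# Usage on the master; each manifest flag maps to one etcdctl option:
#   etcd_flag listen-client-urls   # candidates for --endpoints
#   etcd_flag trusted-ca-file      # --cacert
#   etcd_flag cert-file            # --cert
#   etcd_flag key-file             # --key
```

Because the pattern is anchored on "- --NAME=", asking for cert-file does not accidentally return the peer-cert-file entry.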

S2–3. Performing Backup

[master]# ETCDCTL_API=3 etcdctl \
--endpoints=https://10.107.88.12:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /root/etcd/etcd.db

# Verify

[master]# ETCDCTL_API=3 etcdctl --write-out=table snapshot status /root/etcd/etcd.db
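For repeated backups, the save and verify steps can be wrapped so that each snapshot gets its own timestamped file. The sketch below reuses the endpoint and certificate paths from the commands above; the function names and the SNAP_ENDPOINT and SNAP_PKI_DIR variables are hypothetical and should be adjusted to your cluster.

```shell
# Defaults copied from the commands above; override them in the environment if yours differ.
SNAP_ENDPOINT="${SNAP_ENDPOINT:-https://10.107.88.12:2379}"
SNAP_PKI_DIR="${SNAP_PKI_DIR:-/etc/kubernetes/pki/etcd}"

# snapshot_name DIR STAMP: path of the snapshot file for a given timestamp
snapshot_name() {
  printf '%s/etcd-%s.db\n' "$1" "$2"
}

# backup_etcd DIR: save a timestamped snapshot into DIR, then print its status table
backup_etcd() {
  local file
  file="$(snapshot_name "$1" "$(date +%Y%m%d-%H%M%S)")"
  mkdir -p "$1" || return 1
  ETCDCTL_API=3 etcdctl \
    --endpoints="$SNAP_ENDPOINT" \
    --cacert="$SNAP_PKI_DIR/ca.crt" \
    --cert="$SNAP_PKI_DIR/server.crt" \
    --key="$SNAP_PKI_DIR/server.key" \
    snapshot save "$file" || return 1
  ETCDCTL_API=3 etcdctl --write-out=table snapshot status "$file"
}

# Usage on the master:
#   backup_etcd /root/etcd
```

Keeping several timestamped snapshots, rather than overwriting a single etcd.db, leaves an older copy to fall back on if the latest one turns out to be unusable.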

3. ETCD restore

To verify that a restore really works, we will run the following test:

  • Before : The default namespace should be empty.
  • Perform a backup
  • Create an nginx pod in the default namespace
  • Create a new directory and restore the data to the new location
  • Modify the manifest to make ETCD use the new location
  • After : The default namespace has returned to a state with no objects.

# The default namespace should be empty

[master]# kubectl get pod -n default
No resources found in default namespace.

# Perform a backup

[master]# ETCDCTL_API=3 etcdctl \
--endpoints=https://10.107.88.12:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /root/etcd/etcd-01.db

# Create an nginx pod in the default namespace

[master]# kubectl run testpod --image=nginx -n default

# Create a new directory and restore the data to the new location
# (the restore only reads the local snapshot file, so the endpoint and certificate flags are not needed)

[master]# mkdir /root/etcd-backup
[master]# ETCDCTL_API=3 etcdctl --data-dir="/root/etcd-backup" \
snapshot restore /root/etcd/etcd-01.db
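
Before touching the manifest, it can be worth confirming that the restore actually populated the new directory. The check below is a small sketch that relies on the member/snap/db file and member/wal directory that a snapshot restore writes; the function name looks_restored is made up for this example.

```shell
# looks_restored DIR: succeed if DIR has the layout that a snapshot restore produces
looks_restored() {
  [ -f "$1/member/snap/db" ] && [ -d "$1/member/wal" ]
}

# Usage on the master:
#   looks_restored /root/etcd-backup && echo "restore directory looks complete"
```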

# At this point, the state has not been restored yet

[master]# kubectl get pod -n default
NAME READY STATUS RESTARTS AGE
testpod 1/1 Running 0 6m2s

# Edit /etc/kubernetes/manifests/etcd.yaml so that ETCD uses the new directory (for example, by changing the hostPath of the ETCD data volume)

[master]# tree /root/etcd-backup
[master]# vim /etc/kubernetes/manifests/etcd.yaml
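
The same edit can be scripted instead of made by hand in vim. This sketch assumes a kubeadm-generated manifest whose ETCD data volume is a hostPath with "path: /var/lib/etcd" (the kubeadm default, which is not shown in this article), and it changes only that host path so the container still sees its usual data directory. The function name set_etcd_host_path is hypothetical.

```shell
# set_etcd_host_path MANIFEST OLD_DIR NEW_DIR:
# rewrite the "path: OLD_DIR" line of the hostPath volume to NEW_DIR.
# The temporary file is created outside the manifests directory on purpose,
# so that the kubelet never sees a second, half-written manifest.
set_etcd_host_path() {
  local manifest="$1" old="$2" new="$3" tmp rc
  tmp="$(mktemp)" || return 1
  if sed "s#path: ${old}\$#path: ${new}#" "$manifest" > "$tmp"; then
    cat "$tmp" > "$manifest"
    rc=$?
  else
    rc=1
  fi
  rm -f "$tmp"
  return "$rc"
}

# Usage on the master (keep a copy of the original manifest somewhere else first):
#   cp /etc/kubernetes/manifests/etcd.yaml /root/etcd.yaml.orig
#   set_etcd_host_path /etc/kubernetes/manifests/etcd.yaml /var/lib/etcd /root/etcd-backup
```

The end-of-line anchor in the pattern keeps the certificate volume (a path ending in /etc/kubernetes/pki/etcd) and the --data-dir flag untouched.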

# After saving the file, wait a few minutes for the kubelet to restart the ETCD static pod with the new data (during this time, the API may not respond)

[master]# kubectl get pod -n default
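
Rather than waiting a fixed amount of time, the API server can be polled until it answers again. The retry helper below is a generic sketch; wait_until is a made-up name, and the 60 tries with a 5 second delay in the usage line are arbitrary values.

```shell
# wait_until TRIES DELAY CMD...: rerun CMD until it succeeds or TRIES attempts have failed
wait_until() {
  local tries="$1" delay="$2" i=0
  shift 2
  while [ "$i" -lt "$tries" ]; do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage on the master: poll the API server readiness endpoint, then list the pods
#   wait_until 60 5 kubectl get --raw=/readyz && kubectl get pod -n default
```

If the restore worked, the final listing should match the "After" expectation from the test plan: no testpod in the default namespace.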

The basic ETCD restoration process is now complete. In the upcoming articles, we will conduct tests for various other scenarios.

In addition to backing up ETCD, it’s advisable to use third-party software like Velero or similar tools to provide additional protection for your applications. This makes your Kubernetes cluster environment more robust and easier to recover.
